Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding
arXiv cs.LG / 4/16/2026
Key Points
- The paper argues that vision-language models struggle with compositional reasoning because contrastive pretraining lacks enough informative negative samples to distinguish subtle semantic differences like word order and attribute binding.
- It proposes that negative mining should be driven by lexical concreteness: replacing highly concrete terms creates stronger perceptual and structural mismatches, yielding a more effective learning signal (a minimal sketch of this idea follows the list).
- The method, ConcretePlant/Slipform, systematically manipulates perceptually grounded concepts to mine contrastive negatives; an accompanying analysis of InfoNCE reveals a severe gradient imbalance across negatives of different difficulty.
- To address optimization degradation when overly easy pairs dominate training, it formulates a margin-based “Cement loss” that dynamically calibrates penalties using psycholinguistic concreteness scores correlated with sample difficulty (see the second sketch below).
- Experiments report state-of-the-art results on compositional understanding benchmarks, along with gains in cross-modal retrieval and in linear probing for both single- and multi-label settings.
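
To make the concreteness-driven negative mining more tangible, here is a minimal, hypothetical sketch of how a caption perturbation guided by a concreteness lexicon could work. The lexicon values, replacement pool, and function name below are illustrative assumptions, not the paper's actual ConcretePlant/Slipform pipeline.

```python
# Hypothetical sketch: build a hard negative caption by swapping the most
# concrete token for another high-concreteness word, so the negative differs
# from the original in a perceptually grounded way.

import random

# Toy concreteness lexicon on a 1-5 scale (values are illustrative only).
CONCRETENESS = {"dog": 4.9, "ball": 4.9, "grass": 4.8, "happiness": 1.6, "red": 3.7}
REPLACEMENT_POOL = ["cat", "rock", "chair", "bird"]  # assumed candidate set

def make_concrete_negative(caption: str) -> str:
    """Replace the most concrete word in the caption with a different
    concrete word, producing a perceptually mismatched negative caption."""
    tokens = caption.lower().split()
    scored = [(CONCRETENESS.get(t, 0.0), i) for i, t in enumerate(tokens)]
    score, idx = max(scored)            # pick the most concrete token
    if score == 0.0:
        return caption                  # nothing concrete to swap
    swap = random.choice([w for w in REPLACEMENT_POOL if w != tokens[idx]])
    tokens[idx] = swap
    return " ".join(tokens)

print(make_concrete_negative("a dog chasing a ball on the grass"))
```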
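
Likewise, the following PyTorch sketch shows one plausible form of a margin-based contrastive objective whose penalty is calibrated by a per-sample concreteness score, the rough idea behind the “Cement loss.” The specific weighting, the direction of the calibration, and the combination with InfoNCE are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def concreteness_margin_loss(img_emb, txt_emb, neg_txt_emb, concreteness,
                             base_margin=0.2, temperature=0.07):
    """img_emb, txt_emb, neg_txt_emb: (B, D) L2-normalized embeddings.
    concreteness: (B,) scores in [0, 1] for the swapped word; used here
    as a proxy for the difficulty of the mined negative (an assumption)."""
    pos_sim = (img_emb * txt_emb).sum(dim=-1)        # similarity to true caption
    neg_sim = (img_emb * neg_txt_emb).sum(dim=-1)    # similarity to mined negative
    margin = base_margin * concreteness              # concreteness-calibrated margin
    hinge = F.relu(neg_sim - pos_sim + margin)       # penalize negatives that get too close
    # Keep a standard in-batch InfoNCE term so the global contrastive signal remains.
    logits = img_emb @ txt_emb.t() / temperature     # (B, B) image-to-text similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    info_nce = F.cross_entropy(logits, targets)
    return info_nce + hinge.mean()
```

Scaling the margin (or the penalty weight) per sample is what lets easy and hard negatives contribute more balanced gradients than a fixed-margin loss would.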