MAR-MAER: Metric-Aware and Ambiguity-Adaptive Autoregressive Image Generation
arXiv cs.CV / 4/3/2026
Key Points
- The paper introduces MAR-MAER, a hierarchical autoregressive text-to-image generation framework aimed at improving image quality and robustness to ambiguous prompts.
- It adds a metric-aware embedding regularization technique that aligns internal representations with human-preferred quality metrics such as CLIPScore and HPSv2.
- To better handle ambiguity in prompts, MAR-MAER incorporates a probabilistic latent model and a conditional variational module that inject controlled randomness during token generation.
- Experiments on COCO and a new Ambiguous-Prompt Benchmark show MAR-MAER improves over the Hi-MAR baseline by +1.6 in CLIPScore and +5.3 in HPSv2, while producing a wider variety of coherent outputs for unclear inputs.
- The authors report that gains are supported by both human evaluations and automated metrics, indicating improved semantic flexibility without sacrificing metric consistency.
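
The summary above does not give the paper's loss formulation, so the following is only a generic sketch of how the two ingredients it names could be combined: a conditional variational latent (sampled with the standard reparameterization trick and regularized by a KL term) and a metric-aware penalty that pulls predicted quality scores toward external metric scores such as CLIPScore or HPSv2. Every function name, the squared-error form of the penalty, and the weights `beta` and `lam` are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterized draw: z = mu + sigma * eps, with eps ~ N(0, 1).

    This is the generic way to inject controlled randomness; the paper's
    actual latent model is not described in the summary.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def metric_alignment_penalty(predicted, target):
    """Hypothetical regularizer: mean squared gap between quality scores
    predicted from embeddings and scores from an external metric."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def total_loss(token_nll, mu, log_var, predicted, target, beta=0.1, lam=0.5):
    """Token likelihood term plus the two weighted regularizers.

    beta and lam are assumed weights chosen only for illustration.
    """
    return (token_nll
            + beta * kl_to_standard_normal(mu, log_var)
            + lam * metric_alignment_penalty(predicted, target))


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.2, -0.1], [0.0, -1.0]
    z = sample_latent(mu, log_var, rng)
    print("latent sample:", z)
    print("loss:", total_loss(2.3, mu, log_var, [0.7, 0.6], [0.8, 0.5]))
```

In a sketch like this, a larger `log_var` spreads the latent samples and so yields more varied outputs for an ambiguous prompt, while the KL term keeps that spread bounded.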