Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro

arXiv cs.CV / 4/7/2026


Key Points

  • The paper identifies a key failure mode in multi-turn, multi-modal image editing: repeated edits cause iterative degradation that accumulates visible noise and can break adherence to even simple instructions.
  • It introduces Banana100, a new dataset of 28,000 images created via 100 iterative editing steps across varied textures and image content, specifically targeting this degradation phenomenon.
  • The authors report that common no-reference image quality assessment (NR-IQA) metrics fail to reliably flag heavily degraded images, with none of 21 popular metrics consistently scoring degraded images lower than clean ones.
  • The dual breakdown—both in image generators and in evaluators—raises concerns about training stability and the safety of deployed agentic systems if low-quality synthetic outputs bypass quality filters.
  • The authors release code and data to support building more robust editing systems and more reliable quality evaluation for multi-turn agentic workflows.
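The degradation the paper describes can be pictured with a toy model (a hypothetical sketch, not the paper's actual editing pipeline): if each of the 100 edits injects a small amount of independent per-pixel noise, the accumulated noise standard deviation grows roughly with the square root of the number of edits, so errors that are invisible after one edit become conspicuous after a hundred.

```python
import math
import random

def simulate_iterative_edits(n_steps=100, per_edit_noise_std=0.01,
                             n_pixels=1024, seed=0):
    """Toy model of multi-turn editing: each 'edit' adds small i.i.d.
    Gaussian noise to every pixel. Returns the empirical std of the
    accumulated noise after n_steps. Parameter values are illustrative
    assumptions, not taken from the paper."""
    rng = random.Random(seed)
    noise = [0.0] * n_pixels
    for _ in range(n_steps):
        for i in range(n_pixels):
            noise[i] += rng.gauss(0.0, per_edit_noise_std)
    mean = sum(noise) / n_pixels
    var = sum((x - mean) ** 2 for x in noise) / n_pixels
    return math.sqrt(var)
```

Under this independence assumption, 100 edits at per-edit std 0.01 yield an accumulated std of about 0.01 × √100 = 0.1, a tenfold amplification; correlated artifacts (the more realistic case for generative editors) would accumulate even faster.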

Abstract

The multi-step, iterative image editing capabilities of multi-modal agentic systems have transformed digital content creation. Although the latest image editing models faithfully follow instructions and generate high-quality images in single-turn edits, we identify a critical weakness in multi-turn editing: the iterative degradation of image quality. As images are repeatedly edited, minor artifacts accumulate, rapidly leading to severe visible noise and a failure to follow even simple editing instructions. To systematically study these failures, we introduce Banana100, a comprehensive dataset of 28,000 degraded images generated through 100 iterative editing steps across diverse textures and image content. Alarmingly, image quality evaluators fail to detect the degradation: of 21 popular no-reference image quality assessment (NR-IQA) metrics, none consistently assigns lower scores to heavily degraded images than to clean ones. This dual failure of generators and evaluators may threaten the stability of future model training and the safety of deployed agentic systems if the low-quality synthetic data generated by multi-turn edits escapes quality filters. We release the full code and data to facilitate the development of more robust models, helping to mitigate the fragility of multi-modal agentic systems.
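The consistency criterion applied to the 21 NR-IQA metrics can be made concrete with a small helper (a sketch under assumed, made-up scores; the paper's actual evaluation protocol and metric implementations are not reproduced here): for each (clean, degraded) image pair, check whether the metric ranks the degraded image worse, and report the fraction of pairs where it does. A metric that "consistently" flags degradation would score 1.0.

```python
def degradation_detection_rate(score_pairs, higher_is_better=True):
    """Fraction of (clean_score, degraded_score) pairs in which a
    no-reference IQA metric ranks the degraded image worse than the
    clean one. 1.0 means the metric is fully consistent; the paper
    reports that no tested metric achieves this."""
    if not score_pairs:
        raise ValueError("no score pairs given")
    hits = 0
    for clean, degraded in score_pairs:
        if higher_is_better:
            hits += degraded < clean
        else:  # lower-is-better metrics (e.g. distortion-style scores)
            hits += degraded > clean
    return hits / len(score_pairs)
```

For example, with hypothetical higher-is-better scores `[(0.9, 0.4), (0.8, 0.85), (0.7, 0.3)]`, the rate is 2/3: the metric misses the degradation in the second pair, which is exactly the failure mode that would let low-quality synthetic images slip past a threshold-based quality filter.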