Q-DeepSight: Incentivizing Thinking with Images for Image Quality Assessment and Refinement
arXiv cs.CV, April 21, 2026
Key Points
- The paper introduces Q-DeepSight, a multimodal “think-with-image” framework for Image Quality Assessment (IQA) that provides actionable, localized feedback rather than only global scores.
- Q-DeepSight uses interleaved Multimodal Chain-of-Thought with tool-augmented evidence collection (such as crop-and-zoom) to identify where quality drops and the visual reasons behind it.
- To train long multimodal reasoning trajectories with reinforcement learning, the authors propose Perceptual Curriculum Reward (PCR) to reduce reward sparsity and Evidence Gradient Filtering (EGF) to improve credit assignment for visually grounded reasoning.
- Experiments show state-of-the-art results on benchmarks covering natural, restored, and AI-generated imagery. The model is further applied in a training-free loop, Perceptual-in-Generation (PiG), which iteratively refines images based on its own localized diagnoses.
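The interleaved reasoning loop described above can be sketched in a few lines. This is a minimal illustration of the "think-with-image" pattern, not the paper's actual implementation: all names (`Evidence`, `crop_and_zoom`, `assess`, the severity-based scoring rule) are hypothetical, and the real system would call a multimodal LLM where the stubs appear here.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    region: tuple        # (x, y, w, h) crop that was examined
    finding: str         # localized quality issue, e.g. "motion blur"
    severity: float      # 0.0 (none) .. 1.0 (severe)

def crop_and_zoom(image, region):
    """Tool call: return the crop for closer inspection (stub for a real tool)."""
    x, y, w, h = region
    return [row[x:x + w] for row in image[y:y + h]]

def assess(image, propose_region, inspect, max_steps=3):
    """Interleave reasoning steps with crop-and-zoom tool calls, then
    aggregate the grounded local findings into one global quality score."""
    evidence = []
    for _ in range(max_steps):
        region = propose_region(image, evidence)   # model picks where to look next
        if region is None:                         # model decides it has seen enough
            break
        crop = crop_and_zoom(image, region)
        evidence.append(inspect(crop, region))     # grounded, localized finding
    # Illustrative aggregation: perfect quality minus the mean local severity.
    score = 1.0 - sum(e.severity for e in evidence) / max(len(evidence), 1)
    return max(0.0, score), evidence
```

In the actual framework, `propose_region` and `inspect` would be the multimodal model's own reasoning turns, and the reinforcement-learning rewards (PCR, EGF) would shape which crops it chooses and how its findings are credited.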