GRAFITE: Generative Regression Analysis Framework for Issue Tracking and Evaluation
arXiv cs.CL · 20 Mar 2026
Key Points
- GRAFITE is a continuous LLM evaluation platform that builds and maintains a repository of model issues based on user feedback to enable ongoing testing.
- It uses a QA-testing pipeline with LLM-as-a-judge and supports side-by-side comparisons of multiple models to detect regressions across releases.
- The framework provides an end-to-end workflow from issue collection to automated QA tests, enabling scalable, time-aware evaluation of model performance.
- The project is open-source at IBM/grafite and includes a demo video, offering a practical tool to assess LLMs and mitigate benchmark contamination.
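The workflow described above — replay user-reported issues against two model releases and let an LLM-as-a-judge flag regressions — can be sketched as follows. This is a hypothetical illustration, not GRAFITE's actual API: the `Issue` class, the stubbed `judge` function (which would be an LLM grader in practice), and `compare_models` are all invented for this example.

```python
# Hypothetical sketch of GRAFITE-style regression testing.
# Names (Issue, judge, compare_models) are illustrative, not the real API.

from dataclasses import dataclass

@dataclass
class Issue:
    prompt: str    # prompt a user reported as problematic
    expected: str  # behavior the model should exhibit

def judge(answer: str, issue: Issue) -> int:
    # Stub grader: 1 if the expected behavior appears in the answer, else 0.
    # In GRAFITE this role is played by an LLM-as-a-judge.
    return int(issue.expected.lower() in answer.lower())

def compare_models(issues, old_model, new_model):
    # Side-by-side comparison across releases: flag issues where the
    # new model scores worse than the old one (a regression).
    regressions = []
    for issue in issues:
        old_score = judge(old_model(issue.prompt), issue)
        new_score = judge(new_model(issue.prompt), issue)
        if new_score < old_score:
            regressions.append(issue.prompt)
    return regressions

if __name__ == "__main__":
    issues = [Issue("What is 2+2?", "4"),
              Issue("Capital of France?", "Paris")]
    old = lambda p: "4" if "2+2" in p else "Paris"
    new = lambda p: "4" if "2+2" in p else "Lyon"  # regressed on geography
    print(compare_models(issues, old, new))  # -> ['Capital of France?']
```

Because the issue repository grows continuously from user feedback, each new release is tested against the full accumulated set, which is what makes the evaluation time-aware and resistant to static-benchmark contamination.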