Sabiá-4 Technical Report
arXiv cs.CL / 3/12/2026
📰 News · Models & Research
Key Points
- Sabiá-4 and Sabiazinho-4 are introduced as a new generation of Portuguese language models with a focus on Brazilian Portuguese.
- They were developed through a four-stage training pipeline: (1) continued pre-training on Portuguese and Brazilian legal corpora, (2) long-context extension to 128K tokens, (3) supervised fine-tuning on instruction data spanning chat, code, legal tasks, and function calling, and (4) preference alignment.
- They were evaluated across six benchmarks covering conversational capabilities in Brazilian Portuguese, knowledge of Brazilian legislation, long-context understanding, instruction following, standardized exams, and agentic capabilities such as tool use and web navigation.
- They show a favorable cost-performance trade-off and improve over previous generations in legal document drafting, multi-turn dialogue quality, and agentic task completion, placing them favorably on the report's pricing-versus-accuracy comparison.
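The four-stage pipeline above can be sketched as an ordered sequence of stage configurations. This is a minimal illustrative sketch: the stage names, corpus labels, and the 8K base context window are assumptions for illustration, not the authors' actual setup; only the stage order, the stage contents, and the 128K target context come from the summary.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    data: list        # corpora or datasets consumed at this stage (labels assumed)
    max_context: int  # context window in tokens used during this stage

def build_pipeline() -> list:
    """Return the four training stages in the order the report lists them."""
    return [
        # Stage 1: continued pre-training on Portuguese and Brazilian legal corpora.
        Stage("continued_pretraining",
              ["portuguese_corpus", "brazilian_legal_corpus"],
              max_context=8_192),  # base window before extension (assumed value)
        # Stage 2: long-context extension to 128K tokens.
        Stage("long_context_extension",
              ["long_document_corpus"],
              max_context=131_072),
        # Stage 3: supervised fine-tuning on instruction data.
        Stage("supervised_finetuning",
              ["chat", "code", "legal_tasks", "function_calling"],
              max_context=131_072),
        # Stage 4: preference alignment.
        Stage("preference_alignment",
              ["preference_data"],
              max_context=131_072),
    ]

pipeline = build_pipeline()
```

Representing the stages as data rather than hard-coded calls makes the ordering and the point where the context window grows explicit.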