Model Summary: Granite-4.0-3B-Vision is a vision-language model (VLM) designed for enterprise-grade document data extraction. It focuses on specialized, complex extraction tasks that ultracompact models often struggle with: chart, table, and semantic key-value pair (KVP) extraction.
The model is delivered as a LoRA adapter on top of Granite 4.0 Micro, so a single deployment can serve both multimodal document understanding and text-only workloads; the base model handles text-only requests without loading the adapter. See Model Architecture for details. While our focus is on specialized document extraction tasks, the current model preserves and extends the capabilities of Granite-Vision-3.3 2B, so existing users can adopt it with no changes to their workflow. It continues to support vision-language tasks such as producing detailed natural-language descriptions from images (image-to-text). The model can be used standalone and also integrates with Docling to add deep visual understanding to document processing pipelines.
ibm-granite/granite-4.0-3b-vision · Hugging Face
Reddit r/LocalLLaMA / 3/29/2026
📰 News · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- IBM’s Granite-4.0-3B-Vision is a vision-language model tailored for enterprise document extraction, emphasizing chart, table, and semantic key-value pair (KVP) extraction from document images.
- It is released on Hugging Face as a LoRA adapter built on top of the Granite 4.0 Micro base model, allowing the same deployment to handle both multimodal document understanding (with the adapter) and text-only workloads (without loading the adapter).
- The model supports structured outputs for charts (e.g., Chart2CSV/Chart2Summary/Chart2Code) and table extraction into formats such as JSON, HTML, or OTSL.
- It aims to preserve and extend capabilities from Granite-Vision-3.3 2B for seamless adoption, while also supporting general vision-language tasks like image-to-text.
- The model can be used standalone and integrates with the Docling pipeline to enhance document processing with deeper visual understanding.
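The single-deployment idea in the key points (adapter for multimodal requests, base model alone for text-only ones) can be illustrated with a small routing sketch. This is only an illustration under stated assumptions: the `Request` type, the path labels, and the helper names are hypothetical and do not come from the model card or any serving framework; no model is loaded here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical labels for the two execution paths (not names from the model card).
BASE_ONLY = "base"
WITH_ADAPTER = "base+vision-lora"


@dataclass
class Request:
    """A simplified inference request (illustrative, not a real API type)."""
    prompt: str
    images: List[bytes] = field(default_factory=list)


def select_path(request: Request) -> str:
    """Pick the execution path: the adapter is needed only when images are present."""
    return WITH_ADAPTER if request.images else BASE_ONLY


def partition(requests: List[Request]) -> Dict[str, List[Request]]:
    """Group requests by path so each batch runs with a single adapter setting."""
    batches: Dict[str, List[Request]] = {BASE_ONLY: [], WITH_ADAPTER: []}
    for req in requests:
        batches[select_path(req)].append(req)
    return batches


if __name__ == "__main__":
    incoming = [
        Request("Summarize this memo."),
        Request("Extract the table as HTML.", images=[b"\x89PNG"]),
        Request("Translate to French."),
    ]
    for path, batch in partition(incoming).items():
        print(path, [r.prompt for r in batch])
```

In a real serving stack, the returned label would map onto whatever per-request adapter mechanism that stack provides; the sketch only shows the decision, which depends on whether a request carries an image.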




