Hey everyone, I’m struggling to find a good approach for converting PDFs to Markdown (especially for financial data). The main challenge is handling borderless tables and tables with more than 5–6 columns. I’ve tried docling, graphite-docling, marker, etc., but haven’t found a solid open-source solution. The only thing that works well so far is LandingAI (but it’s paid). Does anyone know of a good open-source alternative? TIA!
Why Is Table Extraction with VLM Models Still Challenging? [D]
Reddit r/MachineLearning / 5/1/2026
💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage
Key Points
- The post asks for a reliable open-source method to convert PDFs into Markdown with an emphasis on accurately extracting financial tables.
- The author reports that borderless tables and tables with more than about 5–6 columns are especially difficult to handle with existing approaches.
- They have tried several tools and pipelines (including Docling, Graphite-Docling, and Marker) but have not found a consistently solid open-source solution.
- The only approach that works well so far is LandingAI, but it is paid, motivating a search for alternatives.
- The request includes example images and seeks community recommendations for open-source tooling or workflows that better support complex table layouts using VLM-style extraction.
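
For readers unfamiliar with the target format, here is a minimal sketch of the final step only: turning rows that some upstream tool has already extracted into a Markdown pipe table. The function name `rows_to_markdown` and the sample figures are hypothetical, not taken from the post or from any of the tools it mentions. The sketch does not address the hard part the post describes, which is detecting columns in borderless or wide tables. It only shows that ragged rows (a common symptom of missed columns) need padding, and that literal pipes need escaping, so the table stays well-formed.

```python
def rows_to_markdown(rows):
    """Render a list of rows (lists of cell values) as a Markdown pipe table.

    The first row is treated as the header. Short rows are padded with empty
    cells so every line has the same number of columns.
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def clean(cell):
        # Escape pipes and flatten newlines so a cell cannot break the table.
        return str(cell).replace("|", "\\|").replace("\n", " ").strip()

    def line(cells):
        padded = [clean(c) for c in cells] + [""] * (width - len(cells))
        return "| " + " | ".join(padded) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([line(header), separator, *(line(r) for r in body)])


# Hypothetical sample data; the last row is missing a cell on purpose.
sample = [
    ["Item", "Q1", "Q2"],
    ["Revenue", "1,200", "1,350"],
    ["Net income", "210"],
]
print(rows_to_markdown(sample))
```

Padding hides a missing cell rather than recovering it, so in practice a row that comes out shorter than the header is worth flagging for review instead of silently accepting.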