TableSeq: Unified Generation of Structure, Content, and Layout
arXiv cs.CV / 4/20/2026
Key Points
- TableSeq is an image-only, end-to-end framework that unifies table structure recognition, cell content recognition, and cell localization into a single autoregressive sequence-generation task.
- The model generates an interleaved stream containing HTML tags, cell text, and discretized coordinate tokens, aligning logical structure, content, and geometry without external OCR or multi-stage post-processing.
- TableSeq uses a lightweight high-resolution FCN-H16 encoder, a minimal structure-prior head, and a compact transformer encoder to keep the architecture simple while maintaining strong performance on difficult layouts.
- Reported benchmark results show competitive or state-of-the-art accuracy across PubTabNet, FinTabNet, and SciTSR (CAR protocol), while also performing well on PubTables-1M (GriTS).
- The same unified sequence interface generalizes to index-based table querying. Multi-token prediction enables blockwise decoding, which speeds up inference with only limited accuracy loss. The authors plan to release the code publicly on GitHub.
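
To make the interleaved stream described above more concrete, here is a minimal Python sketch of how a table could be serialized into HTML tags, discretized coordinate tokens, and cell text. It is an illustration only, not the authors' implementation: the bin count (`NUM_BINS = 1000`), the `<loc_N>` token format, the character-level text tokens, the tag-then-coordinates-then-text ordering, and the names `Cell`, `quantize`, `cell_tokens`, and `serialize_table` are all assumptions that the summary does not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed bin count; the summary does not state how coordinates are discretized.
NUM_BINS = 1000


@dataclass
class Cell:
    """Hypothetical container for one table cell."""
    text: str
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def quantize(value: float, extent: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, extent] to an integer bin in [0, num_bins - 1]."""
    ratio = min(max(value / extent, 0.0), 1.0)  # clamp to the image bounds
    return min(int(ratio * num_bins), num_bins - 1)


def cell_tokens(cell: Cell, width: float, height: float) -> List[str]:
    """Emit one cell as: opening tag, four location tokens, text characters, closing tag."""
    x0, y0, x1, y1 = cell.bbox
    bins = [
        quantize(x0, width),
        quantize(y0, height),
        quantize(x1, width),
        quantize(y1, height),
    ]
    location = [f"<loc_{b}>" for b in bins]
    # Character-level text tokens are a simplification for this sketch.
    return ["<td>"] + location + list(cell.text) + ["</td>"]


def serialize_table(rows: List[List[Cell]], width: float, height: float) -> List[str]:
    """Flatten a table into a single interleaved token sequence."""
    tokens = ["<table>"]
    for row in rows:
        tokens.append("<tr>")
        for cell in row:
            tokens.extend(cell_tokens(cell, width, height))
        tokens.append("</tr>")
    tokens.append("</table>")
    return tokens
```

Under these assumptions, one sequence carries logical structure (the tags), geometry (the `<loc_N>` tokens), and content (the characters), so a single autoregressive decoder could in principle emit all three without a separate OCR or alignment step.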