Retrieval-Augmented Reasoning for Chartered Accountancy

arXiv cs.AI / 5/4/2026


Key Points

  • The article argues that while LLMs are increasingly used in finance, they remain unreliable for complex, jurisdiction-specific work such as Indian Chartered Accountancy due to multi-step numerical reasoning and regulatory knowledge gaps.
  • It introduces CA-ThinkFlow, a parameter-efficient RAG framework that combines a 14B 4-bit-quantized reasoning model (14B-DeepSeek-R1) with a layout-aware Docling-based document extraction system to preserve document structure.
  • CA-ThinkFlow uses a simple RAG approach that injects retrieved content into prompts and leverages the model’s built-in Chain-of-Thought to generate answers.
  • In evaluation on the multi-level CA-Ben benchmark, the framework achieves a Scholastic Reliability Coefficient (SRC) of 68.75%, matching the performance of large proprietary models GPT-4o and Claude 3.5 Sonnet.
  • The authors note limitations: despite strong efficiency and parameter handling, the system can still struggle with essential reasoning when processing complex regulatory texts, such as those found in Taxation.
  • The work is presented as an arXiv preprint (v1), indicating early-stage research rather than a finalized deployed product.
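The 4-bit quantization mentioned above is what makes a 14B-parameter reasoning model feasible on modest hardware: at 4 bits per weight, the weights occupy roughly 7 GB instead of ~28 GB at FP16. A pure-Python sketch of symmetric 4-bit quantization illustrates the idea (the function names are illustrative; the paper presumably relies on an existing quantization library rather than code like this):

```python
def quantize_4bit(weights):
    """Symmetric 4-bit quantization: map each float to an integer in [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7.0  # largest magnitude maps to +/-7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [v * scale for v in q]

weights = [0.12, -0.07, 0.31, -0.29, 0.05]
q, scale = quantize_4bit(weights)
recovered = dequantize_4bit(q, scale)
```

Each weight is stored in 4 bits plus one shared `scale` per group, a 4x memory reduction over 16-bit floats at the cost of a bounded rounding error of at most `scale / 2` per weight.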

Abstract

The advent of Large Language Models (LLMs) has catalyzed AI adoption in the finance sector, yet their reliability in complex, jurisdiction-specific tasks like Indian Chartered Accountancy (CA) remains limited. The models struggle with multi-step numerical tasks that also demand advanced knowledge of legal regulations, and scaling them is infeasible in resource-constrained settings. We present CA-ThinkFlow, a parameter-efficient Retrieval-Augmented Generation (RAG) framework built on a 14B, 4-bit-quantized reasoning model, 14B-DeepSeek-R1, and a layout-aware Docling extraction system that preserves document structure during extraction. CA-ThinkFlow uses a simple RAG method that injects retrieved information directly into the prompt and relies on the model's built-in Chain-of-Thought (CoT) capabilities to contextualize it and produce correct answers. On the multi-level CA-Ben benchmark, the system operates at performance levels matching large proprietary models, achieving a Scholastic Reliability Coefficient (SRC) of 68.75%, equal to GPT-4o and Claude 3.5 Sonnet. While the framework is highly efficient and parameter-lean, its reasoning still falls short on complex regulatory texts in fields such as Taxation.
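The "simple RAG method" described in the abstract reduces to two steps: rank candidate passages against the query, then splice the top hits verbatim into the prompt before the question. A minimal, self-contained sketch of that flow follows; the keyword-overlap retriever, prompt template, and sample passages are illustrative stand-ins, not the paper's actual retriever or prompts:

```python
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query; return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    """Inject retrieved passages into the prompt; the reasoning model's
    built-in Chain-of-Thought handles contextualization from there."""
    context = "\n\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer step by step:"

# Hypothetical passages standing in for Docling-extracted document chunks.
docs = [
    "Section 80C of the Income-tax Act allows deductions up to Rs 1.5 lakh.",
    "Ind AS 115 governs revenue recognition from contracts with customers.",
    "GST registration is mandatory above the prescribed turnover threshold.",
]
prompt = build_prompt("What deduction does Section 80C allow?", docs)
```

A production system would use embedding-based retrieval over structure-preserving chunks, but the prompt-assembly step, injecting retrieved text ahead of the question and letting the model's CoT do the reasoning, is the same.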