ChatGPT vs purpose-built AI for CRE underwriting: which one can finish the job?

Reddit r/artificial / 4/2/2026

Key Points

  • The author argues that ChatGPT is not reliable for multifamily CRE underwriting because it produces fragmented outputs (e.g., partial formulas and disconnected tables) that can’t be delivered as a complete investment-committee-ready workbook.
  • In their month-long testing, they found that multiple prompt iterations still required time comparable to building the model in Excel, with the added burden of debugging hallucinated or incorrect spreadsheet components.
  • A key limitation cited is that ChatGPT does not maintain consistent state across complex, multi-step modeling workflows, breaking the coherence needed from assumptions → cash flows → returns → sensitivities.
  • The post contrasts this with purpose-built AI tools that are designed to decompose the task, run autonomously for 15 to 30 minutes, validate intermediate outputs, and output a finished workbook populated with real Excel formulas.
  • The conclusion is that LLM chat is useful for brainstorming and quick questions, but architecture/design differences make purpose-built systems better when the AI output is itself the required deliverable.

I keep seeing people recommend ChatGPT for financial modeling, and I need to push back: I spent a month testing it for multifamily underwriting, and the results were not close to usable.

When you paste in rent rolls, T12s, and operating statements and ask it to build a model, you get fragments: a few formulas, a cash flow table, maybe a cap rate calculation. Nothing ties together into a workbook you could hand to an investment committee. Fifteen rounds of prompting later, you've spent the same time you would have spent just building it in Excel, except now you also have to debug whatever ChatGPT hallucinated in cell D47.

The problem with ChatGPT is that it doesn't maintain state across a complex multi-step task. It treats each prompt like a fresh conversation, even in the same thread. An underwriting model where assumptions feed cash flows, which feed returns, which feed sensitivities, requires coherence across all those layers, and ChatGPT's output fragments instead.
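
To make the layering concrete, here's a minimal Python sketch of that chain: assumptions feed cash flows, which feed an IRR, which feeds a sensitivity table. Everything in it is an illustrative assumption on my part (the deal figures are made up, and the forward-NOI exit valuation is just one common convention), so treat it as a toy, not a real underwriting model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Assumptions:
    # All figures are made-up illustrative inputs, not from any real deal.
    purchase_price: float = 10_000_000.0
    year1_noi: float = 600_000.0
    noi_growth: float = 0.03
    exit_cap_rate: float = 0.06
    hold_years: int = 5


def cash_flows(a: Assumptions) -> list[float]:
    """Unlevered flows: purchase at t=0, NOI each year, sale in the last year."""
    flows = [-a.purchase_price]
    for year in range(1, a.hold_years + 1):
        flows.append(a.year1_noi * (1 + a.noi_growth) ** (year - 1))
    # Sale price = forward-year NOI / exit cap rate (assumed convention).
    forward_noi = a.year1_noi * (1 + a.noi_growth) ** a.hold_years
    flows[-1] += forward_noi / a.exit_cap_rate
    return flows


def npv(rate: float, flows: list[float]) -> float:
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))


def irr(flows: list[float], lo: float = -0.99, hi: float = 1.0) -> float:
    """IRR by bisection; assumes NPV crosses zero exactly once on [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if npv(mid, flows) > 0:
            lo = mid  # NPV still positive, so the IRR is higher
        else:
            hi = mid
        if hi - lo < 1e-9:
            break
    return (lo + hi) / 2


def sensitivity(a: Assumptions, exit_caps: list[float]) -> dict[float, float]:
    """Recompute the whole chain for each exit cap rate."""
    return {cap: irr(cash_flows(replace(a, exit_cap_rate=cap))) for cap in exit_caps}


if __name__ == "__main__":
    base = Assumptions()
    for cap, r in sensitivity(base, [0.05, 0.06, 0.07]).items():
        print(f"exit cap {cap:.2%} -> IRR {r:.2%}")
```

Changing one assumption has to ripple through every layer, which is exactly the coherence that gets lost when each layer comes from a separate prompt.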

Purpose-built tools are architecturally different. They decompose the task, run autonomously for 15 to 30 minutes, check intermediate outputs, and return a complete workbook with actual Excel formulas. That's not a model quality difference; it's a design philosophy difference.
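
Here's a rough Python sketch of what "decompose, check intermediate outputs, return real formulas" could look like. It is my own illustration, not how any particular tool actually works: the step functions, cell layout, and the dangling-reference check are all hypothetical. The "workbook" is just a dict of cell address to value or Excel formula string, and writing it out to an .xlsx file is left out.

```python
import re

# Matches A1-style cell addresses such as B3 or AA12 (after "$" is stripped).
CELL_REF = re.compile(r"\b[A-Z]{1,3}[0-9]+\b")


def build_assumptions(sheet):
    # Hypothetical inputs; the cell layout is arbitrary.
    sheet["B1"] = 10000000  # purchase price
    sheet["B2"] = 600000    # year-1 NOI
    sheet["B3"] = 0.03      # annual NOI growth
    sheet["B4"] = 0.06      # exit cap rate


def build_cash_flows(sheet):
    sheet["C6"] = "=-B1"  # t=0 outflow
    sheet["D6"] = "=B2"   # year-1 NOI
    for prev, col in zip("DEFG", "EFGH"):
        sheet[f"{col}6"] = f"={prev}6*(1+$B$3)"  # grow NOI year over year
    sheet["H7"] = "=H6*(1+$B$3)/$B$4"            # sale on forward NOI
    for col in "CDEFG":
        sheet[f"{col}8"] = f"={col}6"            # total row
    sheet["H8"] = "=H6+H7"


def build_returns(sheet):
    sheet["B10"] = "=IRR(C8:H8)"


def dangling_refs(sheet):
    """List (cell, ref) pairs where a formula points at a cell that is not defined.

    Only the endpoints of a range like C8:H8 are checked in this sketch.
    """
    missing = []
    for cell, value in sheet.items():
        if isinstance(value, str) and value.startswith("="):
            for ref in CELL_REF.findall(value.replace("$", "")):
                if ref not in sheet:
                    missing.append((cell, ref))
    return missing


def run_pipeline(steps):
    """Run each step in order and validate the sheet before moving on."""
    sheet = {}
    for step in steps:
        step(sheet)
        problems = dangling_refs(sheet)
        if problems:
            raise ValueError(f"{step.__name__} left dangling references: {problems}")
    return sheet


if __name__ == "__main__":
    workbook = run_pipeline([build_assumptions, build_cash_flows, build_returns])
    for cell, value in workbook.items():
        print(cell, value)
```

The point of the design is that a broken reference gets caught at the step that introduced it, instead of surfacing later as a mystery error somewhere like D47.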

ChatGPT for quick questions and brainstorming, yes. For anything where the output IS the deliverable, no. Different architecture for different jobs.

submitted by /u/MudSad6268