Researchers ran 25,000 AI scientist experiments and discovered something that needs attention.
AI scientists are producing results without doing science.
68% of the time, the AI gathered evidence and then completely ignored it. In 71% of runs, the AI never updated its beliefs at all. Not once. Only 26% of the time did the AI revise a hypothesis when confronted with contradictory data.
A human scientist adapts. You approach a chemistry identification problem differently than you approach a simulation workflow. The AI doesn't. It runs the same undisciplined loop every time.
The researchers also showed that the most popular proposed fix, better scaffolding, does not work.
Everyone building AI research agents has focused on engineering better prompting frameworks, better tool routing, better agent architectures. ReAct, structured tool-calling, chain-of-thought, all of it.
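For anyone unfamiliar with the jargon: "scaffolding" means the control loop wrapped around the model, not the model itself. Here is a minimal sketch of a ReAct-style loop, with a stubbed-out model and a hypothetical `lookup` tool standing in for real LLM and search calls:

```python
# Minimal sketch of a ReAct-style scaffold: the agent alternates
# Thought -> Action -> Observation until it emits a final answer.
# fake_model and lookup are hypothetical stand-ins, not a real API.

def fake_model(history):
    # Stand-in for an LLM: looks something up once, then answers.
    # A real scaffold would send the full history to the model.
    if "Observation" in history:
        return "Final: boiling point of water is 100 C"
    return "Action: lookup('boiling point of water')"

def lookup(query):
    # Hypothetical tool; a real agent might call a search API here.
    return "Water boils at 100 C at sea level."

def react_loop(question, max_steps=5):
    history = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_model(history)
        if step.startswith("Final:"):
            return step[len("Final:"):].strip()
        # Parse the action, run the tool, append the observation.
        query = step.split("('")[1].split("')")[0]
        history += f"\nObservation: {lookup(query)}"
    return "gave up"

print(react_loop("What is the boiling point of water?"))
```

The point of the study is that none of this loop machinery forces the agent to actually use the observations it collects, which is exactly the failure the numbers above describe.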