Car Wash Mystery solved--Tool Call Degrades Intelligence.

Reddit r/LocalLLaMA / 4/27/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

Read original →

共有:

Key Points

The author reports experiments with Kimi-k2.5 showing that using tool-calling (with web search + Python in a Docker sandbox) reduces accuracy on a simple “car wash” decision question compared with no-tools prompting.
Three modes were tested (no tools, XML pseudo-tools, and JSON schema tools), and correct answers dropped progressively when tools were enabled (3/3, 2/3, and 1/3 respectively).
A follow-up chemistry question produced the same pattern: the model knew the answer in no-tools mode but failed when tool schemas were present, suggesting the model shifts into a “delegation mode” rather than reasoning from internal knowledge.
The conclusion is that tool schema overhead and the presence of tools can degrade intelligence for some tasks, and similar behavior was observed when testing with Qwen 3.5.
Limitations include testing only two model variants and a small sample size (three runs per mode), so results may not generalize broadly.

I asked the OG question to the kimi k2.5:

"I want to wash my car and the car wash is just 10 metres away. Should I walk or drive there?"

Kimi-k2.5 via NIM -- Three Modes.

I tested three modes: no tools, XML pseudo-tools, and JSON schema tools. "Tools" here means web search + Python in a Docker sandbox. 3 tests were conducted in each mode.

Mode	Correct (Drive)
No tools	3/3 ✅
XML pseudo-tools	2/3
JSON schema tools	1/3

tool overhead seems to degrade intelligence

Confirming with a Chemistry Question

To double check, I ran one more test --this time a niche chemistry question.

Background: diatomic molecules with even electron counts are generally diamagnetic, with two standard exceptions (10e and 16e systems). There's a lesser-known extension-- the entire oxygen family (O₂, S₂, Se₂, Te₂...) are all paramagnetic, not just O₂.

I asked:

"I remember for finding whether a compound is para or diamagnetic we used the odd even electron rule, but there were 2 exceptions, 10 and 16 electrons. Are there any more exceptions?"

Mode	Result
No tools	✅ Correctly identified O₂ family -- S₂, Se₂, Te₂ all paramagnetic
XML pseudo-tools	answered- "No more exceptions to remember" , this is failure ofc.
JSON schema tools	Similar failure

Conclusion

The model had the correct answer in both cases --it just couldn't access it when tools were present. Tool schemas seem to push the model into "delegation mode" where it looks for something to search or execute rather than reasoning from its own knowledge. No tools = full attention on the problem.

i tested car wash test with qwen 3.5 also and found success in no tool mode and failure in tool mode.

Limitations

Only tested on Kimi-k2.5, qwen 3.5
3 runs per mode is a small sample

submitted by /u/Spirited_Neck1858
[link] [comments]

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

AI agents have no identity — we built the open registry that gives them one

Dev.to

Democratic Governance of AI Is the Real Solution

Reddit r/artificial

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

Claude Desktop Now Supports Third-Party APIs — Here's How to Set It Up

Dev.to

Car Wash Mystery solved--Tool Call Degrades Intelligence.

Key Points

Related Articles

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

AI agents have no identity — we built the open registry that gives them one

Democratic Governance of AI Is the Real Solution

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Claude Desktop Now Supports Third-Party APIs — Here's How to Set It Up

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer