I ran a logging layer on my agent for 72 hours. 37% of tool calls had parameter mismatches — and none raised an error.

Reddit r/artificial / 4/24/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • An agent that logged 84 tool calls over 72 hours found that 31 of them (37%) had parameter mismatches that produced plausible outputs without raising errors.
  • The silent failures came from several contract-level issues, including timestamp-vs-duration mixups, inclusive-vs-exclusive end ranges (off-by-one), and array-vs-comma-separated string formatting differences.
  • Another major failure mode involved sending relative time text like "yesterday" when the API expected a Unix timestamp, leading to NaN parsing and empty (but non-error) responses.
  • The author argues these problems are more like the agent asking the wrong question than hallucination, because the tool returns 200 OK with a body that looks correct even when it is not.
  • To mitigate this, the author is adding input validation schemas to tool definitions to catch type mismatches before execution and prevent incorrect data from propagating downstream.

I've been running an AI agent that makes tool calls to various APIs, and I added a logging layer to capture exactly what was being sent vs. what the tools expected. Over 84 tool calls in 72 hours, 31 of them (37%) had parameter mismatches — and not a single one raised an error.

The tools accepted the wrong parameters and returned plausible-looking but incorrect output.
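A minimal sketch of what such a logging layer can look like (the tool name, parameter, and contract here are hypothetical, not the author's actual setup): wrap each tool, compare the arguments against the declared types, and record mismatches without blocking the call — mirroring how the real APIs silently accepted the wrong shape.

```python
import functools

MISMATCHES = []  # audit log of (tool, param, got_type, expected_type)

def audit_tool(expected_types):
    """Wrap a tool so any kwarg whose type differs from the declared
    contract is recorded. The call still goes through -- we are only
    observing, exactly like a passive logging layer."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            for name, value in kwargs.items():
                expected = expected_types.get(name)
                if expected is not None and not isinstance(value, expected):
                    MISMATCHES.append(
                        (fn.__name__, name, type(value).__name__, expected.__name__)
                    )
            return fn(**kwargs)
        return wrapper
    return decorator

# Hypothetical tool: the API contract says `window` is a duration string like "24h"
@audit_tool({"window": str})
def fetch_metrics(window):
    return {"window": window}

fetch_metrics(window="24h")        # clean call, nothing recorded
fetch_metrics(window=1714000000)   # wrong type: recorded, but not blocked
```

Running this leaves one entry in `MISMATCHES` — the agent passed an int (a Unix timestamp) where the contract wanted a duration string, and the call still "succeeded."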

Here are the 4 failure categories I found:

1. Timestamp vs Duration — The agent passed a Unix timestamp where the API expected a duration string like "24h". The API silently interpreted it as a duration, returning results for a completely different time window than intended.

2. Inclusive vs Exclusive Range — The agent sent end=100 meaning "up to and including 100," but the API interpreted it as exclusive, missing the boundary value. Off-by-one at the API contract level.

3. Array vs Comma-Separated String — The agent sent ["a", "b", "c"] where the API expected "a,b,c". Some APIs parsed the JSON array as a single string; others silently took only the first element.

4. Relative Time vs Unix Timestamp — The agent sent "yesterday" where a Unix timestamp was expected. The API tried to parse it as an integer, got NaN, and... just returned empty results instead of erroring.
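Failure modes 3 and 4 are easy to reproduce with a toy server-side parser (these parsers are illustrative, not the actual APIs; Python's `int()` raises where JavaScript's `Number()` would give `NaN`, so the catch-and-default stands in for that):

```python
def parse_tags(raw):
    """Expects a comma-separated string like "a,b,c".
    A JSON array stringified and split produces garbage -- but no error."""
    return str(raw).split(",")

def parse_since(raw):
    """Expects a Unix timestamp. Anything unparseable is silently
    treated as "no filter matched", i.e. empty results, not an error."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None

parse_tags("a,b,c")            # ["a", "b", "c"] -- correct
parse_tags(["a", "b", "c"])    # ["['a'", " 'b'", " 'c']"] -- garbage, no error
parse_since(1714000000)        # 1714000000 -- correct
parse_since("yesterday")       # None -- call "succeeds", returns nothing
```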

The most dangerous thing about these failures is that they look identical to correct results. The API returns 200 OK with a plausible response body. You only notice when you dig into whether the answer is right, not whether the call succeeded.

This is fundamentally different from hallucination — it's not the model making things up, it's the model asking slightly different questions than the one you intended, and the tool happily answering the wrong question.

I've started adding input validation schemas to my tool definitions that catch type mismatches before execution, and it's already caught several that would have silently propagated wrong data downstream.
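A hand-rolled sketch of that kind of pre-execution check, under the assumption of a simple dict-based schema (the parameter names and patterns are hypothetical; a real setup might use `jsonschema` or pydantic instead):

```python
import re

# Hypothetical contract for one tool: a duration string, a
# comma-separated tag string, and an integer range bound.
SCHEMA = {
    "window": {"type": str, "pattern": r"\d+[smhd]"},  # "24h", never a timestamp
    "tags":   {"type": str},                           # "a,b,c", never a list
    "end":    {"type": int},
}

def validate_args(schema, kwargs):
    """Return a list of human-readable errors; empty means the call
    may proceed. Runs BEFORE the tool executes, so a type mismatch
    fails loudly instead of propagating downstream."""
    errors = []
    for name, value in kwargs.items():
        rule = schema.get(name)
        if rule is None:
            continue
        if not isinstance(value, rule["type"]):
            errors.append(
                f"{name}: expected {rule['type'].__name__}, got {type(value).__name__}"
            )
        elif "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            errors.append(f"{name}: {value!r} does not match {rule['pattern']!r}")
    return errors

validate_args(SCHEMA, {"window": "24h", "tags": "a,b,c"})    # [] -- ok
validate_args(SCHEMA, {"window": 1714000000, "tags": ["a"]}) # two errors
validate_args(SCHEMA, {"window": "yesterday"})               # pattern error
```

The pattern check is what catches the sneakier cases: `"yesterday"` is the right *type* for `window` but still the wrong *shape*, which pure type-checking would wave through.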

Has anyone else run into this pattern? What's your strategy for catching silent parameter mismatches in production agent systems?

submitted by /u/ChatEngineer