Debugging AI Agents in Production: ADK+Gemini Cloud Assist | Google Cloud NEXT '26

Dev.to / 4/25/2026


Key Points

  • The article argues that production failures in AI-agent systems are increasingly caused by plausible but incorrect agent decisions rather than straightforward software bugs.
  • It explains how Google’s Agent Development Kit (ADK) shifts developers away from explicitly writing logic toward defining the agent’s goals, tools, and knowledge, letting the agent decide execution.
  • Using a Marathon Planner Agent example, the piece describes how agents combine instructions with tool access (e.g., Google Maps via MCP) and domain skills (e.g., GIS logic).
  • It highlights that multi-agent behavior complicates debugging because interactions can lead to unintended outcomes even when each step appears reasonable.
  • It presents Gemini Cloud Assist as a debugging layer to help developers diagnose and troubleshoot these agent-driven issues in production.

This is a submission for the Google Cloud NEXT Writing Challenge

Google Cloud NEXT '26 quietly introduced a problem most developers are not ready for.

Your system no longer fails because of a bug.
It fails because an agent made a reasonable decision that turned out to be wrong.

That difference sounds subtle.
It isn’t.

Trust me, this is a real pain. I'm saying this because I built for both the Gemini 3 Hackathon and the Gemini Live Agent Challenge, and I know how easy it is to fall into these traps.

This article walks through that shift using what Google actually demonstrated on stage:

  1. how the Agent Development Kit (ADK) changes development
  2. how multi-agent systems behave in production
  3. and how Gemini Cloud Assist becomes your debugging layer

Code Writing Code, and Code Acting on It

The keynote doesn't begin with infrastructure or APIs. It starts with something more unsettling.

Music is generated using AI. Visuals are rendered live.
And those visuals? Generated by code that Gemini writes in real time based on audio input.

This is the pattern the rest of the keynote follows, but more importantly, it's the pattern we now have to debug.

You can see these visuals in the first two minutes of the keynote (up to 02:00). They were created using Veo, Nano Banana, and Gemini Flash Live, with the music generated in Music AI Sandbox.

ADK: You're Not Writing Logic Anymore

At the center of everything is the Agent Development Kit (ADK).

At first glance, it looks like just another framework. But it changes something fundamental: You don't define how things happen anymore.

You define:

  1. what the agent is supposed to do
  2. what tools it has access to
  3. what knowledge it can use

And then… you let it decide.

During the keynote, Richard and Emma build a Marathon Planner Agent. Not a function. Not a service. An agent.

It is given:

  • instructions (plan a marathon route)
  • tools (Google Maps via MCP)
  • skills (GIS logic, race planning rules)

From there, it figures things out.

No explicit control flow. No step-by-step orchestration.
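To make that concrete, here is a minimal sketch of the declarative shape an ADK-style agent takes. This is not the real ADK API (in the actual SDK you instantiate an `Agent` with an instruction and a tool list); the names `AgentSpec` and `maps_route_tool` are illustrative stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative sketch only: AgentSpec mirrors the declarative shape of an
# ADK agent (goal + tools + knowledge), it is NOT the real ADK class.

@dataclass
class AgentSpec:
    name: str
    instruction: str                                      # what it should do
    tools: list[Callable] = field(default_factory=list)   # what it can call
    knowledge: list[str] = field(default_factory=list)    # what it can use

def maps_route_tool(start: str, end: str) -> dict:
    """Stand-in for a Google Maps lookup exposed via MCP."""
    return {"start": start, "end": end, "distance_km": 42.2}

marathon_planner = AgentSpec(
    name="marathon_planner",
    instruction="Plan a 42.2 km marathon route through the city.",
    tools=[maps_route_tool],
    knowledge=["GIS logic", "race planning rules"],
)
```

Notice what is absent: there is no `run()` method with control flow. You declare the pieces; the agent decides the sequence.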

*[Image: marathon simulation]*

The Subtle but Dangerous Shift

In a normal system, if something goes wrong, you know where to look. In an ADK-based system:

  • The agent may choose the wrong tool
  • or use the right tool incorrectly
  • or interpret the prompt differently
  • or combine context in unexpected ways
  • or fail in some entirely new way we haven't yet figured out

Nothing is strictly "broken". It just… behaves incorrectly.

When One Agent Isn't Enough

The demo quickly evolves beyond a single agent. Instead of forcing one agent to do everything, they split responsibilities:

  1. a Planner Agent proposes routes
  2. an Evaluator Agent scores them
  3. a Simulator Agent runs the world

This is where things start to look less like software and more like a system of collaborators. These agents don't call APIs directly. They discover each other.

Google introduces:

  • A2A (Agent-to-Agent protocol) => how agents communicate
  • Agent Registry => how agents find each other

Think of it as DNS for agents.
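A toy sketch of that idea, assuming nothing about the actual A2A wire format: a registry maps capability names to agent endpoints, the way DNS maps names to addresses. The class and endpoint names are invented.

```python
# Illustrative "DNS for agents": capability name in, endpoint out.
# This is a mental model, not the real Agent Registry API.

class AgentRegistry:
    def __init__(self):
        self._agents: dict[str, str] = {}   # capability -> endpoint

    def register(self, capability: str, endpoint: str) -> None:
        self._agents[capability] = endpoint

    def resolve(self, capability: str) -> str:
        # Like a DNS lookup: agents discover each other by capability,
        # not by hard-coded URLs.
        return self._agents[capability]

registry = AgentRegistry()
registry.register("plan-routes", "https://planner.internal/a2a")
registry.register("score-routes", "https://evaluator.internal/a2a")

print(registry.resolve("plan-routes"))
```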

*[Image: multi-agent workflow]*

The Most Underrated Feature: Agents Build Their Own UI

One of the most interesting moments in the keynote is easy to miss.

The UI isn't manually built. The agent generates it. Using something called A2UI, the agent:

  1. decides how results should be displayed
  2. constructs components
  3. renders them dynamically

This removes an entire layer of development.

Context Engineering Is Where Systems Break

As the system evolves, more data is introduced:

  • city regulations
  • traffic constraints
  • historical patterns

This is handled through:

  • sessions (state across interactions)
  • memory (long-term knowledge)
  • RAG (retrieval from databases)
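
Here is a deliberately tiny sketch of those three layers, with keyword matching standing in for real vector retrieval. None of this reflects ADK's actual session or memory APIs; every name is illustrative.

```python
# Three context layers, as plain data structures:
session = {"turns": []}                              # state across interactions
memory = ["Past races preferred flat courses."]      # long-term knowledge
documents = {
    "regulations": "Closures require a city permit.",
    "traffic": "Main St is congested 8-10am.",
}

def retrieve(query: str) -> list[str]:
    # Toy RAG: keyword match instead of embedding search.
    return [text for key, text in documents.items() if key in query.lower()]

def build_context(user_msg: str) -> str:
    session["turns"].append(user_msg)
    # Each turn, the prompt grows: memory + retrieved docs + history.
    parts = memory + retrieve(user_msg) + session["turns"]
    return "\n".join(parts)
```

Even in this toy version you can see the fragility: every layer silently adds text to the prompt, and nothing bounds how large it gets.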

The agent starts behaving more intelligently.

It also becomes far more fragile. At one point, the agent learns: "You can't have a camel on public roads"

Funny in isolation. Critical when that rule influences route planning.

Debugging Stops Being Mechanical

In a traditional system, you would:

  1. check logs
  2. inspect stack traces
  3. fix the code

Here, none of that is sufficient. You need to answer:

  • why did the agent choose this tool?
  • why did it carry this context forward?
  • why did memory grow uncontrollably?

That's not debugging code. That's debugging reasoning.

Gemini Cloud Assist: The Real Innovation

Google's answer is not better logs. It's an AI system that debugs your AI system. Gemini Cloud Assist acts as:

  • investigator
  • debugger
  • infra operator
  • code assistant

When the failure happens, it:

  • analyzes logs
  • inspects traces
  • reads your code
  • correlates infra issues
  • identifies root cause

And then it suggests a fix.

*[Image: Gemini Cloud Assist]*

What Actually Broke?

The root cause in the demo:

  • context grew too large
  • exceeded Gemini's token limit
  • event compaction wasn't frequent enough

The fix wasn't a rewrite. It was a behavioral adjustment:

  • compress context more frequently
  • reduce memory footprint per step
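
The shape of that fix can be sketched like this. The token estimate and thresholds are invented stand-ins, not Gemini's actual limits or ADK's actual compaction strategy:

```python
# Behavioral fix sketch: compact the event log before it outgrows the
# context window. MAX_TOKENS and COMPACT_EVERY are assumed numbers.

MAX_TOKENS = 1000
COMPACT_EVERY = 5          # compact more frequently -> smaller footprint

def estimate_tokens(events: list[str]) -> int:
    # Crude word-count proxy for token counting.
    return sum(len(e.split()) for e in events)

def compact(events: list[str]) -> list[str]:
    # Replace older events with a one-line summary; keep the recent tail.
    summary = f"[summary of {len(events) - 3} earlier events]"
    return [summary] + events[-3:]

def append_event(events: list[str], event: str, step: int) -> list[str]:
    events.append(event)
    if step % COMPACT_EVERY == 0 or estimate_tokens(events) > MAX_TOKENS:
        events = compact(events)
    return events
```

The point of the sketch: nothing about the agent's logic changed. Only how often its context gets compressed.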

Everything is fine.

Now, if you think I'm going to leave you hanging after all that intro…

You're wrong.

So far we've seen what it can do. Now it's time to use it.

So far, everything we discussed lives in the keynote.

Cool demos. Fancy systems. "Wow, agents!"

But none of that matters unless we can actually build something that behaves like that.

So instead of jumping straight into "multi-agent, cloud-native, distributed magic"… we start small. Controlled. Understandable.

We build a system where:

  • an agent makes a decision
  • that decision actually affects something real
  • and we can see the impact visually

Step 1: Define the World

Before bringing Gemini into the picture, I need a system that can react to decisions.

So I'll build a simple simulation:

  • a route (sequence of coordinates)
  • runners moving along that route
  • a visualization of their positions over time

At this stage, everything is deterministic.

*[Screenshot: route definition]*
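The screenshot isn't reproduced here, but the route was presumably just a short list of waypoints, something like this (all coordinates invented):

```python
# A route as a sequence of (x, y) waypoints. Values are illustrative.
route = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.0), (3.0, 0.4), (4.0, 0.0)]
```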

Then convert this into a dense path:

*[Screenshot: build dense path]*
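A plausible reconstruction of that step: linear interpolation between consecutive waypoints, so runners can advance in small, even increments. The segment resolution is an assumption.

```python
# Densify a sparse waypoint route by linearly interpolating each segment.
def build_dense_path(route, points_per_segment=50):
    path = []
    for (x0, y0), (x1, y1) in zip(route, route[1:]):
        for i in range(points_per_segment):
            t = i / points_per_segment
            path.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    path.append(route[-1])   # include the final waypoint
    return path

route = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.0)]
dense = build_dense_path(route)
```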

And simulate runners:
*[Screenshot: runner simulator]*

Each runner:

  • moves at a slightly different speed
  • has small randomness
  • doesn’t perfectly overlap with others
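
Those three properties can be sketched like this, with each runner advancing along the dense path at its own jittered speed (all constants are illustrative):

```python
import random

def simulate(num_runners: int, path_len: int, steps: int, seed: int = 0):
    rng = random.Random(seed)
    # Each runner gets a slightly different base speed.
    speeds = [1.0 + rng.uniform(-0.2, 0.2) for _ in range(num_runners)]
    positions = [0.0] * num_runners
    history = []
    for _ in range(steps):
        for r in range(num_runners):
            jitter = rng.uniform(-0.1, 0.1)     # small per-step randomness
            positions[r] = min(positions[r] + speeds[r] + jitter,
                               path_len - 1)    # clamp at the finish line
        # Record integer indices into the dense path for visualization.
        history.append([int(p) for p in positions])
    return history
```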

This gives us something that already looks like a race.

*[Image: straight-route race]*

Step 2: Bring in Gemini

Now comes the important part. We don’t ask Gemini to generate coordinates.
That’s a trap.

Instead, we constrain it. We define a few route templates:

*[Screenshot: route templates]*
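For example, the templates might look like this: three fixed shapes the agent can choose between, with the geometry staying entirely under our control (the shapes themselves are invented):

```python
import math

# Closed set of route geometries. Gemini only ever picks a key;
# it never generates coordinates.
ROUTE_TEMPLATES = {
    "straight": [(float(i), 0.0) for i in range(5)],
    "curved":   [(float(i), math.sin(i / 2)) for i in range(5)],
    "loop":     [(math.cos(a), math.sin(a))
                 for a in [i * math.pi / 2 for i in range(5)]],
}
```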

Now Gemini’s job is simple: Pick the type of route.

Step 3: The Planner Agent

*[Screenshot: planner agent]*
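Here is a sketch of what the planner boils down to. The model call is injected as a function so the constraint logic stands on its own; with the google-genai SDK, the real call would be roughly `client.models.generate_content(model=..., contents=prompt).text`.

```python
# Constrained planner: the model answers with one word from a closed set,
# and anything else falls back to a safe default.

ALLOWED = {"straight", "curved", "loop"}

def choose_route(user_prompt: str, ask_model) -> str:
    prompt = (
        "Pick the best marathon route type for this request. "
        f"Answer with exactly one word from {sorted(ALLOWED)}.\n"
        f"Request: {user_prompt}"
    )
    answer = ask_model(prompt).strip().lower()
    # Validate instead of parse: unknown answers never reach the simulator.
    return answer if answer in ALLOWED else "straight"
```

The design choice worth copying: the LLM's output space is a set membership check, not a parsing problem.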

Notice what we did here:

  • limited output space
  • avoided parsing nightmares
  • kept the system predictable

This is exactly how you should use LLMs in systems.

Step 4: Connect Decision => Behavior

Now wire everything together:
*[Screenshot: running everything]*
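A minimal end-to-end sketch of that wiring, with tiny inline stand-ins for the planner and templates (all names and shapes invented): prompt in, template choice out, geometry selected.

```python
# Decision -> behavior: the chosen template name selects the geometry
# that the simulation will run on.

TEMPLATES = {
    "straight": [(0.0, 0.0), (4.0, 0.0)],
    "curved":   [(0.0, 0.0), (2.0, 1.0), (4.0, 0.0)],
}

def choose_route(prompt: str, ask_model) -> str:
    answer = ask_model(prompt).strip().lower()
    return answer if answer in TEMPLATES else "straight"

def run(prompt: str, ask_model):
    kind = choose_route(prompt, ask_model)
    return kind, TEMPLATES[kind]     # feed this route to the simulator

kind, route = run("a flat, fast course", lambda p: "straight")
print(kind, route)
```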

What You’re Actually Seeing

The visualization represents:

  • position => where runners are
  • color => how far they’ve progressed
  • shape => the route chosen by Gemini

Change the prompt, and the route changes. Change the route, and the entire distribution changes.

Curved path selection:
*[Image: curved-path race]*

Step 5: When It Broke (and Nothing Looked Broken)

At some point, the system started behaving… oddly.

Gemini consistently chose curved routes, even when the prompt clearly favored straight ones.

Nothing failed.

No exceptions.
No crashes.
No warnings.

The simulation ran perfectly. But the output distribution was wrong.

At first, it looked like randomness. Then it looked like bias. Eventually, it became clear: the model was over-weighting certain keywords in the prompt and mapping them incorrectly to route templates.

The problem wasn’t in the simulation.
It wasn’t in the data.
It was in how the agent interpreted intent.

Debugging this felt very different from normal debugging:

  • there was no single place to look
  • no clear cause-and-effect chain
  • the faulty behavior only emerged over multiple runs

The fix wasn’t a code change.

It was:

  • tightening the prompt
  • reducing ambiguity
  • making output constraints stricter

The system didn’t become “correct”.
It became less wrong.

That’s the mindset shift with non-deterministic systems: correctness isn’t a state.
It’s a range you try to keep within acceptable bounds.

Why This Matters

At this point, Gemini is not "doing everything". It’s doing something more important:

It decides the conditions under which the system runs.

That’s the shift.

We’ve moved from static code controlling behavior to AI influencing system dynamics.

What You Just Did

You didn't debug code.

You debugged behavior.

You constrained decision space.
You shaped how the agent interprets intent.
You reduced how wrong the system can be.

That’s a fundamentally different skill. Because in these systems, correctness is not guaranteed. It is negotiated.

Note: This isn’t meant to match the keynote. It’s a minimal example showing a bigger idea: shifting from writing fixed logic to building systems that decide how to behave at runtime.

Final Takeaway

Google didn't just launch tools. It revealed a shift:

Software is no longer deterministic execution.
It is probabilistic decision-making.

And that means:

  • debugging is harder
  • observability is critical
  • architecture matters more than ever

Closing Thought

The hardest bug in the future isn't:
"Why did this fail?"
It’s:
"Why did the system think this was correct?"
Because we didn’t just make software more powerful.
We made it capable of being wrong in far more complex ways.

Waiting for the day a hotfix pops up: “Fix the AI pipeline” 😂. Thankfully, we're on Google's stack, so at least I'll have the right tools when it happens.