This is a submission for the Google Cloud NEXT Writing Challenge
Google Cloud NEXT '26 quietly introduced a problem most developers are not ready for.
Your system no longer fails because of a bug.
It fails because an agent made a reasonable decision that turned out to be wrong.
That difference sounds subtle.
It isn’t.
Trust me, this is a real pain. I say this because I built for both the Gemini 3 Hackathon and the Gemini Live Agent Challenge, and I know how easy it is to fall into these traps.
This article walks through that shift using what Google actually demonstrated on stage:
- how the Agent Development Kit (ADK) changes development
- how multi-agent systems behave in production
- and how Gemini Cloud Assist becomes your debugging layer
Code Writing Code, and Code Acting on It
The keynote doesn't begin with infrastructure or APIs. It starts with something more unsettling.
Music is generated using AI. Visuals are rendered live.
And those visuals? Generated by code that Gemini writes in real time based on audio input.
This is the pattern the rest of the keynote follows, but more importantly, it's the pattern we now have to debug.
You can watch these visuals at the start of the keynote, up to the 02:00 mark. They were created with Veo, Nano Banana, and Gemini Flash Live, with the music produced in Music AI Sandbox.
ADK: You're Not Writing Logic Anymore
At the center of everything is the Agent Development Kit (ADK).
At first glance, it looks like just another framework. But it changes something fundamental: You don't define how things happen anymore.
You define:
- what the agent is supposed to do
- what tools it has access to
- what knowledge it can use
And then… you let it decide.
During the keynote, Richard and Emma build a Marathon Planner Agent. Not a function. Not a service. An agent.
It is given:
- instructions (plan a marathon route)
- tools (Google Maps via MCP)
- skills (GIS logic, race planning rules)
From there, it figures things out.
No explicit control flow. No step-by-step orchestration.
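To make that concrete, here is a minimal sketch of what such an agent definition might look like with the Python ADK. This is not the keynote's code: `plan_route` is a hypothetical tool stub, and the model name is a placeholder.

```python
# Hedged sketch: assumes the google-adk Python package is installed.
# `plan_route` is a hypothetical tool stub for illustration only.
from google.adk.agents import Agent

def plan_route(city: str, distance_km: float) -> dict:
    """Hypothetical tool: return a candidate marathon route for a city."""
    return {"city": city, "distance_km": distance_km, "waypoints": []}

marathon_planner = Agent(
    name="marathon_planner",
    model="gemini-2.0-flash",  # placeholder model id
    instruction=(
        "Plan a marathon route for the requested city. "
        "Use the available tools to fetch and validate routes."
    ),
    tools=[plan_route],
)
```

Notice what is missing: there is no `if`/`else`, no pipeline. The agent decides when and how to call `plan_route`.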
The Subtle but Dangerous Shift
In a normal system, if something goes wrong, you know where to look. In an ADK-based system:
- The agent may choose the wrong tool
- or use the right tool incorrectly
- or interpret the prompt differently
- or combine context in unexpected ways
- or fail in some entirely new way we haven't yet learned to categorize
Nothing is strictly "broken". It just… behaves incorrectly.
When One Agent Isn't Enough
The demo quickly evolves beyond a single agent. Instead of forcing one agent to do everything, they split responsibilities:
- a Planner Agent proposes routes
- an Evaluator Agent scores them
- a Simulator Agent runs the world
This is where things start to look less like software and more like a system of collaborators. These agents don't call APIs directly. They discover each other.
Google introduces:
- A2A (Agent-to-Agent protocol) => how agents communicate
- Agent Registry => how agents find each other
Think of it as DNS for agents.
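The real A2A protocol does discovery over HTTP using agent cards; the sketch below is a deliberately simplified, in-memory version of the registry idea, with all names and fields made up for illustration.

```python
# Simplified, in-memory sketch of an agent registry ("DNS for agents").
# Real A2A discovery serves agent cards over HTTP; everything here is illustrative.
REGISTRY: dict[str, dict] = {}

def register(name: str, card: dict) -> None:
    """Publish an agent's capabilities so other agents can discover it."""
    REGISTRY[name] = card

def discover(capability: str) -> list[str]:
    """Find agents advertising a given capability."""
    return [n for n, c in REGISTRY.items() if capability in c.get("capabilities", [])]

register("planner", {"capabilities": ["propose_route"], "endpoint": "http://planner.local"})
register("evaluator", {"capabilities": ["score_route"], "endpoint": "http://evaluator.local"})

print(discover("score_route"))  # ['evaluator']
```

The point is the lookup pattern: agents ask the registry "who can do X?" instead of hardcoding each other's addresses.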
The Most Underrated Feature: Agents Build Their Own UI
One of the most interesting moments in the keynote is easy to miss.
The UI isn't manually built. The agent generates it. Using something called A2UI, the agent:
- decides how results should be displayed
- constructs components
- renders them dynamically
This removes an entire layer of development.
Context Engineering Is Where Systems Break
As the system evolves, more data is introduced:
- city regulations
- traffic constraints
- historical patterns
This is handled through:
- sessions (state across interactions)
- memory (long-term knowledge)
- RAG (retrieval from databases)
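A toy sketch of how those three layers might feed a single context string. The word-overlap "retrieval" here is a naive stand-in for real RAG, and the structure is illustrative, not the ADK's actual session API.

```python
# Toy sketch of context assembly from session state, long-term memory,
# and naive retrieval. Structure and names are illustrative only.
session = {"goal": "plan a marathon route", "city": "Las Vegas"}
memory = [
    "You can't have a camel on public roads",
    "Races longer than 10km need a city permit",
]

def build_context(session: dict, memory: list[str], query: str) -> str:
    # Naive retrieval: keep memory entries sharing at least one word with the query.
    query_words = set(query.lower().replace("?", "").split())
    relevant = [m for m in memory if query_words & set(m.lower().split())]
    facts = "; ".join(f"{k}={v}" for k, v in session.items())
    return f"Session: {facts}\nRules: {relevant}\nQuery: {query}"

ctx = build_context(session, memory, "Can the route use public roads?")
```

Note how easily an odd rule slips into context: the camel fact is "relevant" purely because it mentions roads.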
The agent starts behaving more intelligently.
It also becomes far more fragile. At one point, the agent learns: "You can't have a camel on public roads"
Funny in isolation. Critical when that rule influences route planning.
Debugging Stops Being Mechanical
In a traditional system, you would:
- check logs
- inspect stack traces
- fix the code
Here, none of that is sufficient. You need to answer:
- why did the agent choose this tool?
- why did it carry this context forward?
- why did memory grow uncontrollably?
That's not debugging code. That's debugging reasoning.
Gemini Cloud Assist: The Real Innovation
Google's answer is not better logs. It's an AI system that debugs your AI system. Gemini Cloud Assist acts as:
- investigator
- debugger
- infra operator
- code assistant
When the failure happens, it:
- analyzes logs
- inspects traces
- reads your code
- correlates infra issues
- identifies root cause
And then it suggests a fix.
What Actually Broke?
The root cause in the demo:
- context grew too large
- exceeded Gemini's token limit
- event compaction wasn't frequent enough
The fix wasn't a rewrite. It was a behavioral adjustment:
- compress context more frequently
- reduce memory footprint per step
Everything is fine.
Now, if you think I'm going to leave you hanging after all this intro… of course not.
So far we've seen what it can do; now it's time to use it.
So far, everything we discussed lives in the keynote.
Cool demos. Fancy systems. "Wow, agents!"
But none of that matters unless we can actually build something that behaves like that.
So instead of jumping straight into "multi-agent, cloud-native, distributed magic"… we start small. Controlled. Understandable.
We build a system where:
- an agent makes a decision
- that decision actually affects something real
- and we can see the impact visually
Step 1: Define the World
Before bringing Gemini into the picture, I need a system that can react to decisions.
So I'll build a simple simulation:
- a route (sequence of coordinates)
- runners moving along that route
- a visualization of their positions over time
At this stage, everything is deterministic.
Then convert this into a dense path:
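One way to sketch that conversion, assuming the route is a short list of (x, y) waypoints and using plain linear interpolation:

```python
# Sketch: turn a handful of (x, y) waypoints into a dense path
# by linearly interpolating points along each segment.
def densify(
    waypoints: list[tuple[float, float]], points_per_segment: int = 10
) -> list[tuple[float, float]]:
    path = []
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        for i in range(points_per_segment):
            t = i / points_per_segment
            path.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t))
    path.append(waypoints[-1])  # include the final waypoint
    return path

route = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
dense = densify(route)  # 2 segments * 10 points + endpoint = 21 points
```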
Each runner:
- moves at a slightly different speed
- has small randomness
- doesn’t perfectly overlap with others
This gives us something that already looks like a race.
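The runner behavior above can be sketched like this; the speeds, jitter range, and step count are made-up values, and runners are tracked as indices into the dense path.

```python
import random

# Sketch: move runners along a dense path at slightly different speeds
# with small per-step jitter, so they spread out like a real race.
def simulate(path_len: int, n_runners: int = 5, steps: int = 100, seed: int = 42) -> list[list[int]]:
    rng = random.Random(seed)
    speeds = [1.0 + rng.uniform(-0.2, 0.2) for _ in range(n_runners)]
    positions = [0.0] * n_runners
    frames = []  # each frame: every runner's index along the path
    for _ in range(steps):
        for r in range(n_runners):
            jitter = rng.uniform(-0.1, 0.1)
            positions[r] = min(positions[r] + speeds[r] + jitter, path_len - 1)
        frames.append([int(p) for p in positions])
    return frames

frames = simulate(path_len=200)
```

Because every increment stays positive, runners only move forward; the per-runner speed difference is what makes the field spread out.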
Step 2: Bring in Gemini
Now comes the important part. We don’t ask Gemini to generate coordinates.
That’s a trap.
Instead, we constrain it. We define a few route templates:
Now Gemini’s job is simple: Pick the type of route.
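A hypothetical template set might look like this. The shapes and names are made up; the important property is that the model only ever has to return one key, never coordinates.

```python
# Hypothetical route templates; the model picks a key, never emits coordinates.
ROUTE_TEMPLATES = {
    "straight": [(0.0, 0.0), (10.0, 0.0)],
    "loop": [(0.0, 0.0), (5.0, 5.0), (0.0, 10.0), (-5.0, 5.0), (0.0, 0.0)],
    "out_and_back": [(0.0, 0.0), (10.0, 0.0), (0.0, 0.0)],
}

PROMPT = (
    "Pick the best route type for this request. "
    f"Answer with exactly one of: {', '.join(ROUTE_TEMPLATES)}.\n"
    "Request: a flat, simple training run."
)
```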
Step 3: The Planner Agent
Notice what we did here:
- limited output space
- avoided parsing nightmares
- kept the system predictable
This is exactly how you should use LLMs in systems.
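A sketch of that planner discipline, with the Gemini call stubbed out: whatever string comes back from the model is normalized and validated against a fixed set of choices, with a safe fallback. All names here are illustrative.

```python
# Planner sketch: the Gemini call itself is stubbed; the point is that the
# model's reply is validated against a fixed, limited output space.
VALID_ROUTES = {"straight", "loop", "out_and_back"}

def parse_choice(model_output: str, default: str = "straight") -> str:
    """Normalize the model's reply; fall back if it isn't a valid choice."""
    choice = model_output.strip().lower().replace(" ", "_")
    return choice if choice in VALID_ROUTES else default

# In the real system these strings would come from a Gemini API call.
assert parse_choice("Loop") == "loop"
assert parse_choice("Out and back") == "out_and_back"
assert parse_choice("a scenic riverside 10k") == "straight"  # falls back
```

However creative the model gets, the system only ever sees one of three values.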
Step 4: Connect Decision => Behavior
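The wiring is intentionally thin: the agent's one-word decision selects the waypoints the simulation runs on. Template names and coordinates below are illustrative placeholders.

```python
# Sketch: the agent's one-word decision selects the waypoints the
# simulation uses. Templates and coordinates are illustrative.
ROUTE_TEMPLATES = {
    "straight": [(0.0, 0.0), (10.0, 0.0)],
    "loop": [(0.0, 0.0), (5.0, 5.0), (0.0, 10.0), (0.0, 0.0)],
}

def route_for(choice: str) -> list[tuple[float, float]]:
    # The decision boundary: one string from the model changes the whole world.
    return ROUTE_TEMPLATES.get(choice, ROUTE_TEMPLATES["straight"])

waypoints = route_for("loop")
```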
What You’re Actually Seeing
The visualization represents:
- position => where runners are
- color => how far they’ve progressed
- shape => the route chosen by Gemini
Change the prompt, and the route changes. Change the route, and the entire distribution changes.
Step 5: When It Broke (and Nothing Looked Broken)
At some point, the system started behaving… oddly.
Gemini consistently chose curved routes, even when the prompt clearly favored straight ones.
Nothing failed.
No exceptions.
No crashes.
No warnings.
The simulation ran perfectly. But the output distribution was wrong.
At first, it looked like randomness. Then it looked like bias. Eventually, it became clear: the model was over-weighting certain keywords in the prompt and mapping them incorrectly to route templates.
The problem wasn’t in the simulation.
It wasn’t in the data.
It was in how the agent interpreted intent.
Debugging this felt very different from normal debugging:
- there was no single place to look
- no clear cause-and-effect chain
- only a pattern that emerged over multiple runs
The fix wasn’t a code change.
It was:
- tightening the prompt
- reducing ambiguity
- making output constraints stricter
The system didn’t become “correct”.
It became less wrong.
That’s the mindset shift: in non-deterministic systems, correctness isn’t a state.
It’s a range you try to keep within acceptable bounds.
Why This Matters
At this point, Gemini is not "doing everything". It’s doing something more important:
It decides the conditions under which the system runs.
That’s the shift.
We’ve moved from static code controlling behavior to AI influencing system dynamics.
What You Just Did
You didn't debug code.
You debugged behavior.
You constrained decision space.
You shaped how the agent interprets intent.
You reduced how wrong the system can be.
That’s a fundamentally different skill. Because in these systems, correctness is not guaranteed. It is negotiated.
Note: This isn’t meant to match the keynote. It’s a minimal example showing a bigger idea: shifting from writing fixed logic to building systems that decide how to behave at runtime.
Final Takeaway
Google didn't just launch tools. It revealed a shift:
Software is no longer deterministic execution.
It is probabilistic decision-making.
And that means:
- debugging is harder
- observability is critical
- architecture matters more than ever
Closing Thought
The hardest bug in the future isn't:
"Why did this fail?"
It’s:
"Why did the system think this was correct?"
Because we didn’t just make software more powerful.
We made it capable of being wrong in far more complex ways.
Waiting for the day a hotfix pops up: “Fix the AI pipeline” 😂. Thankfully, we're on Google's stack, so at least I'll have the right tools when it happens.