Live Translate

The words of a meeting are no longer
something you read as subtitles.

Until now, translation in Google Meet meant subtitles that appear at the bottom of the screen. Now it turns the speaker's voice itself into a voice in another language, interpreting a few seconds behind the speaker without waiting for the phrase to finish. What Gemini 3.5 Live Translate changes is that, in a foreign-language meeting, you can now choose between reading and listening.

AI Navigate Editorial·2026.06.10·6 min read

The Gap

The gulf that lay between
"subtitles" and "interpretation"

Both Google Meet and Google Translate already had translation features. But those were subtitles based on machine translation: they transcribed what the other person said, translated it into another language, and displayed it on screen. Read the subtitles and you grasp the meaning. Yet your gaze is tied to the bottom of the screen, and the rhythm of conversation breaks.

When Gemini Omni and 3.5 Flash were announced at Google I/O 2026 three months ago, voice-to-voice "simultaneous interpretation" remained a preview-stage promise, with no fixed date for full availability. Real-time interpretation that preserved the pace of speech and the tone of voice was something no service had achieved.

Subtitle translation until now	Gemini 3.5 Live Translate
Transcribes speech, then displays the translation	Turns the voice directly into a voice in another language
You must follow the on-screen subtitles with your eyes	You can keep the conversation going while listening
Tone and intonation of the voice are lost	Preserves the speaker's tone, speed, and pitch
The translation appears after the sentence ends	Generated continuously a few seconds behind, before you finish speaking

Stop reading the words.
Hear them in another language, in the speaker's own voice.

How It Works

It doesn't wait for you to finish

Conventional interpretation features started translating only "after a sentence was complete." Live Translate is streaming-based, beginning to build the translation while you are still speaking.

FIG. Without waiting for a sentence to complete, it streams interpretation phrase by phrase, a few seconds behind.

It picks up the start of speech

From the moment the speaker opens their mouth, it takes in the audio phrase by phrase. Rather than "sequential translation" that waits for a sentence to end, it is a streaming approach that processes the incoming audio as it flows.
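The difference between the two approaches can be illustrated with a toy sketch. This is not how Live Translate is implemented. `translate_stub` is a hypothetical placeholder that only tags the text, and the phrase boundaries are assumed to be given in advance.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "?", "!")


def translate_stub(text: str) -> str:
    """Hypothetical stand-in for a translation model: it only tags the text."""
    return f"[translated] {text}"


def sequential(phrases: Iterable[str]) -> Iterator[str]:
    """Buffer phrases until a sentence ends, then translate the whole sentence."""
    buffer: list[str] = []
    for phrase in phrases:
        buffer.append(phrase)
        if phrase.endswith(SENTENCE_END):
            yield translate_stub(" ".join(buffer))
            buffer.clear()
    if buffer:  # flush a trailing, unfinished sentence
        yield translate_stub(" ".join(buffer))


def streaming(phrases: Iterable[str]) -> Iterator[str]:
    """Translate each phrase as soon as it arrives, without waiting for the sentence."""
    for phrase in phrases:
        yield translate_stub(phrase)


# Assumed example input: one sentence split into three phrases.
phrases = ["Thanks for joining,", "let's start with", "the quarterly numbers."]
sequential_out = list(sequential(phrases))
streaming_out = list(streaming(phrases))

print(len(sequential_out), "sequential output(s)")
print(len(streaming_out), "streaming output(s)")
```

In the sequential version the listener receives nothing until the final phrase arrives. In the streaming version each phrase produces output on its own.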

Into another language, voice and all

Instead of transcribing speech into text and then reading that text aloud, it converts directly from voice to voice. Because it preserves the speaker's tone, speed, and pitch in doing so, the translated voice keeps the nuance of the original delivery.

It follows a few seconds behind

Because it doesn't wait for you to finish, the interpretation keeps following the speaker a few seconds behind. The rhythm of conversation isn't interrupted, and you can converse across languages with something close to the feel of in-person simultaneous interpretation.
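To make "a few seconds behind" concrete, here is a small timing sketch. All numbers are assumptions chosen for illustration: the phrase end times are invented, and the 2-second lag is not a measured figure for Live Translate.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    text: str
    end_s: float  # time in seconds at which the speaker finishes this phrase


def streaming_emit_times(phrases: list[Phrase], lag_s: float) -> list[float]:
    """Each phrase is voiced a fixed lag after the speaker finishes it."""
    return [p.end_s + lag_s for p in phrases]


def sequential_emit_times(phrases: list[Phrase], processing_s: float) -> list[float]:
    """Nothing is voiced until the whole sentence has ended."""
    sentence_end = phrases[-1].end_s
    return [sentence_end + processing_s for _ in phrases]


# Assumed example: one sentence spoken over five seconds.
phrases = [
    Phrase("Thanks for joining,", 1.5),
    Phrase("let's start with", 3.0),
    Phrase("the quarterly numbers.", 5.0),
]
LAG_S = 2.0  # assumed lag, for illustration only

stream = streaming_emit_times(phrases, LAG_S)
seq = sequential_emit_times(phrases, LAG_S)

# Delay the listener experiences for each phrase, in seconds.
stream_delay = [t - p.end_s for t, p in zip(stream, phrases)]
seq_delay = [t - p.end_s for t, p in zip(seq, phrases)]

print("streaming delays: ", stream_delay)
print("sequential delays:", seq_delay)
```

Under these assumptions the streaming delay stays constant for every phrase, while the sequential delay is longest for the phrase spoken first, which is the part of the sentence the listener waits on the most.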

Where It Lands

It enters meetings,
learning, and your apps

The same Live Translate model can now be used from three entry points.

FIG. One interpretation model, delivered via three routes: meetings, the translation app, and a developer API.

70+: Supported languages

Voice→voice: Output as voice, not subtitles

A few seconds: Lag while following the speaker

With this release, Gemini 3.5 Live Translate, a streaming voice-to-voice translation model supporting more than 70 languages, has been built into Google Meet, Google Translate, and the Gemini Live API. What makes the release distinctive is that the same model arrived at once in three entry points of different character: a meeting tool, a translation app, and a developer-facing API.

In meetings, you can listen to the other person through the interpreter's voice instead of reading subtitles. For language learning, you can speak English in the Gemini app and hear it come right back in a Japanese voice. And developers can embed the same interpretation feature into their own apps through the Gemini Live API.

In Practice

From this week, you can choose

No special preparation is needed. It appears as a new option right inside the tools you already use.

In meetings with overseas offices

If you use Google Meet, from this week you can choose built-in voice interpretation for a meeting instead of subtitles. Your gaze isn't tied to the bottom of the screen, and you can focus on the conversation.

As a language-learning partner

In the Gemini app, you can speak English and hear it come right back in a Japanese voice. Because the translation preserves your pronunciation and pace, conversation practice feels different.

Embed it in your own app

Via the Gemini Live API, you can embed the interpretation feature into your own service. Whether a meeting app or a learning app, you can drop in the same voice interpretation as is.

Caveats

"Supported" and "practical"
are two different things

Note that support for more than 70 languages is ultimately a measure of coverage, that is, how broad the support is. It does not guarantee that every language pair reaches the same quality as Japanese and English. Accuracy for non-English language pairs varies with the combination you use.

In particular, entrusting an important negotiation, where contract terms and figures are at stake, to voice interpretation alone is overconfident. It is more realistic to treat it as a tool for getting the rhythm of conversation back, one that adds the option of interpretation you listen to alongside interpretation you read. In situations that require confirmation, it is safer to combine it with subtitles or human verification.
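One way to act on this is to flag utterances that are worth double-checking against subtitles or with a person. The sketch below is a deliberately crude heuristic, not a feature of Live Translate. The rule (digits, currency symbols, or percent signs) and the sample utterances are assumptions for illustration.

```python
import re

# Assumption: digits, currency symbols, and percent signs mark content
# (prices, dates, quantities) that deserves a second check.
RISKY = re.compile(r"\d|[$€¥£%]")


def needs_confirmation(utterance: str) -> bool:
    """Return True if the utterance contains a figure-like token."""
    return bool(RISKY.search(utterance))


# Assumed sample transcript lines.
utterances = [
    "Nice to meet you all.",
    "The fee is $12,000 per month.",
    "Delivery is due within 30 days.",
]
flagged = [u for u in utterances if needs_confirmation(u)]

for u in flagged:
    print("confirm:", u)
```

A rule this simple misses figures that are spelled out in words, so it can only supplement human verification, not replace it.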