AI Dubbing Overview: How AI Voiceover Technology Works

A complete introduction to the Stra.ai dubbing editor. Learn how the interface is laid out, how speakers are managed, and how AI generates voiceovers for your localized video.

Yongho Kim

Mar 30, 2026

Contents

The layout at a glance
The work area
An important note on editing
The speaker panel
The three-dot menu per segment
The timeline
The Convert Tone button
Adding notes
What to do next

Once your AI Dubbing project finishes processing, the editor opens with a keyboard shortcuts panel in the foreground. Click anywhere to dismiss it and reveal your project. The editor behind it is considerably more capable than the subtitle editor, so this guide gives you the full lay of the land before you start editing.

The layout at a glance

The dubbing editor has four main areas working together.

The video preview sits on the left. Use it to watch your video while you review and edit the dubbed audio.

The work area sits on the right and takes up most of the screen. This is where you read, edit, and generate dubbed audio for every line of dialogue.

The speaker panel sits at the bottom of the work area and shows the currently selected segment with playback and generation controls.

The timeline runs along the very bottom of the screen and shows all audio tracks as separate channels stacked on top of each other.

The work area

The work area has three columns for every dialogue segment.

The left column is the source, the original language transcription of what was said in the video.

The middle column is the translation, the AI-generated output in your target language. This is the text the AI will read aloud when you generate the dub.

The right column is the voice directing field. This is where you give the AI a performance instruction for that specific line. For example, you might write "Speak calmly and confidently", "Angry but controlled", or "Excited, building energy." This works whether you selected ElevenLabs or Gemini TTS during project setup.

Each row also has a speaker number on the left side, showing which speaker that line belongs to. The AI assigns these automatically based on the voice separation it performed during processing.

An important note on editing

You can edit both the source column and the translation column. But be careful.

If you edit the source and then use the arrow button to retranslate, the translation will update based on your edited source. This is useful when the original transcription was wrong and the translation came out incorrect as a result.

However, if you already have a translation you are happy with, do not touch the source. Editing the source and retranslating will overwrite your approved translation. When in doubt, edit the translation column directly and leave the source alone.
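To see why this matters, here is a minimal sketch of the behavior described above. It is not Stra.ai code: the Segment class, the retranslate function, and the stand-in translator are all hypothetical names invented for illustration, and the sketch simply assumes that retranslating replaces the whole translation with a fresh one built from the current source.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical model of one row in the work area."""
    speaker: int
    source: str           # left column: original transcription
    translation: str      # middle column: text the AI reads aloud
    direction: str = ""   # right column: voice directing note


def retranslate(segment, translate):
    """Assumed behavior: the translation is regenerated from the source,
    so any manual edits to the translation are lost."""
    segment.translation = translate(segment.source)
    return segment


def fake_translate(text):
    """Stand-in for the real translation step, for illustration only."""
    return f"[translated] {text}"


seg = Segment(speaker=0, source="Helo world", translation="Hand-polished line")
seg.source = "Hello world"          # fix a transcription error in the source
retranslate(seg, fake_translate)    # the arrow button in the editor
print(seg.translation)              # the hand-polished line is gone
```

In this model the hand-edited translation does not survive the retranslate step, which is why editing the translation column directly is the safer choice once you have approved a line.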

The speaker panel

At the bottom of the work area you will find the speaker panel. This is where you make manual adjustments to your translations, one segment at a time. When you select any dialogue segment in the work area, the panel updates to show that segment and highlights the speaker number it belongs to.

From the speaker panel you can play back just the dubbed audio for that segment without playing the whole video. You can also generate a new dub for that single segment using the Generate dub button.

If you want to generate multiple segments at once, select them in the work area and use the Generate selected button in the top right corner of the work area.

Next to the Generate dub button you will see numbered buttons labeled 0, 1, 2, and 3. These are the speakers in the scene. To change the number of speakers, click the icon next to the numbers. You can generate up to five speakers per video and switch between them to find the best performance before moving on. Each speaker has its own audio track on the timeline below.

The three-dot menu per segment

Clicking the three dots next to the Generate dub button opens a menu with these options:

Download saves only that segment's audio to your computer, useful for spot checking individual lines outside the editor.

History shows the modification history for that segment.

Re-transcribe this segment asks the AI to listen to the original audio again and update the transcription and translation for that line only.

Merge with Above combines the selected segment with the one directly before it.

Merge with Below combines the selected segment with the one directly after it.

Delete removes the segment entirely.

The timeline

The timeline at the bottom of the screen shows multiple audio channels stacked vertically. Each speaker gets their own channel. The BGM track shows the background music and sound effects separated from the voices. The Original Audio track shows the source voice audio before translation.

If you cannot see all the speaker channels, scroll down on the timeline. It is common for a second or third speaker track to be hidden below the visible area, especially on smaller screens.

You can resize the timeline area both horizontally and vertically using the small bar buttons on the right side of the screen, just below the Generate dub button.

The Convert Tone button

In the top right corner of the work area you will see a Convert Tone button. This applies a tone conversion across the project. More detail on this is covered in the Dubbing Work Area guide.

Adding notes

The note system is designed for internal QA and team review directly on the timeline.

The bookmark icon in the toolbar above the timeline always opens a new note input at the current playhead position. The input shows the exact timecode, a text field to type your note, a Resolved checkbox, and Save and Delete buttons.

Once you save a note, a number badge appears next to the bookmark icon showing how many notes exist. If there are no notes yet, it shows three dots. Clicking the number or the three dots opens the full Notes panel, which lists all notes with their timecodes and text, shows separate counters for open and resolved notes, and includes an Add note at current position button at the bottom.

Checking the Resolved checkbox on any note crosses it out with a strikethrough but does not delete it. Resolved notes stay visible in the panel so you have a full audit trail of what was flagged and fixed.
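If it helps to picture how the open and resolved counters relate to the list of notes, here is a small sketch. The Note class and the count_notes function are hypothetical names made up for this example, not part of the product. The only assumption is the one stated above: resolving a note flips a flag and never removes the note.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical model of one timeline note."""
    timecode: float        # playhead position in seconds
    text: str
    resolved: bool = False


def count_notes(notes):
    """Return separate counts for open and resolved notes."""
    open_count = sum(1 for note in notes if not note.resolved)
    return {"open": open_count, "resolved": len(notes) - open_count}


notes = [
    Note(12.5, "Line sounds rushed"),
    Note(48.0, "Wrong speaker assigned"),
]
notes[1].resolved = True   # ticking the Resolved checkbox keeps the note

counts = count_notes(notes)
print(counts)
```

Because resolved notes stay in the list, the total number of notes never drops when you tick the checkbox. Only the split between open and resolved changes.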

What to do next

You have the full picture of the dubbing editor. The next guides go deeper into each part of the workflow.

To learn how to work with the timeline and multiple speaker tracks, go to The Dubbing Timeline: Managing Multiple Speakers and Audio Tracks
To learn how to fine-tune voice tone and emotion, go to The Dubbing Work Area: Fine-tuning AI Voice Tones and Emotions
To export your finished dubbed video, go to High-Fidelity Export: Downloading Dubbed MP4s and Clean Audio Tracks
To see all keyboard shortcuts in one place, go to Workflow Hacks: Essential Shortcuts for AI Dubbing Projects
