The Dubbing Timeline: Managing Multiple Speakers and Audio Tracks

A complete guide to the Stra.ai dubbing timeline. Learn how to navigate multiple speaker tracks, add new dialogue, manage audio channels, and work with segments efficiently.

Yongho Kim

Mar 30, 2026

Contents

The toolbar above the timeline
The audio channels
Adding a new dialogue segment
The three-dot menu on timeline segments
Moving segments between speaker channels
Speaker channel controls
Segment display and visualization
What to do next

The dubbing timeline is where all the audio for your project lives. Unlike the subtitle timeline, which shows a single waveform, the dubbing timeline stacks multiple channels vertically: one for each speaker, plus dedicated tracks for background music and the original audio. This guide walks through everything you need to navigate it confidently.

The toolbar above the timeline

At the top of the timeline area you will find the following controls:

Play and Pause starts and stops playback from the current playhead position.

Timestamp field shows the current playhead position. You can click it and type a specific timecode to jump directly to that point in the video. This is the fastest way to navigate to a precise moment without scrubbing.

Add segment creates a new blank dialogue segment at the current playhead position in the currently selected speaker channel. More on this below.

Cut splits whatever segment is overlapping the current playhead position into two separate segments.

Merge combines the selected segment with the next one. Note that merging only combines the text. You will need to generate the dub again after merging since the audio does not merge automatically.

Delete removes the selected segment entirely.

Notes opens a new note input at the current playhead position. See the AI Dubbing Overview for a full explanation of the notes system.

Undo and Redo step backward or forward through your edit history.
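
To make the Cut and Merge behavior described above more concrete, here is a minimal Python sketch of a timeline model. It is not Stra.ai's implementation or API: the Segment class, its fields, and the cut and merge functions are hypothetical names invented for illustration. The guide does not say how text or audio is handled after a cut, so the sketch assumes the text stays on the first half and both halves need fresh audio.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical model of one dialogue block on the timeline."""
    start: float        # seconds
    end: float          # seconds
    text: str
    audio_ready: bool   # False means the dub has not been generated yet


def cut(segment: Segment, playhead: float) -> Tuple[Segment, Segment]:
    """Split a segment in two at the playhead position."""
    if not segment.start < playhead < segment.end:
        raise ValueError("playhead must fall inside the segment")
    # Assumption: text stays on the first half and both halves need new audio.
    first = replace(segment, end=playhead, audio_ready=False)
    second = Segment(playhead, segment.end, "", False)
    return first, second


def merge(segment: Segment, next_segment: Segment) -> Segment:
    """Combine a segment with the one that follows it."""
    # Only the text is combined. The audio is not merged, so the result
    # is marked as needing the dub to be generated again.
    text = f"{segment.text} {next_segment.text}".strip()
    return Segment(segment.start, next_segment.end, text, False)


a = Segment(0.0, 2.0, "Hello there.", True)
b = Segment(2.0, 5.0, "How are you?", True)
merged = merge(a, b)
print(merged.text, merged.audio_ready)
```

The point of the sketch is the audio_ready flag: after a merge the text is whole, but the segment still has to go through dub generation before it has usable audio.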

The audio channels

The timeline is divided into stacked horizontal channels. Each one represents a separate audio layer in your project.

BGM is the background music and sound effects track. This is the separated audio with all voices removed, processed during project setup.

Original Audio is the source voice track before translation. This is your reference while editing.

Speaker channels sit below these two tracks. Each speaker detected in your video gets their own channel, color coded and labeled Speaker 1, Speaker 2, Speaker 3, and so on. If you have more speakers than fit on screen, scroll down on the timeline to see the rest.

Audio exception track is a special channel labeled as Speaker 0. This is where you send original audio segments you want to preserve in the final export. More on this below.

Adding a new dialogue segment

To add a new line of dialogue that was not captured by the AI processing:

First, navigate to the area on the timeline where you want to add the new segment. Use the timestamp field to type the exact position or click directly on the timeline to move the playhead there.

Second, make sure the correct speaker is selected in the speaker panel below the work area. The new segment will be created in that speaker's channel.

Third, click the plus button in the toolbar. A new blank segment appears in the selected speaker's channel at the playhead position.

You do not need to type the original-language text. You can type directly in the translation column in the work area, and the AI will voice that text in the speaker's cloned voice. If you want to include the source language, type it in the source column and use the arrow button to translate it automatically before generating the dub.

The three-dot menu on timeline segments

Clicking the three dots on any segment block in the timeline opens a menu with two options specific to the timeline view:

Add to dubbing audio moves that segment to the audio exception track. Use this to preserve a moment of original audio in the final export, for example a laugh, a scream, or an emotional sound that the AI cannot replicate convincingly. When you export your project, this audio will be preserved from the original source instead of being replaced by the AI voiceover.

Re-transcribe this segment asks the AI to listen to that section of audio again and generate a fresh transcription and translation. This takes only a few seconds. Once the new transcription appears you can review it and generate the dub as normal.
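
The effect of the audio exception track on export can be pictured with a small Python sketch. This is an illustration only, not the product's export logic: the export_source function, the dictionary layout, and the "original" and "ai_dub" labels are hypothetical. The only detail taken from this guide is that the exception track is labeled Speaker 0.

```python
EXCEPTION_CHANNEL = 0  # the guide labels the audio exception track "Speaker 0"


def export_source(speaker_channel: int) -> str:
    """Return which audio a segment contributes to the export (hypothetical)."""
    return "original" if speaker_channel == EXCEPTION_CHANNEL else "ai_dub"


segments = [
    {"text": "Welcome back.", "channel": 1},
    {"text": "(laughs)", "channel": 0},  # sent there with "Add to dubbing audio"
    {"text": "Let's begin.", "channel": 2},
]

plan = [export_source(s["channel"]) for s in segments]
print(plan)
```

Segments on the exception track keep their original sound, while every other segment is voiced by the AI.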

Moving segments between speaker channels

You can drag any dialogue segment from one speaker channel to another directly on the timeline. When you do this, the speaker assignment for that segment updates automatically to match the channel you dropped it into. Use this to fix cases where the AI assigned a line to the wrong speaker.
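
As a rough mental model of what the drag does, the Python sketch below moves a segment between channels and updates its speaker in the same step. The Segment class, the channels dictionary, and move_segment are hypothetical names for illustration and do not describe the editor's internals.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """Hypothetical dialogue segment with its assigned speaker number."""
    text: str
    speaker: int


def move_segment(channels: Dict[int, List[Segment]], segment: Segment, target: int) -> None:
    """Move a segment to another speaker channel and reassign its speaker."""
    channels[segment.speaker].remove(segment)        # leave the old channel
    segment.speaker = target                         # assignment follows the drop
    channels.setdefault(target, []).append(segment)  # land in the new channel


line = Segment("I disagree.", speaker=1)  # wrongly attributed to Speaker 1
channels = {1: [line], 2: []}
move_segment(channels, line, target=2)
print(line.speaker, len(channels[1]), len(channels[2]))
```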

Speaker channel controls

Each speaker channel has controls on the left side of the timeline.

The speaker number badge identifies the channel.

Clicking S solos that track, muting all other channels so you can listen to just that speaker in isolation.

Clicking the three dots next to the channel label opens a menu with four options. Speaker settings configures that speaker's voice. Generate track audio regenerates all segments in that channel at once. Download track audio exports just that speaker's audio as a file. Delete removes the entire track and all dialogue associated with it. Because deleting a track cannot be undone, a confirmation popup appears before the action completes.

At the very bottom of the timeline channel list there is a plus button to add a new speaker channel. You can also add speakers from the speaker list in the panel below the work area.
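
The solo behavior of the S button can be sketched in a few lines of Python. This is a simplified illustration, not the editor's code: audible_channels is a hypothetical helper, and the sketch takes "all other channels" literally, so BGM and Original Audio are treated as muted too.

```python
from typing import List, Optional


def audible_channels(channels: List[str], soloed: Optional[str]) -> List[str]:
    """Return the channels that can be heard (hypothetical helper)."""
    if soloed is None:
        return list(channels)  # nothing soloed: every channel plays
    # Taken literally, solo mutes every channel except the soloed one.
    return [name for name in channels if name == soloed]


timeline = ["BGM", "Original Audio", "Speaker 1", "Speaker 2"]
print(audible_channels(timeline, "Speaker 2"))
```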

Segment display and visualization

Each segment on the timeline shows the dialogue text, the duration of the segment in seconds, and a waveform visualization of the audio.

If a segment has no waveform and shows "No TTS" it means the audio for that segment has not been generated yet.

You can resize the waveform visualization vertically and horizontally for easier reading by dragging the resize handles. This is for display purposes only and does not affect the audio.

Making a text segment longer on the timeline does not make the audio longer. The AI reads the text at its natural pace unless you include a voice directing instruction such as "speak slowly" or "speak quickly" in the voice directing field.

Adding more text makes the generated audio longer. You cannot stretch the audio by widening the segment box, but you can shrink the box so that the segment stops earlier.
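
One way to read the relationship between the segment box and the generated audio is as a simple minimum. The Python sketch below is an interpretation under that assumption, with a hypothetical audible_seconds function; it is not taken from the product.

```python
def audible_seconds(box_seconds: float, audio_seconds: float) -> float:
    """How much of a segment's audio is heard (hypothetical model).

    A wider box does not stretch the audio, and a narrower box cuts it short.
    """
    return min(box_seconds, audio_seconds)


print(audible_seconds(6.0, 4.0))  # box wider than the audio
print(audible_seconds(3.0, 4.0))  # box shrunk below the audio length
```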

What to do next

You now understand the full timeline. The next step is fine-tuning the voice performance for each segment before export.

To fine-tune AI voice tone and emotion, go to The Dubbing Work Area: Fine-tuning AI Voice Tones and Emotions
To export your finished dubbed video, go to High-Fidelity Export: Downloading Dubbed MP4s and Clean Audio Tracks
To go back to the full editor overview, go to AI Dubbing Overview: How AI Voiceover Technology Works
To see all keyboard shortcuts, go to Workflow Hacks: Essential Shortcuts for AI Dubbing Projects
