Speech Recognition

Automatically transcribe subtitle text from video audio

What Is Speech Recognition

The speech recognition feature automatically analyzes the audio track of a video, converts spoken content into text, and generates a subtitle file with timecodes. It is ideal for videos that have no existing subtitles, such as self-recorded vlogs, meeting recordings, and course videos.

The output is a source-language SRT subtitle file, which can be further edited and adjusted in the subtitle editor.
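For reference, an SRT file is plain text made of numbered cues. Each cue has a timecode line in `HH:MM:SS,mmm --> HH:MM:SS,mmm` form, then the subtitle text, then a blank line. The Python sketch below writes that layout from a list of timed segments. The `Segment` type and the `format_timestamp` and `to_srt` helpers are hypothetical illustrations, not GeekLink's internals.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str

def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timecode: HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def to_srt(segments: list[Segment]) -> str:
    """Number cues from 1; joining cues that end in a newline leaves a blank line between them."""
    cues = []
    for number, seg in enumerate(segments, start=1):
        cues.append(
            f"{number}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)

print(to_srt([
    Segment(0.0, 2.5, "Hello and welcome."),
    Segment(2.5, 61.04, "Today we look at speech recognition."),
]))
```

Note the comma, not a period, before the milliseconds; that is the SRT convention.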

How to Use

1. Import videos into the media library. Drag video files into the GeekLink media library, or click the "Add Videos" button to select files.
2. Choose "No subtitles, audio only". In the settings panel, pick the "No subtitles, audio only" option so that GeekLink recognizes the subtitles from the video's audio. Tick "Also translate to another language" if you also want a translation.
3. Choose the recognition language. In the settings panel, select the source language of the video, such as Chinese, English, or Japanese.
4. Choose the recognition model. Select a model based on your accuracy needs and device performance. We recommend starting with the default "Recommended" model.
5. Click "Run Speech Recognition". After confirming your settings, click the button to start recognition. You can select multiple videos for batch processing.
6. Review and edit results. Once recognition is complete, click "Open Subtitle Editor" to review the transcription and correct it line by line.

Recognition Model Selection

GeekLink offers multiple recognition models with different trade-offs between accuracy and speed. The model file is automatically downloaded the first time you use a particular model.

Model	File Size	Memory Usage	Accuracy	Speed	Best For
Fastest	75 MB	~200 MB	Low	Fastest	Quick preview, testing
Fast	142 MB	~300 MB	Fair	Fast	Everyday use, less accuracy-sensitive
Recommended	466 MB	~600 MB	High	Medium	Default choice, balanced accuracy and speed
High Accuracy	1.5 GB	~2 GB	High	Slower	Professional use, noisy environments
Highest Accuracy + Fast	1.6 GB	~2.5 GB	Highest	Relatively fast	Top accuracy while maintaining speed
Highest Accuracy	2.9 GB	~4 GB	Highest	Slowest	Ultimate accuracy, speed is not a concern

Tip: Larger models provide higher accuracy but are slower and use more memory. If your Mac has less than 8 GB of memory, we recommend using the "Recommended" model or a smaller one.
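The tip above can be read as a simple rule. This hypothetical helper (`suggested_models` is not part of GeekLink) lists the models the tip allows for a given amount of Mac memory, in the table's order from smallest to largest.

```python
# Model names in the order of the table above, smallest to largest.
MODELS = [
    "Fastest",
    "Fast",
    "Recommended",
    "High Accuracy",
    "Highest Accuracy + Fast",
    "Highest Accuracy",
]

def suggested_models(mac_memory_gb: float) -> list[str]:
    """List the models the tip allows for a Mac with the given memory (hypothetical helper)."""
    if mac_memory_gb < 8:
        # Below 8 GB: "Recommended" or a smaller model.
        return MODELS[: MODELS.index("Recommended") + 1]
    # The tip sets no limit at 8 GB or more; start with "Recommended" and go larger if needed.
    return list(MODELS)

print(suggested_models(4))
```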

Advanced Settings

Click "More Settings" in the app to expand advanced options:

AI Punctuation Correction PRO

Corrects punctuation only, without changing the text itself. It is most effective for Chinese, because Chinese speech recognition output often lacks punctuation. When enabled, commas, periods, and other punctuation marks are added automatically, which significantly improves subtitle readability.
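Because the correction is meant to touch punctuation only, one way to spot-check a result is to strip punctuation and whitespace from both versions and compare what is left. This is a hypothetical check, not a GeekLink feature. It assumes that every Unicode character in the punctuation categories (which include full-width Chinese marks such as `，` and `。`) counts as punctuation, and that whitespace changes can be ignored.

```python
import unicodedata

def strip_punctuation(text: str) -> str:
    """Drop Unicode punctuation (categories starting with 'P') and whitespace."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P") and not ch.isspace()
    )

def only_punctuation_changed(original: str, corrected: str) -> bool:
    """True if the two lines differ only in punctuation or whitespace."""
    return strip_punctuation(original) == strip_punctuation(corrected)

print(only_punctuation_changed("今天天气很好我们出去走走", "今天天气很好，我们出去走走。"))
```

A capitalization change counts as a text change under this check, which matches the "without changing the text itself" promise.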

Variety Show Mode

Improves recognition accuracy for variety shows, music, old films, and other content with heavy background audio. Not recommended for long, continuous dialogue. It is included automatically when High-Precision Timeline is on, so you don't need to enable both.

High-Precision Timeline

Off by default. When enabled for non-Chinese audio, GeekLink aligns each word to the audio for accurate timestamps and produces a per-word confidence score, so it can flag the lines it was unsure about. The first run downloads an alignment component. Turn it on when timing accuracy or low-confidence review matters; otherwise, standard mode is faster. The SE Review Pack exports these low-confidence marks for review in Subtitle Edit.
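To make "flag the lines it was unsure about" concrete, here is a sketch that assumes each word carries a confidence between 0 and 1 and that a line is flagged when any of its words falls below a cutoff. The `Word` type, `flag_low_confidence`, and the 0.5 default threshold are all hypothetical; GeekLink's actual scoring and cutoff are not documented here.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float       # seconds
    end: float         # seconds
    confidence: float  # assumed range 0.0 to 1.0

def flag_low_confidence(lines: list[list[Word]], threshold: float = 0.5) -> list[int]:
    """Return 1-based numbers of subtitle lines that contain any word below `threshold`."""
    return [
        number
        for number, words in enumerate(lines, start=1)
        if any(word.confidence < threshold for word in words)
    ]

lines = [
    [Word("Hello", 0.00, 0.40, 0.98), Word("everyone", 0.40, 0.90, 0.95)],
    [Word("welcome", 1.10, 1.50, 0.91), Word("Geeklink", 1.50, 2.10, 0.42)],
]
print(flag_low_confidence(lines))  # [2]
```

Flagging on the weakest word, rather than the line average, is a choice of this sketch: a single misheard name is enough to send a line to review.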

AI Smart Segmentation PRO

Uses an LLM to split the transcript into natural subtitle lines. It helps with long, continuous speech such as talks and narration. It is not recommended for short, back-and-forth dialogue: short exchanges are already well segmented, and re-segmenting them can merge separate speakers' lines or over-split a single line. For dialogue-heavy content, leave it off and rely on the recognizer's own segments.

Whisper Prompt & Auto-Correct Rules (Proper Nouns)

To get names, places, and brands right, use two complementary tools: the Whisper prompt gives the recognizer context up front, and Auto-Correct Rules deterministically replace known mishearings after recognition. They work best together and apply to both standard and high-precision modes. For a whole series, collect the names episode one gets wrong, add them once, and the rest of the season comes out consistent.
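Auto-Correct Rules are described as deterministic replacements applied after recognition. A minimal sketch of that idea, assuming rules are plain, case-sensitive wrong-to-right text pairs: it applies all rules in a single pass and tries longer patterns first, so one rule's output is never rewritten by another rule. The function name and rule format are hypothetical, not GeekLink's rule syntax.

```python
import re

def apply_auto_correct(text: str, rules: dict[str, str]) -> str:
    """Replace known mishearings in one pass (hypothetical rule format: wrong -> right)."""
    if not rules:
        return text
    # Longer patterns first, so a rule for a longer phrase wins over a rule for part of it.
    alternatives = sorted(rules, key=len, reverse=True)
    # Matching is case-sensitive and literal; re.escape neutralizes regex metacharacters.
    pattern = re.compile("|".join(re.escape(wrong) for wrong in alternatives))
    # Single pass: replaced text is never scanned again by another rule.
    return pattern.sub(lambda match: rules[match.group(0)], text)

rules = {
    "geek link": "GeekLink",
    "sub title": "subtitle",
    "sub title edit": "Subtitle Edit",
}
print(apply_auto_correct("open the sub title from geek link in sub title edit", rules))
# open the subtitle from GeekLink in Subtitle Edit
```

The single-pass design matters once a series accumulates many rules: with sequential `str.replace` calls, a rule added later could accidentally rewrite the output of an earlier one.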

Recognize and Translate in One Pass

You don't need a separate step to translate. On the Speech Recognition panel, tick "Also translate to another language", then pick the target language and a translation engine. GeekLink transcribes the audio and translates it in one run, giving you both the original and the translated subtitles. See the Translation page for the engine choices.

When to split it into two steps instead: if accuracy matters, recognize first, correct the source subtitles in the editor, then translate, because clean input produces a better translation. Combining both is faster; doing them separately gives you a checkpoint to fix mistakes before they carry into the translation.

FAQ

Why is the first use of a model so slow?

The first time you use a new model, the model file is automatically downloaded (see the size table above). Download speed depends on your network. Once the download is complete, subsequent uses of that model will start immediately without re-downloading.
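The download-once behavior amounts to a local cache check. The sketch below assumes a hypothetical cache directory and a caller-supplied `download` function; no real GeekLink path or URL is implied. The first call for a model downloads it, and later calls return the cached file immediately. Writing to a temporary `.part` file and renaming it afterward is a design choice of this sketch, so that an interrupted download is never mistaken for a complete one.

```python
from pathlib import Path
from typing import Callable

def ensure_model(name: str, cache_dir: Path, download: Callable[[str, Path], None]) -> Path:
    """Return the cached model file, downloading it only on first use (hypothetical layout)."""
    path = cache_dir / f"{name}.bin"
    if path.exists():
        return path  # already downloaded: no network access needed
    cache_dir.mkdir(parents=True, exist_ok=True)
    partial = cache_dir / f"{name}.part"
    download(name, partial)  # slow on first use; speed depends on the network
    partial.replace(path)    # publish the file only after the download finished
    return path
```

Calling `ensure_model("Recommended", ...)` twice with the same cache directory invokes `download` only once.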

What if the recognition results contain errors?

Speech recognition is never 100% accurate, especially with heavy background noise, fast speech, or strong accents. We recommend opening the subtitle editor after recognition to review and correct results line by line. If certain words are frequently misrecognized, you can use "Auto-Correct Rules" PRO to batch-fix common errors.

Why is there no punctuation in the output?

The speech recognition model itself may not output punctuation, especially for Chinese. Enable "AI Punctuation Correction" PRO to automatically add punctuation for more readable subtitles.