Speech Recognition
Automatically transcribe subtitle text from video audio
What Is Speech Recognition
The speech recognition feature analyzes the audio track of a video, converts the spoken content into text, and generates a subtitle file with timecodes. It is ideal for videos that have no existing subtitles, such as self-recorded vlogs, meeting recordings, and course videos.
The output is a source-language SRT subtitle file, which can be further edited and adjusted in the subtitle editor.
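For readers unfamiliar with the format, an SRT file is plain text: a sequence of numbered cues, each with a start and end timecode followed by the subtitle text. The Python sketch below illustrates that layout only. The `Cue` class and the helper names are hypothetical and are not part of GeekLink.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle entry (hypothetical structure, for illustration only)."""
    index: int
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timecode HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list) -> str:
    """Serialize cues as SRT: index, timecode line, text, then a blank line."""
    blocks = []
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{cue.index}\n{timing}\n{cue.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([Cue(1, 0.0, 2.5, "Hello"), Cue(2, 2.5, 5.0, "Welcome back")]))
```

Because the file is plain text, it can also be opened and corrected in any text editor if needed.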
How to Use
- Import videos into the media library: Drag video files into the GeekLink media library, or click the "Add Videos" button to select files.
- Switch to the "Speech Recognition" tab: Select the "Speech Recognition" tab at the top of the main interface.
- Choose the recognition language: In the settings panel, select the source language of the video, such as Chinese, English, or Japanese.
- Choose the recognition model: Select a model based on your accuracy needs and device performance. We recommend starting with the default "Recommended" model.
- Click "Run Speech Recognition": After confirming your settings, click the button to start recognition. You can select multiple videos for batch processing.
- Review and edit results: Once recognition is complete, click "Open Subtitle Editor" to review the transcription and make corrections line by line.
Recognition Model Selection
GeekLink offers multiple recognition models with different trade-offs between accuracy and speed. The model file is automatically downloaded the first time you use a particular model.
| Model | File Size | Memory Usage | Accuracy | Speed | Best For |
|---|---|---|---|---|---|
| Fastest | 75 MB | ~200 MB | Low | Fastest | Quick preview, testing |
| Fast | 142 MB | ~300 MB | Fair | Fast | Everyday use, less accuracy-sensitive |
| Recommended | 466 MB | ~600 MB | High | Medium | Default choice, balanced accuracy and speed |
| High Accuracy | 1.5 GB | ~2 GB | High | Slower | Professional use, noisy environments |
| Highest Accuracy + Fast | 1.6 GB | ~2.5 GB | Highest | Relatively fast | Top accuracy while maintaining speed |
| Highest Accuracy | 2.9 GB | ~4 GB | Highest | Slowest | Ultimate accuracy, speed is not a concern |
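One way to read the table is to pick the most accurate model whose memory usage fits on your device. The Python sketch below encodes that rule of thumb using the approximate memory figures from the table (with 1 GB taken as 1000 MB). The `pick_model` function is hypothetical, it ignores speed, and it is not how GeekLink selects a model.

```python
from typing import Optional

# (model name, approximate memory usage in MB), ordered from least to most
# demanding. Values are rounded from the table above; treat them as estimates.
MODELS = [
    ("Fastest", 200),
    ("Fast", 300),
    ("Recommended", 600),
    ("High Accuracy", 2000),
    ("Highest Accuracy + Fast", 2500),
    ("Highest Accuracy", 4000),
]


def pick_model(free_memory_mb: int) -> Optional[str]:
    """Return the most demanding model that fits in the given memory, if any."""
    fitting = [name for name, memory_mb in MODELS if memory_mb <= free_memory_mb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for free in (100, 1000, 2200, 8000):
        print(free, "MB free ->", pick_model(free))
```

If speed matters more than the last bit of accuracy, stepping down one row from the suggested model is a reasonable adjustment.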
Advanced Settings
Click "More Settings" in the app to expand advanced options:
Max Characters per Subtitle Line (Source Language)
Controls the maximum text length of a single subtitle line, from 10 to 200 characters. Leave blank for no limit. This setting helps manage reading density, especially for Chinese subtitles: Chinese has no spaces between words, so long unbroken lines can hurt the viewing experience.
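To illustrate what a per-line character limit does, the Python sketch below splits a long line into pieces no longer than the limit, preferring to break after punctuation or a space. This is an assumption about how such a limit could behave, not GeekLink's actual algorithm; `split_line` and `BREAK_CHARS` are hypothetical names.

```python
# Characters after which a break is preferred (Chinese and Western punctuation,
# plus the space). This set is an illustrative assumption.
BREAK_CHARS = "，。！？；、,.!?; "


def split_line(text: str, max_chars: int) -> list:
    """Split text into pieces of at most max_chars characters each."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    pieces = []
    rest = text.strip()
    while len(rest) > max_chars:
        window = rest[:max_chars]
        # Position of the last preferred break character inside the window.
        cut = max(window.rfind(ch) for ch in BREAK_CHARS)
        if cut <= 0:
            cut = max_chars  # no usable break point: hard break at the limit
        else:
            cut += 1         # keep the punctuation with the preceding piece
        pieces.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        pieces.append(rest)
    return pieces


if __name__ == "__main__":
    for piece in split_line("今天天气很好，我们去公园散步吧。", 10):
        print(piece)
```

With a limit of 10, the sample sentence is broken after the comma rather than in the middle of a phrase.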
AI Punctuation Correction (PRO)
Corrects punctuation only, without changing the text itself. It is most effective for Chinese, where speech recognition often omits punctuation. When enabled, commas, periods, and other punctuation marks are added automatically, which makes subtitles noticeably easier to read.
Variety Show Mode
Optimized for variety shows, reality TV, and other scenarios with heavy background noise and rapid multi-speaker dialogue. When enabled, the recognition strategy is adjusted to better handle noisy environments and fast speech switching.
FAQ
Why is the first use of a model so slow?
The first time you use a new model, the model file is automatically downloaded (see the size table above). Download speed depends on your network. Once the download is complete, subsequent uses of that model will start immediately without re-downloading.
What if the recognition results contain errors?
Speech recognition is never 100% accurate, especially with heavy background noise, fast speech, or strong accents. We recommend opening the subtitle editor after recognition to review and correct the results line by line. If certain words are frequently misrecognized, you can use "Auto-Correct Rules" (PRO) to batch-fix common errors.
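Conceptually, a batch correction rule maps a frequently misrecognized phrase to its intended form and applies it to every subtitle line. The Python sketch below shows that idea on SRT text. It is a simplified illustration and does not reflect the format or behavior of the app's "Auto-Correct Rules" feature; `apply_rules` and the sample rule are hypothetical.

```python
import re

# Matches an SRT timecode line such as "00:00:01,000 --> 00:00:03,500".
TIMECODE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def apply_rules(srt_text: str, rules: dict) -> str:
    """Apply wrong -> right replacements to subtitle text lines only."""
    output = []
    for line in srt_text.splitlines():
        # Leave cue numbers and timecodes untouched. A text line made up only
        # of digits is also skipped, which is a limitation of this sketch.
        if line.strip().isdigit() or TIMECODE.match(line):
            output.append(line)
            continue
        for wrong, right in rules.items():
            line = line.replace(wrong, right)
        output.append(line)
    return "\n".join(output)


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:02,000\nWelcome to geek link\n"
    print(apply_rules(sample, {"geek link": "GeekLink"}))
```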
Why is there no punctuation in the output?
The speech recognition model itself may not output punctuation, especially for Chinese. Enable "AI Punctuation Correction" (PRO) to add punctuation automatically and make the subtitles more readable.