Not all languages are equal in AI speech recognition. This page compares transcription accuracy across 12 supported languages in GeekLink, so you can set the right model size before processing your video and avoid surprises.
WER (Word Error Rate) is the share of words that are transcribed incorrectly; lower is better. Chinese uses CER (Character Error Rate) instead, since written Chinese does not mark word boundaries with spaces. A WER of 5% means roughly 1 error per 20 words.
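To make the metric concrete, here is a minimal sketch of how WER and CER are commonly computed: the edit distance (substitutions, insertions, deletions) between a reference transcript and the recognized text, divided by the reference length. The function names `edit_distance`, `wer`, and `cer` are illustrative and not part of GeekLink; benchmark tools usually also normalize case and punctuation first, which this sketch skips.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing)
                curr[j - 1] + 1,     # insertion (extra recognized token)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: same idea, counted per character (spaces ignored)."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over the dog"
    print(f"WER: {wer(ref, hyp):.1%}")  # 1 substitution + 1 deletion over 9 words: 22.2%
```

Because insertions count as errors too, WER can exceed 100% on very poor output.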
Numbers below are reference values from benchmark datasets. Real-world accuracy varies depending on audio quality, background noise, accent, and speaking pace.
The app's default model is base. Larger models take longer but recognize more accurately, especially for Japanese and Korean.
| Language | Recommended Model | large WER | base WER | Rating |
|---|---|---|---|---|
| 🇨🇳 Simplified Chinese | Dedicated engine | ~3–5% CER | ~3–5% CER | ⭐⭐⭐⭐⭐ |
| 🇹🇼 Traditional Chinese | Dedicated engine | ~3–5% CER | ~3–5% CER | ⭐⭐⭐⭐⭐ |
| 🇪🇸 Spanish | medium / large | ~3–4% | ~14–18% | ⭐⭐⭐⭐⭐ |
| 🇬🇧 English | medium / large | ~4–5% | ~12–16% | ⭐⭐⭐⭐⭐ |
| 🇫🇷 French | medium / large | ~6–7% | ~18–24% | ⭐⭐⭐⭐ |
| 🇩🇪 German | medium / large | ~5–7% | ~17–22% | ⭐⭐⭐⭐ |
| 🇮🇹 Italian | medium / large | ~5–7% | ~17–22% | ⭐⭐⭐⭐ |
| 🇵🇹 Portuguese | medium / large | ~5–6% | ~16–21% | ⭐⭐⭐⭐ |
| 🇷🇺 Russian | large | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇹🇭 Thai | Dedicated engine | ~8–12% | — | ⭐⭐⭐⭐ |
| 🇯🇵 Japanese | large required | ~10–14% | ~28–35% | ⭐⭐⭐ |
| 🇰🇷 Korean | large required | ~10–13% | ~26–32% | ⭐⭐⭐ |
Source: OpenAI Whisper paper (FLEURS benchmark, large-v2) and public benchmarks for specialized models. Actual results may vary.
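As a rough illustration of what these ranges mean in practice, the sketch below turns a WER range from the table into an expected number of wrong words for a transcript of a given length. The helper name `expected_errors` is hypothetical, and the result is only as reliable as the reference values themselves.

```python
def expected_errors(word_count, wer_low_pct, wer_high_pct):
    """Return (low, high) expected word errors for a WER range given in percent."""
    if word_count < 0 or not 0 <= wer_low_pct <= wer_high_pct:
        raise ValueError("invalid word count or WER range")
    low = round(word_count * wer_low_pct / 100)
    high = round(word_count * wer_high_pct / 100)
    return low, high


if __name__ == "__main__":
    # Japanese, 1,000-word transcript, ranges taken from the table above.
    print("base: ", expected_errors(1000, 28, 35))  # (280, 350)
    print("large:", expected_errors(1000, 10, 14))  # (100, 140)
```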
GeekLink lets you choose the model size. Larger models take more time and disk space but produce significantly better results — especially for Japanese and Korean.
| Model | Parameters | Speed (CPU) | Quality | Best For |
|---|---|---|---|---|
| tiny | 39M | Fastest (~10x) | Poor | Quick preview only |
| base (app default) | 74M | Fast (~7x) | Fair | Chinese / Spanish quick pass |
| small | 244M | Medium (~4x) | Good | European languages daily use |
| medium | 769M | Slow (~2x) | Very good | Spanish / English / French recommended |
| large | 1550M | Slowest (1x) | Best | Japanese / Korean required; others max accuracy |
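
The speed column can be used for a back-of-the-envelope time estimate. The sketch below assumes the "~Nx" figures mean "about N times faster than large on CPU", which is an interpretation of the table rather than a documented GeekLink guarantee; `RELATIVE_SPEED` and `estimate_seconds` are hypothetical names.

```python
# Approximate speed relative to large, copied from the table above (assumed meaning).
RELATIVE_SPEED = {"tiny": 10, "base": 7, "small": 4, "medium": 2, "large": 1}


def estimate_seconds(large_seconds, model):
    """Estimate processing time for `model`, given the measured time for large."""
    if model not in RELATIVE_SPEED:
        raise ValueError(f"unknown model: {model!r}")
    return large_seconds / RELATIVE_SPEED[model]


if __name__ == "__main__":
    # If large needs 10 minutes for a video, medium should need roughly 5.
    print(estimate_seconds(600, "medium"))  # 300.0
```

Actual timings depend on hardware and audio length, so treat the result as an order-of-magnitude guide.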
Japanese and Korean: use large. The base model has a very high error rate (26–35%) and will miss many words.
medium is the best balance of speed and accuracy. Use large for professional-quality output.
large makes a noticeable difference; medium is also acceptable.
Word Error Rate (WER) is the percentage of words that are incorrectly transcribed. A WER of 5% means about 1 in 20 words is wrong. Chinese uses CER (Character Error Rate) since Chinese text doesn't use spaces between words.
Japanese and Korean have complex writing systems (kanji, hanja) and rely heavily on context for correct character selection. Smaller models lack the capacity to capture this context well, leading to very high error rates. Always use the large model for these languages.
GeekLink uses purpose-built recognition engines for Chinese and Thai that are optimized specifically for those languages. These deliver consistently high accuracy without you needing to select a model size.
Choosing a larger model does not improve Chinese recognition. Chinese uses a dedicated engine that is already highly optimized for Mandarin, so the model size setting doesn't apply to it.
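The recommendations on this page can be collected into a small lookup. This is a minimal sketch built from the "Recommended Model" column of the accuracy table; the names `DEDICATED_ENGINE`, `RECOMMENDED_MODELS`, and `recommend_models` are hypothetical and do not correspond to any GeekLink API.

```python
# Languages handled by a dedicated engine, where model size does not apply.
DEDICATED_ENGINE = {"Simplified Chinese", "Traditional Chinese", "Thai"}

# Acceptable model sizes per language, in the order listed in the table.
RECOMMENDED_MODELS = {
    "Spanish": ("medium", "large"),
    "English": ("medium", "large"),
    "French": ("medium", "large"),
    "German": ("medium", "large"),
    "Italian": ("medium", "large"),
    "Portuguese": ("medium", "large"),
    "Russian": ("large",),
    "Japanese": ("large",),
    "Korean": ("large",),
}


def recommend_models(language):
    """Return recommended model sizes, or an empty tuple for dedicated engines."""
    if language in DEDICATED_ENGINE:
        return ()  # no model size to pick
    if language not in RECOMMENDED_MODELS:
        raise ValueError(f"language not covered by this page: {language!r}")
    return RECOMMENDED_MODELS[language]


if __name__ == "__main__":
    for lang in ("Japanese", "English", "Thai"):
        models = recommend_models(lang)
        print(lang, "->", " / ".join(models) if models else "dedicated engine")
```

Returning an empty tuple for Chinese and Thai keeps the "no model choice" case explicit instead of silently falling back to base.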