Not all languages are equal in AI speech recognition. This page compares transcription accuracy across the languages GeekLink supports, so you can pick the right model size before processing your video and avoid surprises.
WER (Word Error Rate) is the share of words that are transcribed incorrectly; lower is better. Chinese is scored with CER (Character Error Rate) instead, because written Chinese does not separate words with spaces. A WER of 5% means roughly 1 error per 20 words.
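To make the metric concrete, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between a reference transcript and the recognized text, divided by the reference length. This is the standard textbook definition, not necessarily the exact scoring procedure behind the numbers on this page; the function names `edit_distance`, `wer`, and `cer` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                         # deletion
                curr[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_tok != hyp_tok),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: the same idea per character, ignoring whitespace."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # A 20-word reference with exactly one word transcribed incorrectly.
    reference = ("please set the right model size before you process your video "
                 "so that the subtitles come out clean and accurate")
    hypothesis = reference.replace("subtitles", "subtitle")
    print(f"WER: {wer(reference, hypothesis):.0%}")  # prints "WER: 5%"
```

Because the denominator is the reference length, insertions can push WER above 100% on very noisy output.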
Numbers below are reference values from benchmark datasets. Real-world accuracy varies depending on audio quality, background noise, accent, and speaking pace.
The app's default model is Recommended. Larger models take longer but transcribe more accurately, especially for Japanese and Korean.
| Language | Recommended Model | Best WER | Fast WER | Rating |
|---|---|---|---|---|
| 🇨🇳 Simplified Chinese | Dedicated engine | ~3–5% CER | ~3–5% CER | ⭐⭐⭐⭐⭐ |
| 🇹🇼 Traditional Chinese | Dedicated engine | ~3–5% CER | ~3–5% CER | ⭐⭐⭐⭐⭐ |
| 🇪🇸 Spanish | High Accuracy / Highest Accuracy | ~3–4% | ~14–18% | ⭐⭐⭐⭐⭐ |
| 🇬🇧 English | High Accuracy / Highest Accuracy | ~4–5% | ~12–16% | ⭐⭐⭐⭐⭐ |
| 🇫🇷 French | High Accuracy / Highest Accuracy | ~6–7% | ~18–24% | ⭐⭐⭐⭐ |
| 🇩🇪 German | High Accuracy / Highest Accuracy | ~5–7% | ~17–22% | ⭐⭐⭐⭐ |
| 🇮🇹 Italian | High Accuracy / Highest Accuracy | ~5–7% | ~17–22% | ⭐⭐⭐⭐ |
| 🇵🇹 Portuguese | High Accuracy / Highest Accuracy | ~5–6% | ~16–21% | ⭐⭐⭐⭐ |
| 🇷🇺 Russian | Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇳🇱 Dutch | High Accuracy / Highest Accuracy | ~6–9% | ~18–24% | ⭐⭐⭐⭐ |
| 🇹🇷 Turkish | High Accuracy / Highest Accuracy | ~7–10% | ~20–26% | ⭐⭐⭐⭐ |
| 🇮🇩 Indonesian | High Accuracy / Highest Accuracy | ~7–10% | ~20–26% | ⭐⭐⭐⭐ |
| 🇵🇱 Polish | High Accuracy / Highest Accuracy | ~7–10% | ~20–26% | ⭐⭐⭐⭐ |
| 🇸🇪 Swedish | High Accuracy / Highest Accuracy | ~7–10% | ~18–24% | ⭐⭐⭐⭐ |
| 🇪🇸 Catalan | High Accuracy / Highest Accuracy | ~5–8% | ~14–18% | ⭐⭐⭐⭐ |
| 🇨🇿 Czech | High Accuracy / Highest Accuracy | ~7–10% | ~20–26% | ⭐⭐⭐⭐ |
| 🇳🇴 Norwegian | High Accuracy / Highest Accuracy | ~7–10% | ~18–24% | ⭐⭐⭐⭐ |
| 🇩🇰 Danish | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇫🇮 Finnish | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇭🇺 Hungarian | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇬🇷 Greek | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇷🇴 Romanian | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇲🇾 Malay | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇸🇦 Arabic | Highest Accuracy | ~10–16% | ~26–34% | ⭐⭐⭐ |
| 🇹🇭 Thai | Dedicated engine | Varies widely | — | ⭐⭐ |
| 🇯🇵 Japanese | Highest Accuracy required | ~10–14% | ~28–35% | ⭐⭐⭐ |
| 🇰🇷 Korean | Highest Accuracy required | ~10–13% | ~26–32% | ⭐⭐⭐ |
| 🇸🇮 Slovenian | Highest Accuracy | ~10–15% | ~28–35% | ⭐⭐⭐ |
| 🇮🇳 Hindi | Highest Accuracy | ~12–18% | ~30–40% | ⭐⭐⭐ |
| 🇺🇦 Ukrainian | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇻🇳 Vietnamese | Highest Accuracy | ~14–20% | ~32–40% | ⭐⭐⭐ |
| 🇭🇷 Croatian | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇸🇰 Slovak | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇧🇬 Bulgarian | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇷🇸 Serbian | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇮🇱 Hebrew | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇮🇷 Persian | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇵🇭 Filipino | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇱🇹 Lithuanian | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇱🇻 Latvian | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇪🇪 Estonian | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇦🇿 Azerbaijani | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇧🇩 Bengali | Highest Accuracy | ~15–20% | ~32–40% | ⭐⭐⭐ |
| 🇵🇰 Urdu | Highest Accuracy | ~15–20% | ~32–40% | ⭐⭐⭐ |
| 🇮🇳 Tamil | Highest Accuracy | ~15–20% | ~32–40% | ⭐⭐⭐ |
| 🇳🇵 Nepali | Highest Accuracy | ~15–22% | ~34–42% | ⭐⭐⭐ |
| 🇰🇪 Swahili | Highest Accuracy | ~15–22% | ~34–42% | ⭐⭐⭐ |
| 🇬🇪 Georgian | Highest Accuracy | ~15–22% | ~34–42% | ⭐⭐⭐ |
| 🇮🇸 Icelandic | Highest Accuracy | ~15–22% | ~34–42% | ⭐⭐⭐ |
Source: Public speech recognition benchmarks (FLEURS dataset) and specialized model evaluations. Actual results may vary.
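As a rough way to read the table, the sketch below turns a WER range into an expected number of wrong words for a transcript of a given length. It copies only a few rows from the table above. The helper names `parse_range` and `expected_errors` are hypothetical, and the arithmetic assumes your audio behaves like the benchmark data, which real recordings often do not.

```python
import re

# A few rows copied from the table above: (Best WER, Fast WER).
WER_TABLE = {
    "Spanish": ("~3–4%", "~14–18%"),
    "English": ("~4–5%", "~12–16%"),
    "Japanese": ("~10–14%", "~28–35%"),
    "Thai": ("Varies widely", "—"),
}

# Matches cells such as "~12–16%" (en dash or plain hyphen).
_RANGE = re.compile(r"~?(\d+)\s*[–-]\s*(\d+)\s*%")


def parse_range(cell):
    """Return (low, high) in percent, or None if the cell has no numeric range."""
    match = _RANGE.search(cell)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def expected_errors(cell, word_count):
    """Estimate (low, high) wrong words for a transcript of word_count words."""
    bounds = parse_range(cell)
    if bounds is None:
        return None
    low, high = bounds
    return round(word_count * low / 100), round(word_count * high / 100)


if __name__ == "__main__":
    for language, (best, fast) in WER_TABLE.items():
        print(language, expected_errors(best, 1000), expected_errors(fast, 1000))
```

For a 1,000-word English transcript this gives about 40 to 50 wrong words at the best setting versus 120 to 160 with the Fast model, which is the gap the rating column summarizes.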
GeekLink lets you choose the model size. Larger models take more time and disk space but produce significantly better results, especially for Japanese and Korean.
| Model | Download Size | Speed | Quality | Best For |
|---|---|---|---|---|
| Fastest | 75 MB | Fastest (~10x) | Poor | Quick preview only |
| Fast | 142 MB | Fast (~7x) | Fair | Chinese / Spanish quick pass |
| Balanced | 466 MB | Medium (~4x) | Good | European languages daily use |
| High Accuracy | 1.5 GB | Slow (~2x) | Very good | Spanish / English / French recommended |
| Recommended (app default) | 1.6 GB | Fast (~6x) | Very good | Best speed-accuracy balance for most languages |
| Highest Accuracy | 2.9 GB | Slowest (1x) | Best | Required for Japanese / Korean; maximum accuracy for other languages |
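The speed column can be used for a back-of-the-envelope time estimate. The sketch below assumes the multipliers are relative to Highest Accuracy (listed as 1x), which the table implies but does not state outright; `estimate_minutes` is a hypothetical helper, and actual timings depend on your hardware and audio.

```python
# Approximate speed multipliers copied from the model table above.
# Assumption: each value is relative to Highest Accuracy (1x).
SPEED_MULTIPLIER = {
    "Fastest": 10,
    "Fast": 7,
    "Balanced": 4,
    "High Accuracy": 2,
    "Recommended": 6,
    "Highest Accuracy": 1,
}


def estimate_minutes(model, highest_accuracy_minutes):
    """Estimate processing time, given how long Highest Accuracy takes."""
    return highest_accuracy_minutes / SPEED_MULTIPLIER[model]


if __name__ == "__main__":
    baseline = 60  # hypothetical: Highest Accuracy needs 60 minutes for a video
    for model in SPEED_MULTIPLIER:
        print(f"{model}: ~{estimate_minutes(model, baseline):.0f} min")
```

Under that assumption, a job that takes an hour with Highest Accuracy would take roughly 10 minutes with Recommended and 30 minutes with High Accuracy.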
Word Error Rate (WER) is the percentage of words that are incorrectly transcribed. A WER of 5% means about 1 in 20 words is wrong. Chinese uses CER (Character Error Rate) since Chinese text doesn't use spaces between words.
Japanese and Korean have complex writing systems and rely heavily on context to choose the correct written form (for example, picking the right kanji among homophones in Japanese). Smaller models lack the capacity to capture this context well, which leads to very high error rates (roughly 26–35% with the Fast model in the table above). Always use the Highest Accuracy model for these languages.
GeekLink uses purpose-built recognition engines for Chinese and Thai, each optimized specifically for its language, so you do not need to select a model size. Chinese accuracy is consistently high; Thai results vary more widely, as the table above shows.
No. Chinese uses a dedicated engine that is already highly optimized for Mandarin. Switching to a different model size doesn't apply to Chinese recognition.
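The selection rules on this page can be summarized as a small decision function. This is an illustrative sketch, not GeekLink's actual logic: `choose_model` and the language sets are hypothetical names, and `PREFERS_HIGHEST` lists only a handful of the languages the table marks as Highest Accuracy.

```python
# Languages handled by a dedicated engine, where model size does not apply.
DEDICATED_ENGINE = {"Simplified Chinese", "Traditional Chinese", "Thai"}

# Languages for which the table says Highest Accuracy is required.
HIGHEST_REQUIRED = {"Japanese", "Korean"}

# Partial list of languages the table recommends Highest Accuracy for.
PREFERS_HIGHEST = {"Russian", "Arabic", "Hindi", "Vietnamese"}


def choose_model(language, app_default="Recommended"):
    """Return a model name, or None when model size does not apply."""
    if language in DEDICATED_ENGINE:
        return None  # the dedicated engine is used regardless of model size
    if language in HIGHEST_REQUIRED or language in PREFERS_HIGHEST:
        return "Highest Accuracy"
    return app_default


if __name__ == "__main__":
    for language in ("English", "Japanese", "Simplified Chinese"):
        print(language, "->", choose_model(language))
```

Falling back to the app default for every other language is a simplification; check the per-language table before relying on it.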