Speech Recognition Accuracy by Language: WER Reference Table

AI speech recognition does not perform equally well in every language. This page compares transcription accuracy across the languages GeekLink supports, so you can pick the right model size before processing your video and avoid surprises.

What Is WER?

WER (Word Error Rate) is the share of words that are transcribed incorrectly; lower is better. Chinese is scored with CER (Character Error Rate) instead, because written Chinese does not mark word boundaries. A WER of 5% means roughly 1 word error per 20 words on clean audio.
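To make the definition concrete, here is a minimal sketch of how WER and CER are commonly computed: the edit distance (substitutions, insertions, and deletions) between a reference transcript and the recognizer output, divided by the length of the reference. The function names are illustrative and are not part of GeekLink; the benchmarks cited on this page may also normalize text (case, punctuation) before scoring, which this sketch does not do.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing)
                curr[j - 1] + 1,     # insertion (extra token in output)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: same idea, scored per non-space character."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

With a 20-word reference and one wrong word in the output, `wer` returns 0.05, which matches the "1 error per 20 words" reading of a 5% WER.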

Numbers below are reference values from benchmark datasets. Real-world accuracy varies depending on audio quality, background noise, accent, and speaking pace.

Language Accuracy Quick Reference

The app default is the Recommended model. Larger models take longer but transcribe more accurately, especially for Japanese and Korean.

| Language | Recommended Model | Best WER | Fast WER | Rating |
| --- | --- | --- | --- | --- |
| 🇨🇳 Simplified Chinese | Dedicated engine | ~3–5% CER | ~3–5% CER | ⭐⭐⭐⭐⭐ |
| 🇹🇼 Traditional Chinese | Dedicated engine | ~3–5% CER | ~3–5% CER | ⭐⭐⭐⭐⭐ |
| 🇪🇸 Spanish | High Accuracy / Highest Accuracy | ~3–4% | ~14–18% | ⭐⭐⭐⭐⭐ |
| 🇬🇧 English | High Accuracy / Highest Accuracy | ~4–5% | ~12–16% | ⭐⭐⭐⭐⭐ |
| 🇫🇷 French | High Accuracy / Highest Accuracy | ~6–7% | ~18–24% | ⭐⭐⭐⭐ |
| 🇩🇪 German | High Accuracy / Highest Accuracy | ~5–7% | ~17–22% | ⭐⭐⭐⭐ |
| 🇮🇹 Italian | High Accuracy / Highest Accuracy | ~5–7% | ~17–22% | ⭐⭐⭐⭐ |
| 🇵🇹 Portuguese | High Accuracy / Highest Accuracy | ~5–6% | ~16–21% | ⭐⭐⭐⭐ |
| 🇷🇺 Russian | Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇳🇱 Dutch | High Accuracy / Highest Accuracy | ~6–9% | ~18–24% | ⭐⭐⭐⭐ |
| 🇹🇷 Turkish | High Accuracy / Highest Accuracy | ~7–10% | ~20–26% | ⭐⭐⭐⭐ |
| 🇮🇩 Indonesian | High Accuracy / Highest Accuracy | ~7–10% | ~20–26% | ⭐⭐⭐⭐ |
| 🇵🇱 Polish | High Accuracy / Highest Accuracy | ~7–10% | ~20–26% | ⭐⭐⭐⭐ |
| 🇸🇪 Swedish | High Accuracy / Highest Accuracy | ~7–10% | ~18–24% | ⭐⭐⭐⭐ |
| 🇪🇸 Catalan | High Accuracy / Highest Accuracy | ~5–8% | ~14–18% | ⭐⭐⭐⭐ |
| 🇨🇿 Czech | High Accuracy / Highest Accuracy | ~7–10% | ~20–26% | ⭐⭐⭐⭐ |
| 🇳🇴 Norwegian | High Accuracy / Highest Accuracy | ~7–10% | ~18–24% | ⭐⭐⭐⭐ |
| 🇩🇰 Danish | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇫🇮 Finnish | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇭🇺 Hungarian | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇬🇷 Greek | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇷🇴 Romanian | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇲🇾 Malay | High Accuracy / Highest Accuracy | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇸🇦 Arabic | Highest Accuracy | ~10–16% | ~26–34% | ⭐⭐⭐ |
| 🇹🇭 Thai | Dedicated engine | Varies widely | Varies widely | ⭐⭐ |
| 🇯🇵 Japanese | Highest Accuracy required | ~10–14% | ~28–35% | ⭐⭐⭐ |
| 🇰🇷 Korean | Highest Accuracy required | ~10–13% | ~26–32% | ⭐⭐⭐ |
| 🇸🇮 Slovenian | Highest Accuracy | ~10–15% | ~28–35% | ⭐⭐⭐ |
| 🇮🇳 Hindi | Highest Accuracy | ~12–18% | ~30–40% | ⭐⭐⭐ |
| 🇺🇦 Ukrainian | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇻🇳 Vietnamese | Highest Accuracy | ~14–20% | ~32–40% | ⭐⭐⭐ |
| 🇭🇷 Croatian | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇸🇰 Slovak | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇧🇬 Bulgarian | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇷🇸 Serbian | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇮🇱 Hebrew | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇮🇷 Persian | Highest Accuracy | ~10–15% | ~26–34% | ⭐⭐⭐ |
| 🇵🇭 Filipino | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇱🇹 Lithuanian | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇱🇻 Latvian | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇪🇪 Estonian | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇦🇿 Azerbaijani | Highest Accuracy | ~12–18% | ~28–36% | ⭐⭐⭐ |
| 🇧🇩 Bengali | Highest Accuracy | ~15–20% | ~32–40% | ⭐⭐⭐ |
| 🇵🇰 Urdu | Highest Accuracy | ~15–20% | ~32–40% | ⭐⭐⭐ |
| 🇮🇳 Tamil | Highest Accuracy | ~15–20% | ~32–40% | ⭐⭐⭐ |
| 🇳🇵 Nepali | Highest Accuracy | ~15–22% | ~34–42% | ⭐⭐⭐ |
| 🇰🇪 Swahili | Highest Accuracy | ~15–22% | ~34–42% | ⭐⭐⭐ |
| 🇬🇪 Georgian | Highest Accuracy | ~15–22% | ~34–42% | ⭐⭐⭐ |
| 🇮🇸 Icelandic | Highest Accuracy | ~15–22% | ~34–42% | ⭐⭐⭐ |

Source: public speech recognition benchmarks (FLEURS dataset) and specialized model evaluations. Actual results may vary.
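To get a feel for what these percentages mean for an actual video, the arithmetic can be sketched as below. The helper name is illustrative, and the 1,000-word transcript length is an assumed example; the percentage ranges are the Japanese and English rows from the table.

```python
def expected_errors(word_count, rate_low_pct, rate_high_pct):
    """Rough range of wrong words for a transcript of a given length.

    Rates are percentages, e.g. 10 and 14 for a "~10-14%" table entry.
    """
    low = round(word_count * rate_low_pct / 100)
    high = round(word_count * rate_high_pct / 100)
    return low, high


# Assumed example: a transcript of about 1,000 words.
japanese_best = expected_errors(1000, 10, 14)   # Best WER column
japanese_fast = expected_errors(1000, 28, 35)   # Fast WER column
english_best = expected_errors(1000, 4, 5)      # Best WER column
```

For that transcript, Japanese would contain roughly 100 to 140 errors at the best WER and 280 to 350 at the Fast WER, while English at its best WER would contain about 40 to 50. These are benchmark-based estimates, so noisy or accented audio can land outside the range.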

Model Size vs Accuracy vs Speed

GeekLink lets you choose the model size. Larger models take more time and disk space but produce noticeably more accurate transcripts, especially for Japanese and Korean.

| Model | Download Size | Speed | Quality | Best For |
| --- | --- | --- | --- | --- |
| Fastest | 75 MB | Fastest (~10x) | Poor | Quick preview only |
| Fast | 142 MB | Fast (~7x) | Fair | Chinese / Spanish quick pass |
| Balanced | 466 MB | Medium (~4x) | Good | European languages daily use |
| High Accuracy | 1.5 GB | Slow (~2x) | Very good | Spanish / English / French recommended |
| Recommended (app default) | 1.6 GB | Fast (~6x) | Very good | Best speed-accuracy balance for most languages |
| Highest Accuracy | 2.9 GB | Slowest (1x) | Best | Japanese / Korean required; others max accuracy |
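The trade-off in the table can be sketched as a small lookup. This is only an illustration under stated assumptions, not GeekLink's own logic: the speed multipliers are read as relative to Highest Accuracy (1x), sizes are rounded to megabytes, and the fallback to the Recommended model for languages other than Chinese, Thai, Japanese, and Korean is an assumption based on it being the app default. All names are hypothetical.

```python
# Approximate values copied from the model table above.
# "speed" is assumed to be relative to Highest Accuracy (the 1x baseline).
MODELS = {
    "Fastest": {"size_mb": 75, "speed": 10},
    "Fast": {"size_mb": 142, "speed": 7},
    "Balanced": {"size_mb": 466, "speed": 4},
    "High Accuracy": {"size_mb": 1500, "speed": 2},
    "Recommended": {"size_mb": 1600, "speed": 6},
    "Highest Accuracy": {"size_mb": 2900, "speed": 1},
}

# Languages handled by a dedicated engine (no model size option).
DEDICATED_ENGINE_LANGUAGES = {"Simplified Chinese", "Traditional Chinese", "Thai"}
# Languages the table marks as requiring the largest model.
LARGEST_MODEL_LANGUAGES = {"Japanese", "Korean"}


def choose_model(language):
    """Pick a model name for a language, following the guidance on this page."""
    if language in DEDICATED_ENGINE_LANGUAGES:
        return "Dedicated engine"
    if language in LARGEST_MODEL_LANGUAGES:
        return "Highest Accuracy"
    return "Recommended"  # assumption: fall back to the app default


def estimate_minutes(model, highest_accuracy_minutes):
    """Scale a known Highest Accuracy processing time by the speed multiplier."""
    return highest_accuracy_minutes / MODELS[model]["speed"]
```

For example, if a video takes 60 minutes to process with Highest Accuracy, `estimate_minutes("Recommended", 60)` gives 10.0 and `estimate_minutes("Balanced", 60)` gives 15.0. Real processing time also depends on hardware, so treat these as relative figures only.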

Model Selection Tips

Known Limitations

FAQ

What does WER mean?

Word Error Rate (WER) is the percentage of words that are incorrectly transcribed. A WER of 5% means about 1 in 20 words is wrong. Chinese uses CER (Character Error Rate) since Chinese text doesn't use spaces between words.

Why are the smaller models so much worse for Japanese and Korean?

Japanese mixes kanji and kana, Korean is written in hangul with occasional hanja, and both languages rely heavily on context to choose the correct characters and word boundaries. Smaller models lack the capacity to capture this context well, which leads to very high error rates (roughly 26–35% with the Fast model, versus 10–14% with Highest Accuracy). Always use the Highest Accuracy model for these languages.

Why don't Chinese and Thai have a model size option?

GeekLink uses purpose-built recognition engines for Chinese and Thai that are optimized specifically for those languages. These deliver consistently high accuracy without you needing to select a model size.

Will accuracy improve if I use the Highest Accuracy model for Chinese?

No. Chinese uses a dedicated engine that is already highly optimized for Mandarin, so the model size setting does not apply to Chinese recognition.
