Speech Recognition Accuracy by Language: WER Reference Table

AI speech recognition is not equally accurate in every language. This page compares transcription accuracy for the 12 languages GeekLink supports, so you can choose the right model size before processing a video and avoid surprises.

What Is WER?

WER (Word Error Rate) measures the share of words transcribed incorrectly: substituted, missing, and extra words, divided by the number of words in the reference transcript. Lower is better. Chinese is scored with CER (Character Error Rate) instead, because written Chinese does not mark word boundaries with spaces. A WER of 5% means roughly 1 error in every 20 words.
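To make the definition concrete, here is a minimal Python sketch of how WER and CER are commonly computed: an edit distance at the word level (or character level), divided by the length of the reference. The function names are illustrative and not part of GeekLink, and real scoring tools usually normalize case and punctuation before comparing, which this sketch skips.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length (spaces ignored)."""
    ref = list(reference.replace(" ", ""))
    return edit_distance(ref, list(hypothesis.replace(" ", ""))) / len(ref)


reference = " ".join(f"w{i}" for i in range(20))  # a 20-word reference
hypothesis = reference.replace("w7", "x7")        # one word transcribed wrongly
print(wer(reference, hypothesis))
```

With a 20-word reference and one substituted word, `wer` returns 0.05, the 5% case described above. Note that the reference must not be empty, since it is the denominator.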

Numbers below are reference values from benchmark datasets. Real-world accuracy varies depending on audio quality, background noise, accent, and speaking pace.

Language Accuracy Quick Reference

The app uses the base model by default. Larger models are slower but more accurate, especially for Japanese and Korean.

| Language | Recommended Model | WER (large) | WER (base) | Rating |
| --- | --- | --- | --- | --- |
| 🇨🇳 Simplified Chinese | Dedicated engine | ~3–5% CER | ~3–5% CER | ⭐⭐⭐⭐⭐ |
| 🇹🇼 Traditional Chinese | Dedicated engine | ~3–5% CER | ~3–5% CER | ⭐⭐⭐⭐⭐ |
| 🇪🇸 Spanish | medium / large | ~3–4% | ~14–18% | ⭐⭐⭐⭐⭐ |
| 🇬🇧 English | medium / large | ~4–5% | ~12–16% | ⭐⭐⭐⭐⭐ |
| 🇫🇷 French | medium / large | ~6–7% | ~18–24% | ⭐⭐⭐⭐ |
| 🇩🇪 German | medium / large | ~5–7% | ~17–22% | ⭐⭐⭐⭐ |
| 🇮🇹 Italian | medium / large | ~5–7% | ~17–22% | ⭐⭐⭐⭐ |
| 🇵🇹 Portuguese | medium / large | ~5–6% | ~16–21% | ⭐⭐⭐⭐ |
| 🇷🇺 Russian | large | ~8–12% | ~22–28% | ⭐⭐⭐⭐ |
| 🇹🇭 Thai | Dedicated engine | ~8–12% | ~8–12% | ⭐⭐⭐⭐ |
| 🇯🇵 Japanese | large required | ~10–14% | ~28–35% | ⭐⭐⭐ |
| 🇰🇷 Korean | large required | ~10–13% | ~26–32% | ⭐⭐⭐ |

Source: OpenAI Whisper paper (FLEURS benchmark, large-v2) and public benchmarks for specialized models. Actual results may vary.
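Percentages are easier to judge as error counts. This hypothetical helper multiplies a WER range by a transcript length; the dictionary copies three rows of the table above, and the 1,000-word length is an arbitrary example, not a GeekLink figure.

```python
# Approximate WER ranges (as fractions, not percent) copied from the table above.
WER_RANGES = {
    "Japanese": {"base": (0.28, 0.35), "large": (0.10, 0.14)},
    "Korean": {"base": (0.26, 0.32), "large": (0.10, 0.13)},
    "English": {"base": (0.12, 0.16), "large": (0.04, 0.05)},
}


def expected_errors(language: str, model: str, word_count: int) -> tuple[int, int]:
    """Low and high estimates of word errors in a transcript of `word_count` words."""
    low, high = WER_RANGES[language][model]
    return round(low * word_count), round(high * word_count)


for model in ("base", "large"):
    low, high = expected_errors("Japanese", model, 1000)
    print(f"Japanese, {model}: about {low}-{high} errors per 1,000 words")
```

For a 1,000-word Japanese transcript, the benchmark ranges work out to roughly 280–350 errors with base versus 100–140 with large. Treat these as ballpark figures, since real-world audio can differ a lot from benchmark recordings.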

Model Size vs Accuracy vs Speed

GeekLink lets you choose the model size. Larger models need more processing time and disk space but produce significantly better results, especially for Japanese and Korean.

| Model | Parameters | Speed (CPU, relative to large) | Quality | Best For |
| --- | --- | --- | --- | --- |
| tiny | 39M | Fastest (~10x) | Poor | Quick preview only |
| base (app default) | 74M | Fast (~7x) | Fair | Chinese / Spanish quick pass |
| small | 244M | Medium (~4x) | Good | European languages, daily use |
| medium | 769M | Slow (~2x) | Very good | Recommended for Spanish / English / French |
| large | 1550M | Slowest (1x) | Best | Required for Japanese / Korean; maximum accuracy for others |
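The speed multipliers can be read as "times faster than large". A minimal sketch, assuming those approximate multipliers hold on your machine (in practice they vary with hardware and audio length), turns a measured large-model run time into rough estimates for the other sizes. `estimate_minutes` is a hypothetical helper, not a GeekLink setting.

```python
# Approximate speed multipliers relative to the large model (large = 1x),
# copied from the table above. Real ratios depend on your hardware.
RELATIVE_SPEED = {"tiny": 10, "base": 7, "small": 4, "medium": 2, "large": 1}


def estimate_minutes(model: str, large_model_minutes: float) -> float:
    """Rough processing time for `model`, given how long large takes on the same video."""
    return large_model_minutes / RELATIVE_SPEED[model]


for size in RELATIVE_SPEED:
    print(f"{size:>6}: ~{estimate_minutes(size, 30):.1f} min")
```

If the large model needs 30 minutes for a video, this estimates about 15 minutes with medium and about 3 minutes with tiny.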

FAQ

What does WER mean?

Word Error Rate (WER) is the percentage of words transcribed incorrectly, counting substituted, missing, and extra words against a reference transcript. A WER of 5% means about 1 error in every 20 words. Chinese uses CER (Character Error Rate) because Chinese text doesn't use spaces between words.

Why is the base model so much worse for Japanese and Korean?

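Japanese writing mixes kanji with two kana scripts, and choosing the right kanji often depends on context. Korean is written mostly in Hangul today (hanja are rare), but its words carry many grammatical endings, so a single wrong ending makes the whole word count as an error. Smaller models lack the capacity to capture this context well, which leads to very high error rates. Always use the large model for these languages.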

Why don't Chinese and Thai have a model size option?

GeekLink uses purpose-built recognition engines for Chinese and Thai that are optimized specifically for those languages. These deliver consistently high accuracy without you needing to select a model size.

Will accuracy improve if I use the large model for Chinese?

No. Chinese recognition uses a dedicated engine that is already highly optimized for Mandarin, so the model size setting has no effect on it.
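The routing described in this FAQ can be summarized as a small decision function. This is a hypothetical sketch, not GeekLink's actual code: the language codes, the `"dedicated"` label, and the automatic upgrade to large for Japanese and Korean are assumptions that encode this page's recommendations (the app itself may simply leave the choice to you).

```python
# Hypothetical language identifiers; GeekLink's real settings may differ.
DEDICATED_ENGINE_LANGUAGES = {"zh-Hans", "zh-Hant", "th"}  # Chinese and Thai
LARGE_REQUIRED_LANGUAGES = {"ja", "ko"}                    # Japanese and Korean
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def resolve_recognizer(language: str, requested_size: str = "base") -> str:
    """Return which recognizer would handle `language` under this page's rules."""
    if requested_size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {requested_size!r}")
    if language in DEDICATED_ENGINE_LANGUAGES:
        # Dedicated engine: the model size setting is ignored.
        return "dedicated"
    if language in LARGE_REQUIRED_LANGUAGES:
        # Assumption: upgrade automatically, since the table marks large as required.
        return "large"
    return requested_size


print(resolve_recognizer("zh-Hant", "large"))  # dedicated
print(resolve_recognizer("ko"))                 # large
print(resolve_recognizer("fr", "medium"))       # medium
```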

Get Started with GeekLink

Download for free and experience AI-powered subtitle tools.