Speech Recognition Accuracy by Language: WER Reference Table

Speech recognition accuracy varies widely by language. This page compares reference transcription accuracy for the 49 languages listed in the table below, so you can choose the right GeekLink model size before processing your video.

What Is WER?

WER (Word Error Rate) is the share of words in a transcript that are wrong (substituted, dropped, or inserted) relative to a reference transcript; lower is better. Chinese is scored with CER (Character Error Rate) instead, because written Chinese does not separate words with spaces. A WER of 5% means roughly 1 error per 20 words.
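As a rough illustration of the definition above, WER can be computed as the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is a minimal version under that assumption; the function names are illustrative and are not part of GeekLink, and real benchmarks usually normalize punctuation and casing first.

```python
def _edit_distance(ref_tokens, hyp_tokens):
    """Minimum number of substitutions, deletions, and insertions."""
    # prev[j] = distance between the reference prefix seen so far
    # and the first j hypothesis tokens
    prev = list(range(len(hyp_tokens) + 1))
    for i, ref_tok in enumerate(ref_tokens, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp_tokens, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level errors divided by the number of reference words."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return _edit_distance(ref, hypothesis.split()) / len(ref)


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER: the same calculation on characters, ignoring spaces."""
    ref = list(reference.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return _edit_distance(ref, list(hypothesis.replace(" ", ""))) / len(ref)


reference = ("the quick brown fox jumps over the lazy dog and then "
             "runs back home before the sun sets again today")
hypothesis = reference.replace("lazy", "crazy")  # one wrong word out of 20
print(word_error_rate(reference, hypothesis))    # 0.05, i.e. 5% WER
```

The character-level variant is how a CER figure for Chinese would be obtained under the same assumptions.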

Numbers below are reference values from benchmark datasets. Real-world accuracy varies depending on audio quality, background noise, accent, and speaking pace.

Language Accuracy Quick Reference

The app's default model is Recommended. Larger models take longer but transcribe more accurately, especially for Japanese and Korean.

Language	Recommended Model	Best WER/CER	Fast Model WER/CER	Rating
Simplified Chinese	Dedicated engine	~3–5% CER	~3–5% CER	★★★★★
Traditional Chinese	Dedicated engine	~3–5% CER	~3–5% CER	★★★★★
Spanish	High Accuracy / Highest Accuracy	~3–4%	~14–18%	★★★★★
English	High Accuracy / Highest Accuracy	~4–5%	~12–16%	★★★★★
French	High Accuracy / Highest Accuracy	~6–7%	~18–24%	★★★★
German	High Accuracy / Highest Accuracy	~5–7%	~17–22%	★★★★
Italian	High Accuracy / Highest Accuracy	~5–7%	~17–22%	★★★★
Portuguese	High Accuracy / Highest Accuracy	~5–6%	~16–21%	★★★★
Russian	Highest Accuracy	~8–12%	~22–28%	★★★★
Dutch	High Accuracy / Highest Accuracy	~6–9%	~18–24%	★★★★
Turkish	High Accuracy / Highest Accuracy	~7–10%	~20–26%	★★★★
Indonesian	High Accuracy / Highest Accuracy	~7–10%	~20–26%	★★★★
Polish	High Accuracy / Highest Accuracy	~7–10%	~20–26%	★★★★
Swedish	High Accuracy / Highest Accuracy	~7–10%	~18–24%	★★★★
Catalan	High Accuracy / Highest Accuracy	~5–8%	~14–18%	★★★★
Czech	High Accuracy / Highest Accuracy	~7–10%	~20–26%	★★★★
Norwegian	High Accuracy / Highest Accuracy	~7–10%	~18–24%	★★★★
Danish	High Accuracy / Highest Accuracy	~8–12%	~22–28%	★★★★
Finnish	High Accuracy / Highest Accuracy	~8–12%	~22–28%	★★★★
Hungarian	High Accuracy / Highest Accuracy	~8–12%	~22–28%	★★★★
Greek	High Accuracy / Highest Accuracy	~8–12%	~22–28%	★★★★
Romanian	High Accuracy / Highest Accuracy	~8–12%	~22–28%	★★★★
Malay	High Accuracy / Highest Accuracy	~8–12%	~22–28%	★★★★
Arabic	Highest Accuracy	~10–16%	~26–34%	★★★
Thai	Dedicated engine	Varies widely	—	★★
Japanese	Highest Accuracy required	~10–14%	~28–35%	★★★
Korean	Highest Accuracy required	~10–13%	~26–32%	★★★
Slovenian	Highest Accuracy	~10–15%	~28–35%	★★★
Hindi	Highest Accuracy	~12–18%	~30–40%	★★★
Ukrainian	Highest Accuracy	~12–18%	~28–36%	★★★
Vietnamese	Highest Accuracy	~14–20%	~32–40%	★★★
Croatian	Highest Accuracy	~10–15%	~26–34%	★★★
Slovak	Highest Accuracy	~10–15%	~26–34%	★★★
Bulgarian	Highest Accuracy	~10–15%	~26–34%	★★★
Serbian	Highest Accuracy	~10–15%	~26–34%	★★★
Hebrew	Highest Accuracy	~10–15%	~26–34%	★★★
Persian	Highest Accuracy	~10–15%	~26–34%	★★★
Filipino	Highest Accuracy	~12–18%	~28–36%	★★★
Lithuanian	Highest Accuracy	~12–18%	~28–36%	★★★
Latvian	Highest Accuracy	~12–18%	~28–36%	★★★
Estonian	Highest Accuracy	~12–18%	~28–36%	★★★
Azerbaijani	Highest Accuracy	~12–18%	~28–36%	★★★
Bengali	Highest Accuracy	~15–20%	~32–40%	★★★
Urdu	Highest Accuracy	~15–20%	~32–40%	★★★
Tamil	Highest Accuracy	~15–20%	~32–40%	★★★
Nepali	Highest Accuracy	~15–22%	~34–42%	★★★
Swahili	Highest Accuracy	~15–22%	~34–42%	★★★
Georgian	Highest Accuracy	~15–22%	~34–42%	★★★
Icelandic	Highest Accuracy	~15–22%	~34–42%	★★★

Source: public speech recognition benchmarks (FLEURS dataset) and specialized model evaluations. Actual results may vary.
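To make the percentages in the table concrete, the sketch below converts a WER range into an expected number of word errors for a transcript of a given length. It is simple arithmetic on the reference ranges above; the function name is illustrative, and real results will differ with audio quality.

```python
def expected_errors(word_count: int, wer_low_pct: float, wer_high_pct: float) -> tuple[int, int]:
    """Return the (low, high) expected number of word errors."""
    low = round(word_count * wer_low_pct / 100)
    high = round(word_count * wer_high_pct / 100)
    return low, high


# Japanese reference ranges from the table, for a 1,000-word transcript
print(expected_errors(1000, 10, 14))  # Best WER column: (100, 140)
print(expected_errors(1000, 28, 35))  # Fast WER column: (280, 350)
```

On these reference figures, choosing the faster model for Japanese would roughly double or triple the number of words to correct by hand.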

Model Size vs Accuracy vs Speed

GeekLink lets you choose the model size. Larger models take more time and disk space but produce significantly better results — especially for Japanese and Korean.

Model	Download Size	Speed (relative to Highest Accuracy)	Quality	Best For
Fastest	75 MB	Fastest (~10x)	Poor	Quick preview only
Fast	142 MB	Fast (~7x)	Fair	Chinese / Spanish quick pass
Balanced	466 MB	Medium (~4x)	Good	European languages daily use
High Accuracy	1.5 GB	Slow (~2x)	Very good	Spanish / English / French recommended
Recommended (app default)	1.6 GB	Fast (~6x)	Very good	Best speed-accuracy balance for most languages
Highest Accuracy	2.9 GB	Slowest (1x)	Best	Required for Japanese / Korean; maximum accuracy for other languages
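The speed column can be used for a rough time estimate. The sketch below assumes the multipliers are relative to the Highest Accuracy model (listed as 1x), which the table implies but does not state outright; the baseline time is a hypothetical figure you would measure on your own machine, and the names are illustrative.

```python
# Approximate speed multipliers copied from the table above.
RELATIVE_SPEED = {
    "Fastest": 10,
    "Fast": 7,
    "Balanced": 4,
    "High Accuracy": 2,
    "Recommended": 6,
    "Highest Accuracy": 1,
}


def estimated_minutes(model: str, highest_accuracy_minutes: float) -> float:
    """Estimate processing time from the time Highest Accuracy takes."""
    return highest_accuracy_minutes / RELATIVE_SPEED[model]


# Hypothetical baseline: Highest Accuracy needs 60 minutes for a video.
print(estimated_minutes("Recommended", 60))    # 10.0
print(estimated_minutes("High Accuracy", 60))  # 30.0
```

Because the multipliers are approximate, treat the result as an order-of-magnitude guide, not a prediction.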

Model Selection Tips

Chinese & Thai: GeekLink uses a dedicated engine for these languages. Chinese accuracy is consistently high. Thai accuracy varies significantly by content: it works well for standard speech, but casual conversation and dialects may produce poor results.
Japanese & Korean: Highest Accuracy is strongly recommended. The Fast model has a very high error rate (roughly 26–35%) and will miss many words.
Spanish, English, French, German, Italian, Portuguese, Dutch, Turkish, Indonesian, Polish, Swedish, Catalan, Czech, Norwegian, Danish, Finnish, Hungarian, Greek, Romanian, Malay: High Accuracy is the best balance of speed and accuracy. Use Highest Accuracy for professional-quality output.
Russian: Highest Accuracy makes a noticeable difference; High Accuracy is also acceptable.
Arabic, Hindi, Ukrainian, Vietnamese, Slovenian, Croatian, Slovak, Bulgarian, Serbian, Hebrew, Persian, Filipino, Lithuanian, Latvian, Estonian, Azerbaijani, Bengali, Urdu, Tamil, Nepali, Swahili, Georgian, Icelandic: Highest Accuracy recommended. Standard pronunciation with clear audio produces the best results.
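The tips above amount to a lookup from language to model. The sketch below encodes them as plain sets; the function name and return strings are illustrative and do not correspond to any GeekLink API or setting. Russian is placed in the Highest Accuracy group, although the tips note that High Accuracy is also acceptable.

```python
# Groups copied from the tips above. "Chinese" covers Simplified and Traditional.
DEDICATED_ENGINE = {"Chinese", "Thai"}
HIGH_ACCURACY = {
    "Spanish", "English", "French", "German", "Italian", "Portuguese",
    "Dutch", "Turkish", "Indonesian", "Polish", "Swedish", "Catalan",
    "Czech", "Norwegian", "Danish", "Finnish", "Hungarian", "Greek",
    "Romanian", "Malay",
}
HIGHEST_ACCURACY = {
    "Japanese", "Korean", "Russian",
    "Arabic", "Hindi", "Ukrainian", "Vietnamese", "Slovenian", "Croatian",
    "Slovak", "Bulgarian", "Serbian", "Hebrew", "Persian", "Filipino",
    "Lithuanian", "Latvian", "Estonian", "Azerbaijani", "Bengali", "Urdu",
    "Tamil", "Nepali", "Swahili", "Georgian", "Icelandic",
}


def suggested_model(language: str) -> str:
    """Return the model suggested by the tips for a given language."""
    if language in DEDICATED_ENGINE:
        return "Dedicated engine"
    if language in HIGH_ACCURACY:
        return "High Accuracy"
    if language in HIGHEST_ACCURACY:
        return "Highest Accuracy"
    raise ValueError(f"no tip listed for {language!r}")


print(suggested_model("Japanese"))  # Highest Accuracy
print(suggested_model("Spanish"))   # High Accuracy
print(suggested_model("Thai"))      # Dedicated engine
```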

Known Limitations

Background music: GeekLink applies silence pre-processing, but heavy vocal/music overlap degrades accuracy in all languages.
Dialects & accents: Standard accent works best. Dialects (e.g. Cantonese, Sichuan Mandarin) have higher error rates even with dedicated engines.
Japanese kanji: The kanji or kana chosen in the transcript may differ from standard conventions, so review key terms before translation.
Sentence segmentation: AI sometimes produces long sentences. Use GeekLink's subtitle editor to split them as needed.

FAQ

What does WER mean?

Word Error Rate (WER) is the percentage of words that are incorrectly transcribed. A WER of 5% means about 1 in 20 words is wrong. Chinese uses CER (Character Error Rate) since Chinese text doesn't use spaces between words.

Why is the Fast model so much worse for Japanese and Korean?

Japanese and Korean have complex writing systems (kanji, hanja) and rely heavily on context for correct character selection. Smaller models lack the capacity to capture this context well, leading to very high error rates. Always use the Highest Accuracy model for these languages.

Why don't Chinese and Thai have a model size option?

GeekLink uses purpose-built recognition engines for Chinese and Thai that are optimized specifically for those languages, so there is no model size to select. Chinese accuracy is consistently high; Thai accuracy depends more on the content.

Will accuracy improve if I use the Highest Accuracy model for Chinese?

No. Chinese uses a dedicated engine that is already highly optimized for Mandarin. The model size setting does not apply to Chinese recognition.
