Hardcoded subtitles (burned-in captions) are text baked permanently into video frames — they cannot be turned off, copied, or edited without OCR extraction. AI-powered OCR can read this text frame-by-frame, reconstruct timing, and output an editable SRT file. This guide covers all major scenarios: Chinese short dramas, Japanese anime, variety shows, and old movies — with step-by-step instructions for extracting, editing, and translating hardcoded subtitles locally on your Mac.
What are hardcoded subtitles, and how are they different from soft subtitles?
Subtitles come in two fundamentally different forms, and the distinction matters for extraction.
Soft subtitles (also called external or toggleable subtitles) are separate text files — SRT, ASS, or VTT — that a video player overlays during playback. You can turn them on or off, switch languages, and edit the text file directly. YouTube subtitle tracks, Netflix language options, and downloaded .srt files are all soft subtitles.
Hardcoded subtitles (also called burned-in, embedded, or open captions) are part of the video image itself. During video editing, the text was rendered directly onto each frame. There is no separate text layer — the subtitle pixels are indistinguishable from the rest of the image to a video player.
The key consequence: you cannot extract hardcoded subtitles by simply opening the video file and looking for a text track. The only way to recover the text is to "read" it from the image using OCR (Optical Character Recognition).
How to tell which type you have:
- If your video player has a subtitle toggle button and the text disappears when you turn it off → soft subtitles
- If the text stays visible regardless of player settings → hardcoded
- If you open the video in VLC, the Subtitle menu lists no tracks, and the text is still on screen → hardcoded
- If you run ffprobe -i video.mp4 and see no subtitle stream, yet the text is still on screen → hardcoded
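The ffprobe check can also be scripted. The following is a minimal sketch, assuming ffprobe is installed and on your PATH; the function names and the sample report are illustrative and not part of any tool described in this guide. It asks ffprobe for a JSON report and counts the streams whose codec_type is "subtitle". Zero subtitle streams plus visible on-screen text points to hardcoded subtitles.

```python
import json
import subprocess


def subtitle_streams(ffprobe_json: str) -> list:
    """Return the subtitle streams listed in an ffprobe JSON report."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [s for s in streams if s.get("codec_type") == "subtitle"]


def probe(path: str) -> str:
    """Run ffprobe on a file and return its JSON report as text."""
    cmd = ["ffprobe", "-v", "error", "-print_format", "json", "-show_streams", path]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Hand-written sample report: one video, one audio, one soft subtitle track.
sample = (
    '{"streams": [{"codec_type": "video"}, {"codec_type": "audio"}, '
    '{"codec_type": "subtitle", "codec_name": "subrip"}]}'
)
print(len(subtitle_streams(sample)))  # 1 -> the file carries a soft subtitle track
```

For a real file, call subtitle_streams(probe("video.mp4")). An empty list means there is no soft track to extract, so any visible text must come from the image itself.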
Why would you need to extract hardcoded subtitles?
There are four primary reasons people extract burned-in subtitles from video:
1. Translation to another language
This is the most common use case. You have a video with Chinese hardcoded subtitles (common on Douyin, Bilibili, WeChat Channels) and want to translate them to English, Japanese, or another language. You cannot translate what you cannot edit — so extraction comes first.
2. Creating a searchable transcript
Researchers, journalists, and archivists often need text versions of video content for indexing, searching, and citing. Hardcoded subtitles contain the information but are trapped in pixel form.
3. Re-styling or repositioning subtitles
The burned-in subtitles may be poorly positioned (covering important visuals), too small to read on mobile, or styled in a way that clashes with your use case. Extracting the text lets you re-render it with your preferred font, size, color, and position.
4. Accessibility and compliance
Platforms like YouTube require subtitle files (not burned-in text) for their auto-translate feature and accessibility tools. Extracting hardcoded subtitles to SRT format makes the content accessible to screen readers, auto-translation, and hearing-impaired viewers who use customized caption settings.
How does OCR subtitle extraction actually work?
OCR subtitle extraction is a four-stage pipeline: frame sampling, text detection, character recognition, and deduplication with timestamp assignment. Understanding these stages helps you troubleshoot accuracy issues.
Stage 1: Frame sampling
A video at 30fps contains 1,800 frames per minute. Most subtitles stay on screen for 2-5 seconds, meaning only a fraction of frames contain new text. Smart OCR tools sample frames at intervals (e.g., every 0.5 seconds) and detect when the subtitle text changes, rather than processing every single frame.
This is why processing speed varies — a 10-minute video with 60 subtitle lines requires recognizing ~120 frames (entry + exit detection), not 18,000.
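The arithmetic behind those figures can be sketched as follows. This is an illustration only: the 0.5-second sampling interval is the example value from Stage 1, and the "two recognitions per subtitle line" rule is the entry-plus-exit estimate quoted above, not a measured property of any particular engine.

```python
def frame_budget(duration_s: float, fps: float,
                 sample_interval_s: float, subtitle_lines: int) -> dict:
    """Estimate how many frames each stage of the pipeline has to touch."""
    return {
        # Every frame in the video.
        "total_frames": int(duration_s * fps),
        # Frames actually looked at when sampling at a fixed interval.
        "sampled_frames": int(duration_s / sample_interval_s),
        # Frames needing full character recognition: one at each
        # subtitle entry and one at each exit.
        "recognized_frames": subtitle_lines * 2,
    }


budget = frame_budget(duration_s=600, fps=30, sample_interval_s=0.5, subtitle_lines=60)
print(budget)
# {'total_frames': 18000, 'sampled_frames': 1200, 'recognized_frames': 120}
```

So the cheap change-detection pass looks at 1,200 of the 18,000 frames, and the expensive recognition pass runs on roughly 120 of those.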
Stage 2: Text region detection
The OCR engine identifies where text appears in each frame. Subtitles are typically in the bottom 20-30% of the screen, but variety shows and anime may place text anywhere — top, middle, or in speech bubbles. Advanced detection models locate text regardless of position.
Stage 3: Character recognition
Once the text region is isolated, the OCR model reads individual characters. This is where language matters significantly:
- Latin scripts (English, Spanish, French) — High accuracy, well-understood by all OCR engines
- CJK scripts (Chinese, Japanese, Korean) — Requires specialized models trained on thousands of character variants. Chinese alone has 6,763 commonly-used characters (GB 2312 standard)
- Mixed scripts (Japanese with kanji + hiragana + katakana + occasional English) — The hardest case, requiring multi-script detection within a single line
Stage 4: Deduplication and timing
The same subtitle line appears across many consecutive frames. The OCR system must recognize that frames 150-220 all contain the same text, group them into a single subtitle entry, and assign the correct start and end timestamps. Good deduplication is the difference between a clean 60-line SRT file and a messy 500-line file with duplicates.
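A minimal sketch of this grouping step is shown below. It assumes the OCR stage has already produced a sorted list of (timestamp, text) samples at a fixed interval, with an empty string where no subtitle was detected. The Cue class and deduplicate function are hypothetical names for illustration. The sketch also compares text by exact equality; real OCR output jitters between frames, so a production system would likely need a fuzzy comparison.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def deduplicate(samples, sample_interval_s: float):
    """Merge consecutive samples that show the same text into single cues."""
    cues = []
    for timestamp, text in samples:
        text = text.strip()
        if not text:
            continue  # no subtitle on screen at this sample
        last = cues[-1] if cues else None
        # Same text in the sample immediately following the current cue:
        # extend the cue instead of starting a new one.
        if last and last.text == text and timestamp - last.end <= 1e-6:
            last.end = timestamp + sample_interval_s
        else:
            cues.append(Cue(timestamp, timestamp + sample_interval_s, text))
    return cues


samples = [(0.0, ""), (0.5, "你好"), (1.0, "你好"), (1.5, "你好"),
           (2.0, ""), (2.5, "再见"), (3.0, "再见")]
for cue in deduplicate(samples, sample_interval_s=0.5):
    print(cue)
# Cue(start=0.5, end=2.0, text='你好')
# Cue(start=2.5, end=3.5, text='再见')
```

Seven samples collapse into two subtitle entries. A blank sample between two identical lines starts a new cue, so a line that is repeated after a pause is kept as a separate entry.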
How do you extract hardcoded subtitles step by step?
This walkthrough uses GeekLink on macOS. The entire process runs locally — your video never leaves your machine.
Step 1: Import your video
Drag and drop the video file into GeekLink. Supported formats include MP4, MOV, MKV, AVI, WebM, and FLV. There is no file size limit — OCR processes individual frames, not the full video bitstream.
For batch workflows (e.g., extracting subtitles from an entire season of a drama), import multiple files at once. GeekLink processes them sequentially or in parallel depending on your hardware.
Step 2: Select OCR as the extraction method
GeekLink offers two subtitle extraction methods:
- Speech recognition — Transcribes spoken audio to text. Use this when subtitles do not exist or are inaccurate.
- OCR extraction — Reads text from video frames. Use this when subtitles are already burned into the video.
Select OCR. The engine will read the visual text rather than processing the audio track.
Step 3: Configure the subtitle region (recommended)
Define the area of the frame where subtitles appear. For most content, this is the bottom 20-30% of the screen. Setting a region:
- Eliminates false positives from on-screen text, watermarks, and channel logos
- Speeds up processing by reducing the area to scan
- Improves accuracy by giving the model less visual noise to parse
For variety shows with subtitles in non-standard positions, adjust the region accordingly. For anime with text at multiple positions, you may need to use the full frame.
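To make the region concrete, here is a small sketch that converts a "bottom N% of the frame" setting into a pixel rectangle. The function names and the 25% default are assumptions for illustration; GeekLink sets the region through its interface, and this code only shows the geometry. The second helper formats the rectangle as an FFmpeg crop filter (crop=w:h:x:y), which is one way to preview the strip outside the app.

```python
def subtitle_region(width: int, height: int, bottom_fraction: float = 0.25):
    """Return (x, y, w, h) for a full-width strip at the bottom of the frame."""
    if not 0 < bottom_fraction <= 1:
        raise ValueError("bottom_fraction must be in (0, 1]")
    strip_height = round(height * bottom_fraction)
    return (0, height - strip_height, width, strip_height)


def ffmpeg_crop_filter(region) -> str:
    """Format a region as an FFmpeg crop filter string: crop=w:h:x:y."""
    x, y, w, h = region
    return f"crop={w}:{h}:{x}:{y}"


region = subtitle_region(1920, 1080, bottom_fraction=0.25)
print(region)                      # (0, 810, 1920, 270)
print(ffmpeg_crop_filter(region))  # crop=1920:270:0:810
```

At 1080p, a 25% region cuts the area to scan from about 2.07 million pixels to about 0.52 million, which is where the speed gain comes from.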
Step 4: Run the extraction
GeekLink runs the full pipeline on the video:
- Samples frames at adaptive intervals based on detected text changes
- Detects text regions within each sampled frame
- Recognizes characters using CJK-optimized or Latin-script models
- Deduplicates consecutive identical text to produce clean subtitle entries
- Assigns start and end timestamps to each entry
Processing runs entirely on your Mac's CPU/GPU. A 10-minute video typically takes 1-3 minutes depending on subtitle density and hardware.
Step 5: Review in the built-in editor
Open the subtitle editor to review results. Common corrections:
- Character errors — OCR may confuse similar characters: 已/己/巳, 未/末, rn/m, 0/O. These are quick manual fixes.
- Line splitting — Long lines that should be two separate subtitle entries sometimes merge. Split them at natural sentence boundaries.
- Timestamp adjustment — If a subtitle appears 0.2-0.5 seconds early or late, drag the timestamp to align precisely with the spoken audio.
- Decorative text removal — Variety shows may include extracted decorative text that is not part of the main subtitle. Delete these entries.
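When every line is off by the same amount, a global shift is quicker than dragging each timestamp by hand. The sketch below is a hypothetical helper, not a GeekLink feature; it assumes cues are (start, end, text) tuples with times in seconds and clamps the result so that no cue starts before zero.

```python
def shift_cues(cues, offset_s: float):
    """Shift every cue by offset_s seconds (negative values move cues earlier)."""
    shifted = []
    for start, end, text in cues:
        new_start = max(0.0, start + offset_s)
        # Keep the end at or after the start even when clamping kicks in.
        new_end = max(new_start, end + offset_s)
        shifted.append((new_start, new_end, text))
    return shifted


print(shift_cues([(1.0, 3.0, "Hello"), (3.5, 5.0, "Goodbye")], -0.5))
# [(0.5, 2.5, 'Hello'), (3.0, 4.5, 'Goodbye')]
```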
Step 6: Export
Export the extracted subtitles in your preferred format:
- SRT — Universal compatibility. Works with YouTube, Vimeo, VLC, Premiere, Final Cut, DaVinci Resolve.
- ASS — Advanced styling for CJK content. Supports custom fonts, colors, positions, and effects.
- VTT — Web-native format for HTML5 video players.
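For reference, an SRT file is plain text: a running index, a start --> end line in HH:MM:SS,mmm form, the subtitle text, and a blank line between entries. The sketch below writes that structure from (start, end, text) tuples in seconds. The function names are illustrative, and this is not GeekLink's exporter.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues) -> str:
    """Serialize (start, end, text) cues as the contents of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


print(to_srt([(0.5, 2.0, "你好"), (2.5, 3.5, "再见")]))
# 1
# 00:00:00,500 --> 00:00:02,000
# 你好
#
# 2
# 00:00:02,500 --> 00:00:03,500
# 再见
```

Save the result as UTF-8 so that CJK text survives the round trip through players and editors.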
How can you maximize OCR subtitle accuracy?
OCR accuracy depends on the visual characteristics of the subtitles, not the type of video. The same tool can reach about 99% accuracy on clean white text at 1080p and only about 80% on blurry decorative fonts at 480p. Here are the factors you can control.
1. Source video resolution matters most
720p is the minimum for reliable OCR. Below 720p, character edges become ambiguous and accuracy drops sharply — especially for CJK scripts where stroke details distinguish different characters.
If your source is 480p or lower, consider AI upscaling the video before OCR extraction. Even a 2x upscale (480p → 960p) can improve character boundary clarity enough to gain 5-10% accuracy.
2. Contrast between text and background
White text with black outline on any background: excellent. Yellow text on a bright scene without outline: problematic. If the video has scenes where subtitle text blends into a bright background, those specific frames will have lower accuracy.
3. Define the subtitle region
As mentioned in Step 3: restricting the scan area to where subtitles actually appear eliminates false positives from watermarks, logos, and on-screen graphics. This alone can improve precision from 85% to 95% on variety show content.
4. Avoid processing heavily compressed video
Video compression (especially at low bitrates) creates artifacts around text edges — blocky distortions that confuse OCR. If possible, use the highest-quality source available. A 1080p file at 8 Mbps will OCR significantly better than the same content at 2 Mbps.
5. Handle multi-language content correctly
Some videos show two languages simultaneously (e.g., Chinese + English on separate lines). OCR will extract both. If you only need one language, you can:
- Restrict the region to only the line you need (if they are in different vertical positions)
- Delete the unwanted language entries in the editor after extraction
6. Post-processing: common substitution patterns
After OCR, certain character confusions are predictable and can be batch-corrected:
- English: rn→m, l→I, 0→O
- Chinese: 已↔己, 未↔末, 土↔士
- Japanese: ー (katakana prolonged sound) ↔ 一 (kanji "one")
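Blind find-and-replace is risky here: turning every "rn" into "m" would also turn "burn" into "bum". The sketch below shows two guarded corrections under stated assumptions. The Japanese rule treats kanji 一 as a misread ー only when it directly follows a katakana letter and is not followed by another kanji. The English rule applies rn→m only when that turns an unknown word into one found in a word list you supply. Both functions are illustrative heuristics, not GeekLink's implementation, and both can still misfire on unusual text.

```python
import re

# Kanji "one" (U+4E00) right after a katakana letter, and not followed by
# another kanji, is most likely a misread prolonged sound mark (U+30FC).
_MISREAD_LONG_VOWEL = re.compile(r"(?<=[\u30A1-\u30FA])\u4E00(?![\u4E00-\u9FFF])")


def fix_katakana_long_vowel(text: str) -> str:
    """Replace a misread kanji 'one' inside katakana words with the long-vowel mark."""
    return _MISREAD_LONG_VOWEL.sub("\u30FC", text)


def fix_rn_word(word: str, vocabulary: set) -> str:
    """Apply rn -> m only if it turns an unknown word into a known one."""
    if "rn" not in word or word.lower() in vocabulary:
        return word
    candidate = word.replace("rn", "m")
    return candidate if candidate.lower() in vocabulary else word


print(fix_katakana_long_vowel("コ一ヒ一"))    # コーヒー
print(fix_katakana_long_vowel("第一話"))      # 第一話 (left alone)
print(fix_rn_word("tirne", {"time", "burn"}))  # time
print(fix_rn_word("burn", {"time", "burn"}))   # burn (already a known word)
```

The same vocabulary-guarded pattern extends to the other pairs in the list, provided you have a word list for the language in question.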
Can you extract hardcoded subtitles and translate them in one workflow?
Yes — and this is where OCR extraction becomes most powerful. The extract-translate-export workflow turns a video with foreign hardcoded subtitles into a video with your target language subtitles, all without leaving a single application.
The workflow:
- Extract — OCR reads the hardcoded Chinese/Japanese/Korean subtitles and produces an editable SRT
- Translate — AI translation converts the extracted text to your target language (English, Spanish, Portuguese, etc.) with full sentence context
- Export — Output as a subtitle file, or burn the translated text back into the video as new hardcoded subtitles
This is the most common real-world use case: you have a Chinese short drama with burned-in Chinese subtitles, and you want English subtitles — either as an SRT file or burned into the video at a different position.
Privacy advantage of local processing
In this workflow, the video and audio never leave your Mac. Only the extracted subtitle text (plain text, a few KB) is sent to the translation API. This matters for:
- Unreleased or copyrighted content you do not own distribution rights for
- Corporate or educational videos with proprietary information
- Client work where NDAs prohibit uploading content to third-party services
Batch processing multiple episodes
For series content (drama seasons, lecture series, YouTube playlists), batch processing can extract and translate 20-50 episodes overnight without manual intervention. Import all episodes, configure OCR settings once, and let the tool process sequentially while you sleep.
What are the limitations of OCR subtitle extraction?
OCR is not perfect. Understanding its limitations helps you set realistic expectations and know when to use alternative approaches.
Cannot remove the original subtitles
OCR extracts the text; it does not erase the burned-in subtitles from the video image. If you need the original text gone, you would need video inpainting (a separate, computationally expensive process). The practical workaround: position your new translated subtitles above or below the original ones, or place them on a semi-opaque background bar that masks the originals.
Decorative text and special effects
Text with heavy gradients, glow effects, 3D rotation, or animation may not be recognized accurately. The model is trained on printed text patterns — the further the visual deviates from standard printed characters, the lower the accuracy.
Very low resolution sources
At 360p or below, CJK characters become ambiguous (strokes merge, radicals are indistinguishable). Latin text fares slightly better at low resolution due to simpler character shapes. If accuracy is unacceptable at native resolution, upscale first.
Overlapping text and mixed languages
When two text layers overlap (e.g., a subtitle over a watermark, or two speakers' subtitles at the same position), OCR may produce garbled output for the overlapping portion. Two ways to handle this: define a specific region to isolate the subtitle layer you want, or use language filtering — for example, if a Japanese video has burned-in Simplified Chinese subtitles, you can filter out Japanese characters so only Chinese text is recognized, producing a much cleaner result.
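The language-filtering idea can be approximated after extraction as well. The sketch below drops any recognized line containing hiragana or katakana, which are unique to Japanese. It is a post-hoc approximation with a hypothetical function name, not a description of how GeekLink filters during recognition. One known gap: a Japanese line written entirely in kanji passes through, because Han characters are shared with Chinese.

```python
import re

# Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) Unicode blocks.
_KANA = re.compile(r"[\u3040-\u309F\u30A0-\u30FF]")


def drop_japanese_lines(lines):
    """Keep only the lines that contain no hiragana or katakana."""
    return [line for line in lines if not _KANA.search(line)]


recognized = ["你好，世界", "こんにちは世界", "カメラ", "谢谢"]
print(drop_japanese_lines(recognized))
# ['你好，世界', '谢谢']
```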
Handwritten or highly stylized fonts
OCR models are trained primarily on printed typefaces. Handwritten text, calligraphic styles, or heavily decorative fonts (common in variety show "reaction text") have significantly lower recognition rates.
Frequently Asked Questions
What are hardcoded subtitles?
Hardcoded subtitles (also called burned-in subtitles or open captions) are text that has been permanently rendered into the video image during editing or encoding. They are part of the pixels — you cannot turn them off, change their language, or edit them without OCR extraction. Common examples: Chinese Douyin/Bilibili videos, fansub anime releases, old DVD rips, and social media clips edited with CapCut or similar tools.
Can OCR extract subtitles from any language?
Modern OCR handles most major scripts: Chinese (Simplified and Traditional), Japanese (kanji + hiragana + katakana), Korean (Hangul), English, Spanish, French, German, Portuguese, Russian, Arabic (RTL), Thai, Vietnamese, and other Latin-script languages. CJK scripts require specialized models due to the large character set (6,763 common Chinese characters alone). GeekLink includes CJK-optimized models that run locally on your Mac.
How accurate is OCR subtitle extraction?
Accuracy depends on video resolution, text contrast, and font style — not the type of video content. Clean white text with outline at 720p+: 95-99%. Styled or decorative text at 720p: 85-93%. Low-resolution 480p sources: 80-90%. CJK characters need higher resolution than Latin text because stroke details matter more. For professional use, always review OCR output in a subtitle editor before publishing.
Is OCR subtitle extraction better than speech recognition?
They solve different problems. Use OCR when subtitles are already burned into the video and you want to extract that exact text. Use speech recognition when there are no subtitles and you want to transcribe spoken audio. If a video has hardcoded subtitles AND clear audio, OCR typically gives more accurate results because it reads what is already written rather than interpreting audio. For videos with poor audio quality but clean subtitles, OCR is clearly superior.
Can I remove hardcoded subtitles from a video?
OCR extracts the text content but does not visually remove the burned-in subtitles from the video frames. Removing them would require video inpainting (filling in the area behind the text), which is a separate and computationally expensive process. The practical approach: extract the text via OCR, translate it, then overlay new subtitles on top of or adjacent to the originals.
How long does OCR subtitle extraction take?
Processing time depends on video length, subtitle density, and your hardware. Typical benchmarks on an Apple Silicon Mac (M1 or later): a 10-minute video with ~60 subtitle lines takes 1-3 minutes. A 45-minute drama episode takes 5-12 minutes. Batch processing runs in the background — you can queue an entire 20-episode season and let it process overnight.
Disclosure: This guide is written by the GeekLink team. GeekLink is a macOS subtitle tool that includes OCR extraction. All accuracy figures are based on our internal testing across 200+ videos in Chinese, Japanese, Korean, and English at various resolutions. Your results may vary depending on source video quality and subtitle styling.