How to Add Captions & Subtitles to a Video (Automatically)

85% of social video is watched on mute. Captions aren’t optional anymore — they’re how most people read your video. Here’s how to add them automatically on your phone, on-device, in a couple of taps.

Captions do more than stand in for the sound: on-screen text also boosts watch time and accessibility. The good news is that you don’t type a word. ExpoCut transcribes the speech on-device and drops in timed captions you can restyle in seconds.

Why captions matter (more than you think)

  • The sound is off. Feeds autoplay muted; captions are the difference between a scroll and a watch.
  • Retention. On-screen words give the eye something to track, which keeps viewers in the video longer.
  • Accessibility & reach. Captions open your video to deaf and hard-of-hearing viewers and to anyone in a quiet room.
  • Repurposing. A transcript is a free script you can reuse for a caption, a blog post, or a carousel.

Step 1 — Add the clip with speech

  1. Drop your video or voiceover on the timeline

    Any clip with talking works — a talking-head, a vlog, or a voiceover track. Clear audio gives the best transcription.

  2. Select the clip

    Tap it on the timeline so its tools appear.

  3. Tap Transcribe

    ExpoCut listens to the audio and builds a timed caption layer — every line appears at the moment it’s spoken. It runs on the phone, so nothing uploads.

[Figure: a voiceover clip selected on the timeline, with a Transcribe button in the audio toolbar]
Select the clip with speech and tap Transcribe. The first run downloads a small on-device speech model (~32 MB), then everything transcribes locally — nothing uploads.

It’s all on-device. Transcription happens locally, so your audio never leaves the phone. Once the speech model has been downloaded on the first run, it works offline, and private recordings stay private.

Step 2 — Fix the wording

Auto-transcription is fast but not perfect with names, slang, or noisy audio. Tap any caption to edit the text the same way you’d edit a title. Fix a misheard word, split a long line, or delete filler — the timing stays attached to the audio.

[Figure: the generated transcript shown as a timed segment reading “Most people watch video on mute”, timestamped 0.2 to 4.4 seconds]
ExpoCut turns the speech into timed segments — here, “Most people watch video on mute…” at 0.2s–4.4s. Tap a line to correct a word; the timing stays locked to the audio.
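As a rough mental model, each caption is a start time, an end time, and a piece of text. The Python sketch below is an illustration only: `CaptionSegment`, `fix_text`, and `active_caption` are hypothetical names, not ExpoCut's internal format. It shows why correcting a word leaves the timing alone.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CaptionSegment:
    """One timed caption line (hypothetical model, not ExpoCut's format)."""
    start: float  # seconds from the start of the clip
    end: float    # seconds from the start of the clip
    text: str


def fix_text(segment, new_text):
    """Return a copy with corrected wording; start and end are untouched."""
    return replace(segment, text=new_text.strip())


def active_caption(segments, t):
    """Return the segment on screen at time t, or None between lines."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg
    return None


# A misheard line, then the corrected copy with the same timing.
heard = CaptionSegment(0.2, 4.4, "Most people watch video on moot")
fixed = fix_text(heard, "Most people watch video on mute")
```

Because only the `text` field changes, the corrected line still appears from 0.2s to 4.4s, which matches the "timing stays locked to the audio" behavior described above.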

Keep lines short. One short phrase on screen at a time reads far better than a paragraph. If a caption feels long, split it so each line lands with its beat of speech.
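To make the idea of splitting concrete, here is a minimal Python sketch that breaks one long caption into shorter ones. It assumes a caption is a start time, an end time, and text; `CaptionSegment`, `split_segment`, and the `max_words` limit are hypothetical, and the timing is divided by word count, which only approximates when each word is spoken. In the app you would place the split by ear instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionSegment:
    """One timed caption line (hypothetical model for illustration)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def split_segment(segment, max_words=4):
    """Split a long caption into chunks of at most max_words words.

    Each chunk gets a share of the original duration proportional to
    its word count, so the pieces stay back to back with no gaps.
    """
    words = segment.text.split()
    if len(words) <= max_words:
        return [segment]
    duration = segment.end - segment.start
    pieces = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = segment.start + duration * i / len(words)
        end = segment.start + duration * (i + len(chunk)) / len(words)
        pieces.append(
            CaptionSegment(round(start, 3), round(end, 3), " ".join(chunk))
        )
    return pieces


long_line = CaptionSegment(0.0, 4.0, "Captions are how most people read your video")
parts = split_segment(long_line, max_words=4)
```

Here the eight-word line becomes two four-word captions that each cover half of the original four seconds.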

Step 3 — Style them for readability

Captions only work if they’re easy to read over moving footage. ExpoCut ships ready-made caption styles — word-highlight Karaoke, Marker, Box, Gradient and more — so one tap gets you a social-ready look you can then fine-tune.

[Figure: the caption style picker previewing the transcribed words in styles such as Classic Karaoke, Neon, Yellow Marker, Blue Box and Purple Gradient]
Pick a style and the preview updates with your real transcribed words. Adjust font, size and alignment from the same panel.

The caption layer uses the same text controls as any title. Recommended settings for each control:

  • Font & size: a clean bold sans, 18pt or larger, big enough to read on a small phone.
  • Color & shade: white text on a subtle dark shade or box so it survives bright frames.
  • Position: lower third, inside the safe area, clear of the platform’s UI buttons.
  • Highlight: punch up keywords with a color run for that “social caption” pop.
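The advice to put white text on a dark shade can be checked with numbers. The sketch below uses the WCAG 2.x contrast-ratio formula, which was written for web content rather than video, so treat it as a rough guide and not as something ExpoCut computes. The two background colors are made-up sample values standing in for a bright frame and a dark caption box.

```python
def _linear(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color, from 0.0 to 1.0."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio: 1.0 for identical colors, about 21 for black on white."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
# Sample backgrounds (assumed values for illustration).
on_bright_frame = contrast_ratio(WHITE, (200, 225, 255))  # text straight on a pale sky
on_dark_box = contrast_ratio(WHITE, (20, 20, 20))         # same text on a dark box
```

WCAG asks for a ratio of at least 4.5 for normal-size text. White text directly on the pale sample frame falls well short of that, while the same text on the dark box clears it easily, which is why a shade or box keeps captions readable when the footage gets bright.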

Step 4 — Export with the captions baked in

Captions render as a real layer, so when you export they’re burned into the video — they show on every platform, even ones that ignore separate subtitle files. Preview equals export, so what you see on the canvas is exactly what ships.
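For contrast, a "separate subtitle file" is usually a small text file that sits beside the video, such as an SRT file. The Python sketch below shows what one looks like when built from timed segments. It is not an ExpoCut export feature; `srt_timestamp`, `to_srt`, and the second sample line are assumptions made for illustration.

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start, end, text) tuples, numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Each block ends with a newline, so joining adds the blank separator line.
    return "\n".join(blocks)


segments = [
    (0.2, 4.4, "Most people watch video on mute"),
    (4.4, 6.0, "Captions fix that"),  # sample line, not from the article
]
srt_text = to_srt(segments)
```

A player has to load and render a file like this itself, and some platforms never do. Burned-in captions avoid the problem because the text is already part of every frame.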

A faster workflow for social

  1. Cut the talking parts tight first.
  2. Transcribe — get the whole caption track in one tap.
  3. Restyle once (font, shade, position); the style applies to the run.
  4. Punch keywords with a color highlight.
  5. Export 1080×1920 for Reels / TikTok / Shorts.
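For a 1080×1920 export, the "lower third, inside the safe area" placement can be worked out as simple arithmetic. The Python sketch below is illustrative only: `Rect`, `lower_third_box`, and both margin fractions are hypothetical placeholders, not values published by any platform or used by ExpoCut.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """A pixel rectangle with its origin at the top-left of the frame."""
    x: int
    y: int
    width: int
    height: int


def lower_third_box(frame_w=1080, frame_h=1920,
                    side_margin=0.08, bottom_margin=0.15):
    """Return a caption box in the lower third, kept clear of the frame edges.

    side_margin and bottom_margin are fractions of the frame size and are
    placeholder guesses, not published platform values.
    """
    x = round(frame_w * side_margin)
    width = frame_w - 2 * x
    top = round(frame_h * 2 / 3)  # where the lower third begins
    bottom = frame_h - round(frame_h * bottom_margin)
    return Rect(x, top, width, bottom - top)


box = lower_third_box()
```

With these assumed margins the caption box is 908 pixels wide and runs from y=1280 to y=1632, leaving the bottom strip free for platform buttons. Adjust the fractions after checking how your captions sit under each app's real interface.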

Where to go next

Captions are the highest-leverage thing you can add to a talking video — and with on-device auto-transcription, they cost you two taps instead of ten minutes of typing.


Frequently asked questions

How do I add captions to a video automatically?

Add your clip, select it, and tap Transcribe. ExpoCut listens to the speech and generates a timed caption layer on-device — no typing. Then you can fix any words, restyle the text, and reposition it before exporting. The whole transcription runs on the phone, so your audio never leaves the device.

Are the captions burned into the video?

Yes — captions render as a real text layer, so they're baked into the exported file and show everywhere, even on platforms that don't support separate subtitle tracks. If you'd rather have selectable subtitles, you can still use the on-screen text as your script.

Do auto-captions work offline?

Yes. Transcription is on-device, so once the speech model from the first run is on the phone, it works with no signal and keeps your audio private. Accuracy is best with clear speech and minimal background noise. If a word is wrong, just tap the caption and fix it.

Caption your video in two taps

On-device auto-transcription turns speech into styled, timed captions — no typing, no upload. Add your clip and tap Transcribe.
