Auto Captions

The Auto Captions node uses OpenAI’s Whisper model to transcribe audio from a video and overlay styled captions. Customize fonts, colors, animations, and positioning.

Inputs

| Handle ID | Data Type | Label |
| --- | --- | --- |
| video-in | Video | Video |

Outputs

| Handle ID | Data Type | Label |
| --- | --- | --- |
| video-out | Video | Video |

Available engines

| Engine ID | Label | Cost |
| --- | --- | --- |
| fal-whisper | Whisper | 1 credit |

How it works

  1. Receives a video with audio from the input handle
  2. Whisper transcribes the audio into timestamped words
  3. You configure the caption style (font, colors, animation, position)
  4. Captions are overlaid on the video
  5. The captioned video is output
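The timing logic behind steps 2–4 can be sketched in plain Python. This is an illustrative model, not the node's implementation: it groups Whisper-style word timestamps into caption groups and finds which group and highlighted word are visible at a given playback time. The `Word` type and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds

def group_words(words, group_size):
    """Split a timestamped transcript into caption groups of `group_size` words."""
    return [words[i:i + group_size] for i in range(0, len(words), group_size)]

def active_caption(groups, t):
    """Return (visible group, highlighted word) at time t, or (None, None)."""
    for group in groups:
        if group[0].start <= t <= group[-1].end:
            # The highlighted word is whichever word is being spoken at t, if any.
            word = next((w for w in group if w.start <= t <= w.end), None)
            return group, word
    return None, None

# Example: four transcribed words shown two at a time.
words = [
    Word("Hello", 0.0, 0.4),
    Word("world", 0.5, 0.9),
    Word("from", 1.0, 1.2),
    Word("Whisper", 1.3, 1.8),
]
groups = group_words(words, group_size=2)
group, word = active_caption(groups, 0.6)  # "world" is being spoken at 0.6s
```

With a word animation scope, only `word` gets the highlight color; with a group scope, the whole `group` animates in at once.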

Caption style options

| Parameter | Description | Options |
| --- | --- | --- |
| Font Family | Caption font | Any supported font |
| Font Size | Text size | Numeric |
| Color | Main text color | Any color |
| Highlight Color | Active word highlight | Any color |
| Stroke Color | Text outline color | Any color |
| Stroke Width | Outline thickness | Numeric |
| Background Color | Optional text background | Any color |
| Background Radius | Rounded corners for background | Numeric |
| Animation | Word appearance style | pop, fade, slide, bounce, shake, zoom, none |
| Animation Scope | Animate per word or per group | word or group |
| Word Group Size | Words shown at once | Numeric |
| Position | Vertical position (0–100%) | Percent from top |
| Uppercase | Force uppercase | On/Off |
| Shadow Color | Text shadow | Any color |
| Shadow Blur | Shadow spread | Numeric |
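Taken together, the parameters above amount to a style configuration like the following sketch. The key names and sample values are illustrative only; they are not the node's actual schema.

```python
# Hypothetical caption style configuration mirroring the table above.
caption_style = {
    "font_family": "Montserrat",
    "font_size": 48,
    "color": "#FFFFFF",
    "highlight_color": "#FFD500",      # active-word highlight
    "stroke_color": "#000000",
    "stroke_width": 2,
    "background_color": "#00000080",   # optional semi-transparent background
    "background_radius": 8,
    "animation": "pop",                # pop | fade | slide | bounce | shake | zoom | none
    "animation_scope": "word",         # word | group
    "word_group_size": 3,
    "position": 75,                    # percent from the top of the frame
    "uppercase": True,
    "shadow_color": "#000000",
    "shadow_blur": 4,
}
```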

Credit cost

1 credit per transcription (Whisper).

Tips

  • This node is interactive — you configure caption styles during flow execution
  • The pop and bounce animations work well for short-form vertical content
  • Use word animation scope for a karaoke-style effect
  • Use group animation scope for a more natural reading experience
  • Position captions in the lower third (position: 70–80) for standard placement
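The position parameter maps a 0–100% value to a vertical offset from the top of the frame. A minimal sketch of that mapping (the function name is hypothetical):

```python
def caption_y(position_percent, frame_height):
    """Map the node's 0-100% position parameter to a pixel offset from the top."""
    return round(frame_height * position_percent / 100)

# Lower-third placement (position 75) on a 1920px-tall vertical frame:
caption_y(75, 1920)  # -> 1440
```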

Example use cases

  • Adding subtitles to voiceover narration
  • Creating TikTok/Shorts-style word-by-word captions
  • Adding accessibility captions to any video with audio