Image-to-Video Generation

The Image-to-Video node takes a still image and animates it into a video clip. It uses the same 10 engines as Text-to-Video, with the addition of an image input for visual reference.

Inputs

Handle ID   Data Type   Label
image-in    Image       Image
text-in     Text        Prompt

Outputs

Handle ID   Data Type   Label
video-out   Video       Video

Available engines

Engine ID             Label              Cost (5s, standard)
kling-o3              Kling O3           5 credits
kling-3.0             Kling 3.0          5 credits
kling-2.6             Kling 2.6          4 credits
fal-seedance-1.5      Seedance 1.5       4 credits
fal-wan-2.6           Wan 2.6            4 credits
grok-imagine-video    Grok Imagine       4 credits
sora-2                Sora 2             7 credits
fal-veo3              Veo 3.1 (Google)   8 credits
fal-minimax-hailuo-2  MiniMax Hailuo 2   5 credits
fal-ltx-2.3           LTX 2.3            2 credits

How it works

  1. An image is received from the input handle (from Image Generation, Media Upload, etc.)
  2. A text prompt describes the desired motion and action
  3. The AI model animates the image according to the prompt
  4. The resulting video clip is output
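The steps above can be sketched as a single function. This is an illustrative sketch only: the names `generate_video`, `VideoClip`, and the handle comments are assumptions for the sake of the example, not the product's real API.

```python
from dataclasses import dataclass, field

@dataclass
class VideoClip:
    engine: str
    duration_s: int
    frames: list = field(default_factory=list)  # placeholder for rendered frames

def generate_video(image: bytes, prompt: str, engine: str = "kling-o3",
                   duration_s: int = 5) -> VideoClip:
    """Animate a still image according to a motion prompt.

    1. `image` arrives on the image-in handle (e.g. from Image Generation).
    2. `prompt` describes the desired motion and camera movement.
    3. The selected engine animates the image.
    4. The resulting clip is emitted on the video-out handle.
    """
    # Real engines run remotely; this stub just returns an empty clip
    # so the data flow between handles is visible.
    return VideoClip(engine=engine, duration_s=duration_s)

clip = generate_video(b"<png bytes>", "slow pan across the scene, gentle wind")
```

The key point is the two-input shape: unlike Text-to-Video, the node cannot produce anything without a still image on its image-in handle.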

Configuration options

Same as Text-to-Video — engine selection and all advanced options apply identically.

Credit cost

Same pricing as Text-to-Video: cost = per-second rate × duration.
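As a worked example, the per-second rate can be derived from the 5-second standard prices in the engine table above. The helper below is purely illustrative, not part of the product:

```python
# 5-second standard prices from the engine table (a subset, in credits).
COST_5S = {
    "kling-o3": 5,
    "fal-veo3": 8,
    "fal-ltx-2.3": 2,
}

def credit_cost(engine: str, duration_s: int) -> float:
    """cost = per-second rate × duration, where the rate is implied
    by the engine's 5-second standard price."""
    per_second = COST_5S[engine] / 5
    return per_second * duration_s

cost = credit_cost("fal-veo3", 10)  # 8/5 credits per second × 10 s = 16.0
```

So doubling the duration doubles the cost; engine choice sets the rate.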

Tips

  • The text prompt should describe motion — what should move, how the camera should pan, etc.
  • Combine with Image Generation for a two-step pipeline: generate image → animate to video
  • Image-to-video typically produces more consistent results than text-to-video since it has a visual reference
  • Works especially well with the Multi-Shot Generation pipeline
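The two-step pipeline from the tips above can be sketched as two chained calls. Both functions are stand-ins invented for this example; the real nodes are wired together on the canvas rather than called in code:

```python
def generate_image(prompt: str) -> bytes:
    """Stand-in for the Image Generation node: returns a still frame."""
    return b"<png bytes>"  # placeholder image data

def animate(image: bytes, motion_prompt: str) -> dict:
    """Stand-in for the Image-to-Video node. Note the motion-focused
    prompt: it describes what moves and how the camera behaves,
    not what the scene looks like (the image already fixes that)."""
    return {"source": image, "motion": motion_prompt}

still = generate_image("a lighthouse at dusk, cinematic lighting")
clip = animate(still, "waves crash below, camera slowly pushes in")
```

The division of labor is the point: appearance is decided in step one, motion in step two, which is why image-to-video tends to be more consistent than text-to-video alone.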

Example use cases

  • Animating AI-generated character images into scenes
  • Adding subtle motion to product shots
  • Creating video from illustrated storyboard frames