Text-to-Video Generation

The Text-to-Video node generates video clips directly from text descriptions. Choose from 10 AI video generation engines with varying quality, speed, and cost.

Inputs

Handle ID | Data Type | Label
text-in | Text | Prompt

Outputs

Handle ID | Data Type | Label
video-out | Video | Video

Available engines

Engine ID | Label | Cost (5s, standard)
kling-o3 | Kling O3 | 5 credits
kling-3.0 | Kling 3.0 | 5 credits
kling-2.6 | Kling 2.6 | 4 credits
fal-seedance-1.5 | Seedance 1.5 | 4 credits
fal-wan-2.6 | Wan 2.6 | 4 credits
grok-imagine-video | Grok Imagine | 4 credits
sora-2 | Sora 2 | 7 credits
fal-veo3 | Veo 3.1 (Google) | 8 credits
fal-minimax-hailuo-2 | MiniMax Hailuo 2 | 5 credits
fal-ltx-2.3 | LTX 2.3 | 2 credits
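
As a rough illustration of how the cost table might be used when choosing an engine, the sketch below copies the 5-second, standard-tier costs into a dictionary and filters them by budget. The names `ENGINE_COSTS` and `engines_within_budget` are hypothetical helpers, not part of any product API.

```python
# 5-second, standard-tier costs in credits, copied from the engine table.
ENGINE_COSTS = {
    "kling-o3": 5,
    "kling-3.0": 5,
    "kling-2.6": 4,
    "fal-seedance-1.5": 4,
    "fal-wan-2.6": 4,
    "grok-imagine-video": 4,
    "sora-2": 7,
    "fal-veo3": 8,
    "fal-minimax-hailuo-2": 5,
    "fal-ltx-2.3": 2,
}


def engines_within_budget(max_credits):
    """Return engine IDs whose 5s standard cost fits the budget, cheapest first.

    Hypothetical helper for illustration; ties are ordered by engine ID.
    """
    affordable = [
        (cost, engine)
        for engine, cost in ENGINE_COSTS.items()
        if cost <= max_credits
    ]
    return [engine for cost, engine in sorted(affordable)]


if __name__ == "__main__":
    print(engines_within_budget(4))
```

With a budget of 4 credits, this returns LTX 2.3 first, followed by the four engines priced at 4 credits.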

Configuration options

Parameter | Description
Engine | Select from 10 available video models
Prompt | Text description of the video to generate

Advanced options

The advanced options available depend on the selected engine.

Parameter | Options | Default
Tier | Standard / Pro | Standard
Duration | 5s / 10s | 5s
Seed | Any integer | Random

Pro tier costs 2x standard. 10s duration costs 2x the 5s price.

Parameter | Options | Default
Duration | 5s / 10s | 5s
Seed | Any integer | Random

Parameter | Options | Default
Tier | Standard / Flash | Standard
Seed | Any integer | Random

Flash tier is faster and cheaper (0.05/s vs 0.07/s).

Parameter | Options | Default
Tier | Standard / Pro | Standard
Resolution | 720p / 1080p / 4K | 1080p
Seed | Any integer | Random

Pro tier costs ~1.4x standard. 4K resolution costs 2x.

Parameter | Options | Default
Tier | Standard / Fast | Standard

Fast tier is cheaper (0.10/s vs 0.15/s).

Parameter | Options | Default
Seed | Any integer | Random

Parameter | Options | Default
Tier | Standard / Fast | Standard
Seed | Any integer | Random
Guidance Scale | 1–10 | 3.5
Inference Steps | 1–100 | 30

Fast tier applies a 0.7x cost multiplier. Higher resolution (1440p, 4K) increases cost.
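
The ranges for Seed, Guidance Scale, and Inference Steps can be checked before a request is submitted. The following is a minimal sketch, assuming numeric inputs and assuming that Inference Steps must be a whole number; the function and parameter names are hypothetical and do not correspond to a documented API.

```python
def validate_advanced_options(seed=None, guidance_scale=3.5, inference_steps=30):
    """Check values against the documented ranges and return a list of problems.

    Hypothetical helper: defaults mirror the documented defaults
    (random seed, guidance scale 3.5, 30 inference steps).
    """
    problems = []

    # Seed: any integer, or None to leave it random. bool is excluded
    # explicitly because Python treats True/False as integers.
    if seed is not None and (isinstance(seed, bool) or not isinstance(seed, int)):
        problems.append("seed must be an integer")

    # Guidance Scale: documented range is 1 to 10.
    if not 1 <= guidance_scale <= 10:
        problems.append("guidance_scale must be between 1 and 10")

    # Inference Steps: documented range is 1 to 100 (whole numbers assumed).
    if (
        isinstance(inference_steps, bool)
        or not isinstance(inference_steps, int)
        or not 1 <= inference_steps <= 100
    ):
        problems.append("inference_steps must be an integer between 1 and 100")

    return problems


if __name__ == "__main__":
    print(validate_advanced_options(seed=42, guidance_scale=12))
```

An empty list means every value falls inside its documented range.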

Credit cost

Cost = per-second rate x duration. Default duration is 5 seconds. See Model Costs for detailed pricing.
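
The formula above can be sketched in a few lines. This example assumes that a per-second rate can be derived by dividing an engine's listed 5-second cost by 5, and that a tier or resolution multiplier is applied to the whole result; the function name `estimate_cost` is hypothetical, and any rounding the platform applies is not modeled.

```python
def estimate_cost(rate_per_second, duration_seconds=5, multiplier=1.0):
    """Cost = per-second rate x duration, scaled by an optional multiplier.

    The multiplier stands in for tier or resolution adjustments; how the
    platform combines several multipliers is an assumption here.
    """
    return rate_per_second * duration_seconds * multiplier


# An engine listed at 5 credits for a 5s standard clip works out to
# 1 credit per second under the assumption stated above.
rate = 5 / 5

standard_5s = estimate_cost(rate)        # default 5-second duration
standard_10s = estimate_cost(rate, 10)   # doubling the duration doubles the cost
pro_10s = estimate_cost(rate, 10, 2.0)   # example 2x tier multiplier

if __name__ == "__main__":
    print(standard_5s, standard_10s, pro_10s)
```

Under these assumptions the three estimates come to 5, 10, and 20 credits, which agrees with the note that a 10s clip costs 2x the 5s price.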

Tips

  • LTX 2.3 is the most budget-friendly option at ~2 credits for a 5s clip
  • Kling O3 and Veo 3.1 offer the highest quality but cost more
  • Use Wan 2.6 Flash tier for fast, affordable generation
  • Videos are generated at 9:16 aspect ratio (vertical/portrait)

Example use cases

  • Generating scene clips from script descriptions
  • Creating B-roll footage from text descriptions
  • Producing animated backgrounds or environments