
Mixed Media Analysis

The Mixed Media Analysis node uses multimodal AI to analyze a combination of images, videos, and text. It can describe, compare, extract information, or generate prompts based on mixed media inputs.

Inputs

Handle ID | Data Type | Label
text-in | Text | Instruction
The node also has dynamic handles: you can connect additional image and video inputs, in any combination with text.

Outputs

Handle ID | Data Type | Label
text-out | Text | Prompt / Analysis

Available engines

Engine ID | Label | Cost
gemini-flash | Gemini Flash | 1 credit
gemini-pro | Gemini Pro | 2 credits

How it works

  1. Connect text instructions, images, and/or videos to the input handles
  2. Write an instruction describing what you want the AI to analyze or produce
  3. The multimodal AI processes all inputs together
  4. The node outputs the result as text, which can be a description, an analysis, or a generated prompt
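
The steps above can be sketched in Python as follows. This is only an illustration of how mixed inputs might be gathered into a single request. The `MediaInput` type, the `build_request` function, and the dictionary layout are hypothetical and are not part of the product's API; only the engine IDs come from this page.

```python
from dataclasses import dataclass

# Engine IDs listed on this page.
ENGINES = {"gemini-flash", "gemini-pro"}
# Input kinds the node accepts.
KINDS = {"text", "image", "video"}


@dataclass
class MediaInput:
    """One connected input (hypothetical structure)."""
    kind: str   # "text", "image", or "video"
    value: str  # text content, or a reference to the media file


def build_request(engine: str, instruction: str, inputs: list[MediaInput]) -> dict:
    """Bundle the instruction and all connected inputs into one request."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    for item in inputs:
        if item.kind not in KINDS:
            raise ValueError(f"unsupported input kind: {item.kind}")
    # All inputs travel together so the model can reason across them.
    return {
        "engine": engine,
        "instruction": instruction,
        "parts": [{"kind": i.kind, "value": i.value} for i in inputs],
    }


request = build_request(
    "gemini-flash",
    "Describe the visual style shared by these references.",
    [MediaInput("image", "ref-1.png"), MediaInput("video", "clip.mp4")],
)
print(len(request["parts"]))
```

Keeping every input in one `parts` list mirrors step 3: the model sees the instruction and all media in a single pass, not one item at a time.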

Configuration options

Parameter | Description
Engine | Gemini Flash (faster) or Gemini Pro (more capable)
Instruction | What to do with the media inputs

Credit cost

Engine | Credits
Gemini Flash | 1 credit
Gemini Pro | 2 credits
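
As a rough budgeting aid, the credit table can be turned into a small lookup. The sketch assumes credits are charged once per node run at the listed rate, which this page does not state explicitly. `estimate_credits` is a hypothetical helper, not a product function.

```python
# Credits per engine, taken from the credit table on this page.
CREDITS_PER_RUN = {"gemini-flash": 1, "gemini-pro": 2}


def estimate_credits(engine: str, runs: int) -> int:
    """Estimate total credits, assuming one charge per node run."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return CREDITS_PER_RUN[engine] * runs


print(estimate_credits("gemini-flash", 10))  # 10
print(estimate_credits("gemini-pro", 10))    # 20
```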

Tips

  • This node is interactive: you provide the instruction while the flow is running
  • Use it to generate video prompts based on reference images
  • Great for analyzing existing content and producing descriptions
  • Dynamic inputs allow flexible multi-modal workflows

Example use cases

  • Analyzing a reference video and generating a text description
  • Comparing multiple images and selecting the best one via AI
  • Generating video prompts that match the style of reference images
  • Extracting information from screenshots or frames