FixPromptedVQAGenerator
2026-01-11
📘 Overview
FixPromptedVQAGenerator is a Fixed-Prompt Multimodal VQA Operator.
It is designed to execute the same instruction task on a batch of images or videos. Unlike dynamic templating operators, this operator accepts a static user_prompt (e.g., "Please caption this image") during initialization and applies it uniformly to every media sample in the input DataFrame.
Use Cases:
- Batch Image/Video Captioning.
- Uniform VQA queries across a dataset (e.g., "Is there any violence in this image?").
🏗️ __init__ Function
```python
def __init__(
    self,
    serving: LLMServingABC,
    system_prompt: str = "You are a helpful assistant.",
    user_prompt: str = "Please caption the media in detail."
):
```
🧾 Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| serving | LLMServingABC | N/A | The model serving instance for inference (must support multimodal inputs). |
| system_prompt | str | "You are a helpful assistant." | The system prompt sent to the model. |
| user_prompt | str | "Please caption the media in detail." | **Core Parameter.** The user instruction (prompt) applied uniformly to all input samples. |
⚡ run Function
```python
def run(
    self,
    storage: DataFlowStorage,
    input_image_key: str = "image",
    input_video_key: str = "video",
    output_answer_key: str = "answer",
):
    ...
```
Executes the main logic:
1. Read Data
   - Reads the DataFrame from `storage`.
2. Input Construction
   - Checks for and reads the `input_image_key` or `input_video_key` column.
   - Constructs the input message for each media file, combining the fixed `system_prompt`, the media file itself, and the fixed `user_prompt`.
3. Batch Inference
   - Packages the constructed prompts and media data into a batch.
   - Calls `serving.generate_from_input` to execute parallel inference.
4. Save Results
   - Writes the text generated by the model into the `output_answer_key` column.
   - Updates and saves the DataFrame.
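The input-construction step above can be sketched as follows. This is a minimal illustration, not the operator's actual internals: the helper name `build_inputs` and the chat-style message shape are assumptions, and the real format depends on the serving backend.

```python
import pandas as pd

def build_inputs(df, system_prompt, user_prompt,
                 input_image_key="image", input_video_key="video"):
    """Pair the fixed prompts with each row's media path.

    Hypothetical message format for illustration only.
    """
    # Prefer the image column if present, otherwise fall back to video
    if input_image_key in df.columns:
        media_key, media_type = input_image_key, "image"
    else:
        media_key, media_type = input_video_key, "video"

    batch = []
    for path in df[media_key]:
        # Same system/user prompts for every sample; only the media varies
        batch.append([
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": [
                {"type": media_type, media_type: path},
                {"type": "text", "text": user_prompt},
            ]},
        ])
    return batch

df = pd.DataFrame({"image": ["/data/cat.jpg", "/data/dog.png"]})
inputs = build_inputs(
    df,
    system_prompt="You are a helpful assistant.",
    user_prompt="Please caption the media in detail.",
)
```

Each element of `inputs` would then be sent to the serving instance as one request in the batch.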
🧾 run Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| storage | DataFlowStorage | N/A | DataFlow storage object. |
| input_image_key | str | "image" | Column name for image paths (mutually exclusive with `input_video_key`). |
| input_video_key | str | "video" | Column name for video paths (mutually exclusive with `input_image_key`). |
| output_answer_key | str | "answer" | Column name for the generated output. |
🧩 Example Usage
```python
from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServing
from dataflow.operators.generate import FixPromptedVQAGenerator

# 1) Initialize the model serving instance
serving = LLMServing(model_path="Qwen/Qwen2.5-VL-3B-Instruct")

# 2) Initialize the operator with a fixed prompt
#    Example: generate detailed descriptions for a batch of images
generator = FixPromptedVQAGenerator(
    serving=serving,
    system_prompt="You are a helpful visual assistant.",
    user_prompt="Describe the content of this image in detail, including objects, colors, and spatial relationships."
)

# 3) Prepare data
storage = FileStorage(
    file_name_prefix="image_captioning_task",
    cache_path="./cache_data"
)
storage.step()

# 4) Execute generation
generator.run(
    storage=storage,
    input_image_key="image_path",
    output_answer_key="detailed_caption"
)
```
🧾 Input/Output Example
Input DataFrame Rows:
| image_path |
|---|
| "/data/cat.jpg" |
| "/data/dog.png" |
Output DataFrame Rows:
| image_path | detailed_caption |
|---|---|
| "/data/cat.jpg" | "A black and white cat sitting on a sofa..." |
| "/data/dog.png" | "A golden retriever running on the grass..." |
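The column-level effect shown above can be mimicked in plain pandas. This is purely illustrative: the captions below are placeholders standing in for model output, not results from the operator.

```python
import pandas as pd

# Input DataFrame: one media path per row
df = pd.DataFrame({"image_path": ["/data/cat.jpg", "/data/dog.png"]})

# Placeholder captions standing in for serving.generate_from_input results,
# one per input row (same order as the DataFrame)
captions = [
    "A black and white cat sitting on a sofa...",
    "A golden retriever running on the grass...",
]

# The operator writes the generated text into the output_answer_key column
df["detailed_caption"] = captions
print(df)
```

The key point is that the operator appends one new column and leaves the existing rows and columns untouched.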

