PromptTemplatedVQAGenerator


2026-01-11

📘 Overview

PromptTemplatedVQAGenerator is a Template-Based Multimodal VQA Operator. It allows users to dynamically inject multiple fields from a DataFrame into a predefined Prompt Template to generate customized text instructions, which are then combined with image or video inputs for batch inference.

Unlike standard VQA operators, this operator supports complex prompt construction logic (e.g., dynamically filling in categories, context descriptions, etc.), making it highly suitable for scenarios requiring structured prompt engineering, such as attribute-guided image captioning or controlled dialogue simulation.

🏗️ `__init__` Function

def __init__(
    self,
    serving: LLMServingABC,
    prompt_template: NamedPlaceholderPromptTemplate,
    system_prompt: str = "You are a helpful assistant.",
):
    ...

🧾 Parameters

Parameter	Type	Default	Description
`serving`	`LLMServingABC`	N/A	The model serving instance for inference (must support multimodal inputs).
`prompt_template`	`NamedPlaceholderPromptTemplate`	N/A	A template object implementing `build_prompt` to convert dictionary data into a string prompt.
`system_prompt`	`str`	`"You are a helpful assistant."`	The system prompt sent to the model.
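To make the `build_prompt` contract concrete, here is a minimal sketch of a template object that turns named values into a prompt string. `SimpleNamedTemplate` is a hypothetical stand-in written for illustration, not the library's `NamedPlaceholderPromptTemplate`, and its `build_prompt(**values)` signature is an assumption.

```python
from string import Formatter


class SimpleNamedTemplate:
    """Hypothetical stand-in for a named-placeholder template (not the library class)."""

    def __init__(self, template: str):
        self.template = template
        # Collect the placeholder names that appear in the template, e.g. {"type"}.
        self.fields = {name for _, name, _, _ in Formatter().parse(template) if name}

    def build_prompt(self, **values: str) -> str:
        # Fail early if a placeholder has no value instead of emitting a broken prompt.
        missing = self.fields - set(values)
        if missing:
            raise KeyError(f"missing placeholder values: {sorted(missing)}")
        return self.template.format(**values)


template = SimpleNamedTemplate("Describe the {type} in the image.")
prompt = template.build_prompt(type="vintage car")
print(prompt)
```

Checking for missing placeholders before formatting is a design choice of this sketch; the real template class may handle missing fields differently.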

⚡ `run` Function

def run(
    self,
    storage: DataFlowStorage,
    input_image_key: str = "image",
    input_video_key: str = "video",
    output_answer_key: str = "answer",
    **input_keys,
):
    ...

Executes the main logic:

1. Read Data: Reads the DataFrame from `storage`.
2. Dynamic Prompt Construction: Iterates through each row of the DataFrame:
   - Extracts data from the columns specified in `input_keys` (e.g., a descriptions column and a type column).
   - Calls `prompt_template.build_prompt()` to fill these values into the template, generating a unique `prompt_text` for that sample.
3. Multimodal Input Assembly:
   - Reads media paths from `input_image_key` or `input_video_key`.
   - Packages the generated text prompt with the corresponding image/video data into the format required by the model.
4. Inference & Output:
   - Calls the model service for batch generation.
   - Writes the results to the column specified by `output_answer_key` and saves the updated DataFrame.
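The steps above can be sketched in plain Python. This is a simplified illustration and not the operator's actual implementation: rows are plain dicts instead of a DataFrame, `build_requests` and `fake_generate` are hypothetical helpers, the request dictionary layout is invented for the example, and preferring the image column over the video column is an assumption.

```python
def build_requests(rows, template, input_keys, image_key="image", video_key="video"):
    """Turn each row into a prompt-plus-media request (simplified stand-in for run())."""
    requests = []
    for row in rows:
        # Pull the mapped columns out of the row: placeholder name -> cell value.
        values = {placeholder: row[column] for placeholder, column in input_keys.items()}
        prompt_text = template.format(**values)
        # Assumption: use the image column when it is filled, otherwise the video column.
        if row.get(image_key):
            media = {"type": "image", "path": row[image_key]}
        elif row.get(video_key):
            media = {"type": "video", "path": row[video_key]}
        else:
            raise ValueError("row has neither an image path nor a video path")
        requests.append({"prompt": prompt_text, "media": media})
    return requests


def fake_generate(requests):
    """Placeholder for the batch call to the model service."""
    return [f"<answer for {request['media']['path']}>" for request in requests]


rows = [
    {"image": "/path/to/car.jpg", "obj_type": "vintage car"},
    {"video": "/path/to/clip.mp4", "obj_type": "cyclist"},
]
requests = build_requests(rows, "Describe the appearance of {type}.", {"type": "obj_type"})
answers = fake_generate(requests)

# Write each answer back under the output column name.
for row, answer in zip(rows, answers):
    row["answer"] = answer
```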

🧾 `run` Parameters

Parameter	Type	Default	Description
`storage`	`DataFlowStorage`	N/A	DataFlow storage object.
`input_image_key`	`str`	`"image"`	Column name for image paths (mutually exclusive with `input_video_key`).
`input_video_key`	`str`	`"video"`	Column name for video paths (mutually exclusive with `input_image_key`).
`output_answer_key`	`str`	`"answer"`	Column name for the generated output.
`**input_keys`	`kwargs`	N/A	Key parameter. Defines the mapping between template placeholders and DataFrame columns. Format: `template_var="dataframe_column"`.
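As a rough illustration of the `**input_keys` mechanism, the sketch below shows how extra keyword arguments arrive as a placeholder-to-column dictionary and how that mapping could be checked against the available columns before running. `collect_mapping` and `check_columns` are hypothetical helpers, not part of the operator.

```python
def collect_mapping(output_answer_key="answer", **input_keys):
    # Named parameters are bound first; every other keyword argument lands
    # in input_keys as {placeholder name: DataFrame column name}.
    return input_keys


def check_columns(input_keys, columns):
    # Report mapped columns that do not exist in the data.
    missing = [column for column in input_keys.values() if column not in columns]
    if missing:
        raise KeyError(f"columns not found in data: {missing}")


mapping = collect_mapping(
    output_answer_key="generated_caption",
    descriptions="meta_desc",
    type="obj_type",
)
check_columns(mapping, columns=["image", "meta_desc", "obj_type"])
print(mapping)
```

Note that `type` is a valid keyword-argument name here even though it is also the name of a Python builtin.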

🧩 Example Usage

from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServing
from dataflow.prompts.prompt_template import NamedPlaceholderPromptTemplate
from dataflow.operators.generate import PromptTemplatedVQAGenerator

# 1) Define a template with placeholders
# We want the model to check for a specific object type, referencing existing descriptions
TEMPLATE = (
    "Context: {descriptions}\n\n"
    "Task: Describe the appearance of {type} in the image based on the context above."
)
prompt_template = NamedPlaceholderPromptTemplate(template=TEMPLATE)

# 2) Initialize Operator
op = PromptTemplatedVQAGenerator(
    serving=LLMServing(model_path="Qwen/Qwen2.5-VL-3B-Instruct"),
    prompt_template=prompt_template
)

# 3) Prepare Data (assuming jsonl has image, meta_desc, obj_type columns)
storage = FileStorage(file_name_prefix="vqa_task")
storage.step()

# 4) Run Operator: Map 'meta_desc' to {descriptions}, 'obj_type' to {type}
op.run(
    storage=storage,
    input_image_key="image",
    output_answer_key="generated_caption",
    # Dynamic Mapping:
    descriptions="meta_desc", 
    type="obj_type"
)

🧾 Input/Output Example

Input DataFrame Row:

image	meta_desc	obj_type
`"/path/to/car.jpg"`	`"A photo taken on a sunny day."`	`"vintage car"`

Constructed Prompt:

"Context: A photo taken on a sunny day.\n\nTask: Describe the appearance of vintage car in the image based on the context above."
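The constructed prompt can be reproduced with plain string formatting, which is a quick way to check a template and mapping before running the model. This sketch assumes the template behaves like Python's `str.format` with named placeholders; the library class itself is not used here.

```python
TEMPLATE = (
    "Context: {descriptions}\n\n"
    "Task: Describe the appearance of {type} in the image based on the context above."
)

row = {
    "image": "/path/to/car.jpg",
    "meta_desc": "A photo taken on a sunny day.",
    "obj_type": "vintage car",
}
input_keys = {"descriptions": "meta_desc", "type": "obj_type"}

# Resolve each placeholder through the mapping, then fill the template.
values = {placeholder: row[column] for placeholder, column in input_keys.items()}
constructed = TEMPLATE.format(**values)
print(constructed)
```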

Output DataFrame Row:

image	meta_desc	obj_type	generated_caption
`"/path/to/car.jpg"`	`...`	`...`	`"The vintage car is red with..."`
