VisualReasoningGenerator


2026-01-11

📘 Overview

`VisualReasoningGenerator` is a visual reasoning generation operator. It invokes vision-language models (VLMs) to produce detailed reasoning traces (e.g., text containing `<think>` and `<answer>` tags).
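To make the output format concrete, the following is a minimal sketch of how a downstream step might split a generated string into its reasoning and answer parts. It assumes the model wraps each part in exactly one `<think>...</think>` and one `<answer>...</answer>` pair; the helper name `split_reasoning` and the sample string are illustrative and not part of the operator.

```python
import re
from typing import Optional, Tuple

# Non-greedy matches so that only the content of the first tag pair is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (think, answer); a part is None when its tag pair is missing."""
    think = THINK_RE.search(text)
    answer = ANSWER_RE.search(text)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )


# Illustrative sample, not real model output.
sample = "<think>The logo sits in the header.</think><answer>Top left</answer>"
think, answer = split_reasoning(sample)
print(think)
print(answer)
```

Returning `None` for a missing tag pair, rather than raising, lets a caller decide whether an incomplete trace should be dropped or regenerated.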

The operator has a built-in fallback mechanism: before generation, it checks the column named by `input_existing_chains_key`. If a row already holds valid reasoning chain data, the operator reuses that data directly and skips model inference for that row. This makes it well suited to resuming interrupted tasks or completing partially processed datasets.
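The reuse decision can be pictured as a small predicate over one cell of the existing-chains column. This is a sketch under the assumption that "valid" means "a non-empty list", as described for the `run` function; the operator's actual check may be stricter or more permissive, and the name `has_existing_chain` is hypothetical.

```python
from typing import Any


def has_existing_chain(value: Any) -> bool:
    """True when a cell holds a non-empty list, i.e. a chain that can be reused.

    Assumption: anything else (None, NaN, an empty list, a bare string)
    counts as "missing" and would trigger generation.
    """
    return isinstance(value, list) and len(value) > 0


# Typical cell values one might find in a partially processed dataset.
cells = [["<think>...</think>..."], [], None, float("nan"), "not a list"]
print([has_existing_chain(c) for c in cells])
```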

🏗️ `__init__` Function

def __init__(
    self, 
    serving: LLMServingABC, 
    prompt_type: str = "web_grounding"
):

🧾 Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `serving` | `LLMServingABC` | N/A | The model serving instance used for inference. |
| `prompt_type` | `str` | `"web_grounding"` | Prompt type key. Used to retrieve the corresponding system prompt from the `MCTReasoningPrompt` library (e.g., preset prompts for web grounding, math reasoning, etc.). |
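The key-to-prompt lookup can be sketched as follows. The internal structure of `MCTReasoningPrompt` is not shown on this page, so a plain dictionary stands in for it here; the `PROMPTS` mapping, the `"math_reasoning"` key, the placeholder prompt texts, and `resolve_system_prompt` are all hypothetical.

```python
from typing import Dict

# Hypothetical stand-in for the prompt library. Keys and texts are placeholders,
# not the prompts shipped with the operator.
PROMPTS: Dict[str, str] = {
    "web_grounding": "<system prompt for web grounding>",
    "math_reasoning": "<system prompt for math reasoning>",
}


def resolve_system_prompt(prompt_type: str = "web_grounding") -> str:
    """Look up the system prompt for a key, failing loudly on unknown keys."""
    try:
        return PROMPTS[prompt_type]
    except KeyError:
        known = ", ".join(sorted(PROMPTS))
        raise ValueError(
            f"Unknown prompt_type {prompt_type!r}; known keys: {known}"
        ) from None


print(resolve_system_prompt("web_grounding"))
```

Resolving the prompt once at construction time means a mistyped key surfaces before any inference is started.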

⚡ `run` Function

def run(
    self, 
    storage: DataFlowStorage, 
    input_question_key: str, 
    input_image_key: str, 
    output_key: str,
    input_existing_chains_key: Optional[str] = None
):
    ...

Executes the main logic in four steps:

1. Fallback Check
   - If `input_existing_chains_key` is provided, the operator checks that column in the DataFrame.
   - If a row contains a non-empty list, the operator uses the existing data and skips model invocation for that row.

2. Input Construction
   - For rows that still require generation, the operator reads `input_question_key` and `input_image_key`.
   - It builds a multimodal input `[Image, Text]` using the system prompt selected at initialization.

3. Batch Generation
   - Pending requests are packaged into a single batch.
   - The operator calls `serving.generate_from_input` to run inference.

4. Result Integration
   - Reused chains are merged with the newly generated outputs (each new output is wrapped in a list).
   - The merged result is written to `output_key` and saved.
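The four steps above can be sketched in plain Python. This is an illustration of the control flow only, not the operator's source: rows are modeled as a list of dicts instead of a DataFrame, `EchoServing` is a hypothetical stand-in for a real serving object, and the request dictionaries and the single-argument `generate_from_input` signature are assumptions.

```python
from typing import Any, Dict, List, Optional


class EchoServing:
    """Hypothetical serving stub: returns one canned string per request."""

    def generate_from_input(self, requests: List[Dict[str, Any]]) -> List[str]:
        return [
            f"<think>stub for {r['image']}</think><answer>n/a</answer>"
            for r in requests
        ]


def run_with_fallback(
    rows: List[Dict[str, Any]],
    serving: Any,
    system_prompt: str,
    question_key: str,
    image_key: str,
    output_key: str,
    existing_chains_key: Optional[str] = None,
) -> List[Dict[str, Any]]:
    results: List[Optional[List[str]]] = [None] * len(rows)
    pending: List[int] = []

    # 1) Fallback check: reuse non-empty lists, queue everything else.
    for i, row in enumerate(rows):
        existing = row.get(existing_chains_key) if existing_chains_key else None
        if isinstance(existing, list) and existing:
            results[i] = existing
        else:
            pending.append(i)

    # 2) Input construction: one [image, text] request per pending row.
    requests = [
        {"system": system_prompt, "image": rows[i][image_key], "text": rows[i][question_key]}
        for i in pending
    ]

    # 3) Batch generation: a single call, skipped when nothing is pending.
    outputs = serving.generate_from_input(requests) if requests else []

    # 4) Result integration: wrap each new output in a list and merge by index.
    for i, text in zip(pending, outputs):
        results[i] = [text]
    return [{**row, output_key: res} for row, res in zip(rows, results)]


rows = [
    {"image": "1.jpg", "question": "Find the button.", "history_reasoning": ["<think>The button is red...</think>..."]},
    {"image": "2.jpg", "question": "Where is the logo?", "history_reasoning": []},
]
out = run_with_fallback(
    rows, EchoServing(), "<system prompt>", "question", "image",
    "reasoning_result", existing_chains_key="history_reasoning",
)
for row in out:
    print(row["image"], row["reasoning_result"])
```

Tracking pending rows by index keeps the output aligned with the input order even when reused and generated rows are interleaved.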

🧾 `run` Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `storage` | `DataFlowStorage` | N/A | DataFlow storage object. |
| `input_question_key` | `str` | N/A | Column name for the question text. |
| `input_image_key` | `str` | N/A | Column name for the image path. |
| `output_key` | `str` | N/A | Column name for the output result (stored as `List[str]`). |
| `input_existing_chains_key` | `Optional[str]` | `None` | (Optional) Column name for existing reasoning chains. Rows holding a non-empty list in this column are reused, and generation is skipped for them. |

🧩 Example Usage

from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServing
from dataflow.operators.generate import VisualReasoningGenerator

# 1) Initialize Model
serving = LLMServing(model_path="Qwen/Qwen2.5-VL-7B-Instruct")

# 2) Initialize Operator
# prompt_type="web_grounding" loads the corresponding System Prompt automatically
generator = VisualReasoningGenerator(
    serving=serving,
    prompt_type="web_grounding"
)

# 3) Prepare Data
# Assume we have partially processed data where 'history_reasoning' is populated for some rows
storage = FileStorage(file_name_prefix="reasoning_task")
storage.step()

# 4) Execute Generation (with Resume capability)
generator.run(
    storage=storage,
    input_question_key="question",
    input_image_key="image",
    output_key="reasoning_result",
    input_existing_chains_key="history_reasoning" # Prioritize data from this column
)
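For step 3 above, a partially processed dataset might be prepared as a JSON Lines file like the one sketched here. The two rows mirror the input example on this page; the file name and the `write_jsonl` helper are hypothetical, and how `FileStorage` is pointed at the file depends on its constructor arguments, which this sketch does not cover.

```python
import json
import os
import tempfile
from typing import Any, Dict, List

# Row 1 already has a reasoning chain; row 2 has an empty list and needs generation.
rows: List[Dict[str, Any]] = [
    {
        "image": "1.jpg",
        "question": "Find the button.",
        "history_reasoning": ["<think>The button is red...</think>..."],
    },
    {"image": "2.jpg", "question": "Where is the logo?", "history_reasoning": []},
]


def write_jsonl(records: List[Dict[str, Any]], path: str) -> None:
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical location; a temporary directory keeps the sketch side-effect free.
path = os.path.join(tempfile.mkdtemp(), "reasoning_task_input.jsonl")
write_jsonl(rows, path)
print(path)
```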

🧾 Input/Output Example

Input DataFrame:

| image | question | history_reasoning |
| --- | --- | --- |
| `"1.jpg"` | `"Find the button."` | `["<think>The button is red...</think>..."]` |
| `"2.jpg"` | `"Where is the logo?"` | `[]` (or `None`) |

Output DataFrame (reasoning_result):

| image | question | reasoning_result | Note |
| --- | --- | --- | --- |
| `"1.jpg"` | `"Find the button."` | `["<think>The button is red...</think>..."]` | Reuse: copied from `history_reasoning` |
| `"2.jpg"` | `"Where is the logo?"` | `["<think>Scanning image...</think> Top left."]` | Gen: generated by the model |
