VisualReasoningGenerator
2026-01-11
📘 Overview
VisualReasoningGenerator is a visual reasoning generation operator that invokes vision-language models (VLMs) to produce detailed reasoning processes (e.g., text containing <think> and <answer> tags).
This operator features a built-in Fallback Mechanism: before generation, it checks a specified input_existing_chains_key column. If valid reasoning chain data already exists for a row, the operator reuses that data directly, skipping model inference. This makes it ideal for resuming interrupted tasks or completing partially processed datasets.
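As a rough illustration of what "valid reasoning chain data" could mean for a single cell, the sketch below treats a value as reusable only when it is a non-empty list. The helper name `has_reusable_chain` is hypothetical, and the extra requirement that every entry be a non-blank string is an assumption, not something the operator is documented to enforce.

```python
from typing import Any


def has_reusable_chain(value: Any) -> bool:
    """Return True if `value` looks like previously generated reasoning chains."""
    # None, NaN, plain strings, and other non-list values are never reused.
    if not isinstance(value, list) or len(value) == 0:
        return False
    # Assumption: every entry must be a non-blank string to count as valid.
    return all(isinstance(item, str) and item.strip() for item in value)
```

Under this check, both an empty list and None fall through to model generation, which matches the two "missing" cases described for the existing-chains column.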
🏗️ __init__ Function
def __init__(
    self,
    serving: LLMServingABC,
    prompt_type: str = "web_grounding"
):
🧾 Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| serving | LLMServingABC | N/A | The model serving instance for inference. |
| prompt_type | str | "web_grounding" | Prompt Type Key. Used to retrieve the corresponding System Prompt from the MCTReasoningPrompt library (e.g., preset prompts for web grounding, math reasoning, etc.). |
⚡ run Function
def run(
    self,
    storage: DataFlowStorage,
    input_question_key: str,
    input_image_key: str,
    output_key: str,
    input_existing_chains_key: Optional[str] = None
):
    ...
Executes the main logic:
- Fallback Check
  - If input_existing_chains_key is provided, checks this column in the DataFrame.
  - If a row contains a non-empty list, the operator uses this existing data and skips model invocation for that row.
- Input Construction
  - For samples requiring generation, reads input_question_key and input_image_key.
  - Constructs a multimodal input [Image, Text] using the System Prompt determined during initialization.
- Batch Generation
  - Packages pending requests into a batch.
  - Calls serving.generate_from_input to execute inference.
- Result Integration
  - Merges the reused old data with the newly generated data (wrapped as a List).
  - Writes to output_key and saves.
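The four steps above can be sketched in plain Python over a list of row dictionaries. This is a minimal sketch under stated assumptions, not the operator's actual implementation: `run_with_fallback` and `generate_batch` are hypothetical names, `generate_batch` stands in for the real serving call, and the shape of each request dictionary is invented for illustration.

```python
from typing import Any, Callable, Dict, List, Optional


def run_with_fallback(
    rows: List[Dict[str, Any]],
    generate_batch: Callable[[List[Dict[str, Any]]], List[str]],  # hypothetical stand-in for the serving call
    system_prompt: str,
    question_key: str,
    image_key: str,
    output_key: str,
    existing_key: Optional[str] = None,
) -> List[Dict[str, Any]]:
    results: List[Optional[List[str]]] = [None] * len(rows)
    pending_idx: List[int] = []
    pending_inputs: List[Dict[str, Any]] = []

    for i, row in enumerate(rows):
        # 1) Fallback check: reuse a non-empty list from the existing-chains column.
        existing = row.get(existing_key) if existing_key else None
        if isinstance(existing, list) and len(existing) > 0:
            results[i] = existing
            continue
        # 2) Input construction: image plus question text, with the system prompt.
        #    The dictionary layout here is an assumption made for this sketch.
        pending_idx.append(i)
        pending_inputs.append(
            {"system": system_prompt, "image": row[image_key], "text": row[question_key]}
        )

    # 3) Batch generation: one call covering every row that still needs a chain.
    if pending_inputs:
        outputs = generate_batch(pending_inputs)
        # 4) Result integration: wrap each new string in a list to match reused data.
        for i, text in zip(pending_idx, outputs):
            results[i] = [text]

    return [{**row, output_key: res} for row, res in zip(rows, results)]
```

Tracking the indices of pending rows keeps the output aligned with the input order, so reused and newly generated chains land in the same column without reordering the dataset.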
🧾 run Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| storage | DataFlowStorage | N/A | DataFlow storage object. |
| input_question_key | str | N/A | Column name for the question text. |
| input_image_key | str | N/A | Column name for the image path. |
| output_key | str | N/A | Column name for the output result (stored as List[str]). |
| input_existing_chains_key | Optional[str] | None | (Optional) Existing Chains Column. Rows with a non-empty list in this column are reused as-is and skip generation. |
🧩 Example Usage
from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServing
from dataflow.operators.generate import VisualReasoningGenerator
# 1) Initialize Model
serving = LLMServing(model_path="Qwen/Qwen2.5-VL-7B-Instruct")
# 2) Initialize Operator
# prompt_type="web_grounding" loads the corresponding System Prompt automatically
generator = VisualReasoningGenerator(
    serving=serving,
    prompt_type="web_grounding"
)
# 3) Prepare Data
# Assume we have partially processed data where 'history_reasoning' is populated for some rows
storage = FileStorage(file_name_prefix="reasoning_task")
storage.step()
# 4) Execute Generation (with Resume capability)
generator.run(
    storage=storage,
    input_question_key="question",
    input_image_key="image",
    output_key="reasoning_result",
    input_existing_chains_key="history_reasoning"  # Prioritize data from this column
)
🧾 Input/Output Example
Input DataFrame:
| image | question | history_reasoning |
|---|---|---|
| "1.jpg" | "Find the button." | ["<think>The button is red...</think>..."] |
| "2.jpg" | "Where is the logo?" | [] (or None) |
Output DataFrame (reasoning_result):
| image | question | reasoning_result | Note |
|---|---|---|---|
| "1.jpg" | "Find the button." | ["<think>The button is red...</think>..."] | Reuse: Copied from history_reasoning |
| "2.jpg" | "Where is the logo?" | ["<think>Scanning image...</think> Top left."] | Gen: Generated by model |
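Each entry in reasoning_result is a single string, so downstream code may want to separate the reasoning from the final answer. The sketch below is one possible way to do that with the standard library; `split_chain` is a hypothetical helper, not part of the operator, and treating the text after </think> as the answer when no <answer> tag is present is an assumption based on the second row of the example output.

```python
import re
from typing import Dict, Optional

_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_chain(chain: str) -> Dict[str, Optional[str]]:
    """Split one reasoning string into its 'think' and 'answer' parts."""
    think = _THINK.search(chain)
    answer = _ANSWER.search(chain)

    if answer:
        answer_text: Optional[str] = answer.group(1).strip()
    elif think:
        # Assumption: without <answer> tags, the answer is whatever follows </think>.
        answer_text = chain[think.end():].strip() or None
    else:
        # No tags at all: treat the whole string as the answer.
        answer_text = chain.strip() or None

    return {
        "think": think.group(1).strip() if think else None,
        "answer": answer_text,
    }
```

For instance, calling `split_chain` on the generated string from the second row yields the reasoning "Scanning image..." and the answer "Top left.".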

