PersQAGenerator

2026-01-24

📘 Overview

PersQAGenerator is an operator that generates personalized image question-answer (QA) pairs using a Vision-Language Model (VLM).
It focuses on character-centric QA generation: it assigns a name tag (<mam> by default) to the main character in an image, randomly selects a question from a predefined pool, and constrains the model to begin its response with that tag.

Key Features:

Identity Anchoring: Assigns the <mam> tag to the main character so that questions and answers refer to that character by name.
Template Driven: The built-in PersQAGeneratorPrompt constructs the system prompt and question templates.
Dynamic Injection: Rewrites the conversation context during `run`, so questions do not have to be constructed manually.
Structured Output: Produces character-aligned responses suitable for evaluating character-centric multimodal models.
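
The prompt construction described above can be sketched in a few lines. This is only an illustration, not the operator's actual implementation: `build_pers_prompt` and the contents of `QUESTION_POOL` are hypothetical names introduced here, and only the overall prompt wording and the first question are taken from the Example Output in this document.

```python
import random

TAG = "<mam>"  # name tag used for the main character in this document

# Hypothetical question pool. The operator's real pool is defined by
# PersQAGeneratorPrompt; only the first entry appears in this document.
QUESTION_POOL = [
    "How would you describe {tag}'s attire?",
    "What is {tag} doing in the image?",
    "Where is {tag} located in the image?",
]


def build_pers_prompt(rng: random.Random, tag: str = TAG) -> str:
    """Pick a random question and wrap it in a tag-anchored instruction."""
    question = rng.choice(QUESTION_POOL).format(tag=tag)
    return (
        f"The name of the main character in the image is {tag}. "
        f"You need to answer a question about {tag}.\n"
        f"Question: {question} Please answer starting with {tag}!\n"
        "Answer: "
    )


prompt = build_pers_prompt(random.Random(0))
print(prompt)
```

Passing a seeded `random.Random` keeps the question choice reproducible, which can help when comparing runs.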

🏗️ `init` Function

def __init__(
    self,
    llm_serving: LLMServingABC
):
    ...

🧾 `init` Parameter Description

Parameter	Type	Default	Description
`llm_serving`	`LLMServingABC`	-	Model serving object, used to call the VLM for inference

Note: The operator internally initializes PersQAGeneratorPrompt and configures the system_prompt, so users do not need to provide them manually.

⚡ `run` Function

def run(
    self,
    storage: DataFlowStorage,
    input_modal_key: str = "image", 
    output_key: str = "output"
):
    ...

The main logic of the `run` function:

Reads the data from storage.
Generates a personalized question containing the <mam> tag.
Rewrites the data: writes the generated prompt into the conversation field, replacing its previous content.
Calls the model to generate a response that starts with <mam> and saves it under output_key.
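
The per-row flow listed above might look roughly like the following. This is a sketch under stated assumptions: `run_sketch`, `fake_prompt`, and `fake_generate` are hypothetical stand-ins (no DataFlow storage or real model is involved), and the way the operator actually passes images to the model is not shown in this document.

```python
from typing import Callable, Dict, List

TAG = "<mam>"


def run_sketch(
    rows: List[Dict],
    make_prompt: Callable[[], str],
    generate: Callable[[str, List[str]], str],
    input_modal_key: str = "image",
    output_key: str = "output",
) -> List[Dict]:
    """Overwrite each row's conversation with a prompt, then store the answer."""
    for row in rows:
        prompt = make_prompt()
        # The existing conversation content is replaced, not appended to.
        row["conversation"] = [{"from": "human", "value": prompt}]
        # Call the (stand-in) model with the prompt and the row's image paths.
        row[output_key] = generate(prompt, row[input_modal_key])
    return rows


# Stand-ins so the sketch runs without a model (both are hypothetical).
def fake_prompt() -> str:
    return (
        f"Question: How would you describe {TAG}'s attire? "
        f"Please answer starting with {TAG}!\nAnswer: "
    )


def fake_generate(prompt: str, images: List[str]) -> str:
    return f"{TAG} is shown in {len(images)} image(s)."


rows = [
    {
        "image": ["./dataflow/example/test_data/0.png"],
        "conversation": [{"from": "human", "value": "placeholder"}],
    }
]
result = run_sketch(rows, fake_prompt, fake_generate, output_key="pers_qa")
print(result[0]["pers_qa"])
```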

🧾 `run` Parameter Description

Parameter	Type	Default	Description
`storage`	`DataFlowStorage`	-	DataFlow's unified data storage object
`input_modal_key`	`str`	`"image"`	Name of the field that holds the image path(s)
`output_key`	`str`	`"output"`	Name of the field that receives the generated personalized answer

🧠 Example Usage

from dataflow.utils.storage import FileStorage
from dataflow.serving.local_model_vlm_serving import LocalModelVLMServing_vllm
from dataflow.operators.core_vision import PersQAGenerator

# 1. Initialize the inference engine
model = LocalModelVLMServing_vllm(
    hf_model_name_or_path="Qwen/Qwen2.5-VL-3B-Instruct",
    vllm_tensor_parallel_size=1,
)

# 2. Initialize the operator (Prompt templates handled internally)
generator = PersQAGenerator(llm_serving=model)

# 3. Prepare data
storage = FileStorage(
    first_entry_file_name="./sample_data.json", 
    cache_path="./cache_local",
    file_name_prefix="pers_qa_res",
    cache_type="json",
)
storage.step()

# 4. Execute generation
generator.run(
    storage=storage,
    input_modal_key="image",
    output_key="pers_qa"
)

🧾 Data Flow Examples

📥 Example Input

Note: The initial value in the conversation field will be automatically overwritten by the operator with the generated personalized Prompt.

[
    {
        "source": ["https://huggingface.co/datasets/.../0.png"],
        "image": ["./dataflow/example/test_data/0.png"],
        "conversation": [
            {
                "from": "human",
                "value": "Any content, will be automatically overwritten later"
            }
        ]
    }
]
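
An input file in this shape can be produced with the standard library alone. The snippet below is a minimal sketch: the file name `./sample_data.json` matches the Example Usage, the image path is the one shown above, and the placeholder conversation text is arbitrary because the operator overwrites it. The elided dataset URL is copied as-is and would need to be replaced with a real one.

```python
import json
from pathlib import Path

rows = [
    {
        # URL kept elided exactly as in the example above.
        "source": ["https://huggingface.co/datasets/.../0.png"],
        "image": ["./dataflow/example/test_data/0.png"],
        "conversation": [
            {"from": "human", "value": "placeholder, overwritten by the operator"}
        ],
    }
]

path = Path("./sample_data.json")
path.write_text(json.dumps(rows, indent=4, ensure_ascii=False), encoding="utf-8")

# Read the file back to confirm it is valid JSON with the expected fields.
loaded = json.loads(path.read_text(encoding="utf-8"))
print(len(loaded), "row(s) written to", path)
```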

📤 Example Output

The operator automatically constructs the required instructions in the conversation field and returns the model's personalized answer in the pers_qa field.

[
  {
    "source":["https://huggingface.co/datasets/.../0.png"],
    "image":["./dataflow/example/test_data/0.png"],
    "conversation":[
      {
        "from":"human",
        "value":"The name of the main character in the image is <mam>. You need to answer a question about <mam>.\nQuestion: How would you describe <mam>'s attire? Please answer starting with <mam>!\nAnswer: "
      }
    ],
    "pers_qa":"<mam> is dressed in a formal black suit with a white bow tie, exuding a sophisticated and elegant appearance."
  }
]

Tip: The <mam> identifier is hardcoded in the operator; changing it requires editing the source. Use a high-performance multimodal model so that responses reliably start with the specified tag.
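
Because the tag constraint is enforced only through the prompt, it may be worth checking the generated answers afterwards. The helper below is a hypothetical post-processing sketch, not part of the operator: `split_by_tag` is a name introduced here, and the `pers_qa` field name follows the Example Usage.

```python
from typing import Dict, List, Tuple

TAG = "<mam>"


def split_by_tag(
    rows: List[Dict], output_key: str = "pers_qa", tag: str = TAG
) -> Tuple[List[Dict], List[Dict]]:
    """Return (compliant, non_compliant) rows, judged by the answer's prefix."""
    good: List[Dict] = []
    bad: List[Dict] = []
    for row in rows:
        # Leading whitespace is ignored; a missing answer counts as non-compliant.
        answer = str(row.get(output_key, "")).lstrip()
        (good if answer.startswith(tag) else bad).append(row)
    return good, bad


rows = [
    {"pers_qa": "<mam> is dressed in a formal black suit with a white bow tie."},
    {"pers_qa": "The man is dressed in a formal black suit."},
]
good, bad = split_by_tag(rows)
print(f"{len(good)} compliant, {len(bad)} non-compliant")
```

Rows in the non-compliant list could be regenerated or dropped, depending on how strict the downstream evaluation needs to be.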
