Image VQA Data Generation Pipeline (API Version)


2026-02-10

1. Overview

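The Image VQA Data Generation Pipeline (API version) automatically builds high-quality question-answer pairs from image content. It uses a VLM API to generate natural questions and accurate answers grounded in each image's visual features. This is useful for training multimodal dialogue models, evaluating a model's visual understanding, and building domain-specific VQA datasets (for example medical, security, or e-commerce).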

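The pipeline supports the following use cases:

Instruction-tuning data synthesis: generate varied question styles to strengthen a model's interactive ability.
Visual understanding evaluation: generate judgment, description, or reasoning questions about image details.
Automated annotation: replace manual labeling for large-scale image question answering and reduce data production cost.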

2. Quick Start

Step 1: Configure the API Key

Make sure the API key is available as an environment variable:

import os
os.environ["DF_API_KEY"] = "sk-your-key-here"
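The snippet above hard-codes the key inside the script. As a minimal sketch, assuming you would rather export DF_API_KEY in your shell beforehand, a small check like the following fails early when the variable is missing. `require_api_key` is a hypothetical helper written for illustration; it is not part of DataFlow-MM.

```python
import os


def require_api_key(name: str = "DF_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper for illustration; not a DataFlow-MM API.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        # Fail before any request is sent, with a message naming the variable.
        raise RuntimeError(f"Environment variable {name} is not set or is empty.")
    return key
```

Calling `require_api_key()` at the top of the run script surfaces a missing key immediately instead of after the first failed request.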

Step 2: Initialize the Environment

# Create and enter the working directory
mkdir run_vqa_dataflow
cd run_vqa_dataflow

# Initialize the DataFlow-MM configuration
dataflowmm init

Step 3: Download the Example Data

huggingface-cli download --repo-type dataset OpenDCAI/dataflow-demo-image --local-dir example_data

Step 4: Configure the Run Script

In api_pipelines/image_vqa.py, you can customize the VLM model name and API settings:

self.vlm_serving = APIVLMServing_openai(
    api_url="http://172.96.141.132:3001/v1", # Any OpenAI-compatible endpoint is supported
    key_name_of_api_key="DF_API_KEY",
    model_name="gpt-5-nano-2025-08-07",
    max_workers=10
)

Step 5: Run the Pipeline

python api_pipelines/image_vqa.py

3. Data Flow and Logic

1. Input Data Format

The input file must contain the image path and a prompt that triggers VQA generation:

[
    {
        "image": ["./example_data/image_vqa/person.png"],
        "conversation": [
            {
                "from": "human",
                "value": "Please generate a relevant question based on the content of the picture, and only output the question content."
            }
        ]
    }
]
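For more than a handful of images, writing this file by hand becomes tedious. The following is a minimal sketch, assuming your images sit directly in one folder, that builds records in the format shown above. `build_vqa_input` and its parameters are hypothetical names chosen for illustration, not a DataFlow-MM utility.

```python
import json
from pathlib import Path

# Default instruction copied from the sample input above.
DEFAULT_PROMPT = (
    "Please generate a relevant question based on the content of the picture, "
    "and only output the question content."
)


def build_vqa_input(image_dir, output_json, prompt=DEFAULT_PROMPT,
                    extensions=(".png", ".jpg", ".jpeg")):
    """Write one input record per image found directly under `image_dir`."""
    records = []
    for path in sorted(Path(image_dir).iterdir()):
        # Keep only regular files with a recognized image extension.
        if path.is_file() and path.suffix.lower() in extensions:
            records.append({
                "image": [str(path)],
                "conversation": [{"from": "human", "value": prompt}],
            })
    with open(output_json, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=4)
    return records
```

The resulting file can then be passed as the pipeline's input file in place of the sample data.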

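2. Core Operator: PromptedVQAGenerator

This operator is the core engine for generating question-answer pairs:

Role definition: the system_prompt casts the model as an "image question-answer generator", guiding it to produce output in a standard Q&A format.
Multi-turn support: uses the history or specific instructions in the conversation field to steer the focus of question generation.
High-throughput processing: max_workers enables parallel API calls, which suits datasets of tens of thousands of images or more.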
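To make the role of max_workers concrete, here is a minimal sketch of the general pattern of issuing blocking requests from a thread pool. It is not how APIVLMServing_openai is implemented internally; `run_in_parallel` and `fake_vlm_call` are hypothetical stand-ins, and the fake call returns a fixed string instead of contacting any API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(call_model, records, max_workers=10):
    """Apply `call_model` to every record using a thread pool.

    `call_model` is a hypothetical stand-in for one blocking API request.
    Results are returned in the same order as `records`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_model, records))


def fake_vlm_call(record):
    """Placeholder for a real request; builds a question from the image path."""
    return f"What is shown in {record['image'][0]}?"
```

Because each request spends most of its time waiting on the network, threads are usually sufficient here; a larger max_workers raises throughput until the provider's rate limit becomes the bottleneck.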

3. Example Output

The generated results are stored as text in the question and answer fields of each record:

[
  {
    "image": ["./example_data/image_vqa/person.png"],
    "conversation":[
      {
        "from":"human",
        "value":"Please generate a relevant question based on the content of the picture, and only output the question content."
      }
    ],
    "question":"Who is the main actor in the movie \"Nightmare Alley\"?",
    "answer":"The main actor in the movie \"Nightmare Alley\" is Bradley Cooper."
  }
]
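Before using the output for training or evaluation, it can help to drop records where generation failed. The following is a minimal sketch, assuming the output is a JSON list shaped like the example above; `load_qa_pairs` is a hypothetical helper, and the path you pass to it depends on your own cache settings.

```python
import json


def load_qa_pairs(path):
    """Read a pipeline output file and return (image, question, answer) tuples.

    Records missing a non-empty `question` or `answer` are skipped.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    pairs = []
    for rec in records:
        # Treat missing or null fields the same as empty strings.
        question = (rec.get("question") or "").strip()
        answer = (rec.get("answer") or "").strip()
        if question and answer:
            pairs.append((rec["image"][0], question, answer))
    return pairs
```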

4. Full Pipeline Code

import os

# Set the API key environment variable
os.environ["DF_API_KEY"] = "sk-xxx"

from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServingABC
from dataflow.serving.api_vlm_serving_openai import APIVLMServing_openai
from dataflow.operators.core_vision import PromptedVQAGenerator


class ImageVQAPipeline:
    """
    Batch VQA generation for images with a single command.
    """

    def __init__(self, llm_serving: LLMServingABC = None):

        # ---------- 1. Storage ----------
        self.storage = FileStorage(
            first_entry_file_name="./example_data/image_vqa/sample_data.json",
            cache_path="./cache_local",
            file_name_prefix="qa_api",
            cache_type="json",
        )

        # ---------- 2. Serving ----------
        self.vlm_serving = APIVLMServing_openai(
            api_url="https://dashscope.aliyuncs.com/compatible-mode/v1", # Any API platform compatible with the OpenAI format
            key_name_of_api_key="DF_API_KEY", # Name of the environment variable holding the API key (set it in your shell or at the top of this script)
            model_name="qwen3-vl-8b-instruct",
            image_io=None,
            send_request_stream=False,
            max_workers=10,
            timeout=1800
        )

        # ---------- 3. Operator ----------
        self.vqa_generator = PromptedVQAGenerator(
            serving=self.vlm_serving,
            system_prompt="You are an image question-answer generator. Your task is to generate a question-answer pair for the given image content."
        )

    # ------------------------------------------------------------------ #
    def forward(self):
        input_image_key = "image"
        output_step1_key = "question"
        output_step2_key = "answer"

        # Step 1: Generate the question for the image
        self.vqa_generator.run(
            storage=self.storage.step(),
            input_conversation_key="conversation",
            input_image_key=input_image_key,
            output_answer_key=output_step1_key,
        )

        # Step 2: Generate the answer for the question
        self.vqa_generator.run(
            storage=self.storage.step(),
            input_prompt_key=output_step1_key,
            input_image_key=input_image_key,
            output_answer_key=output_step2_key,
        )

# ------------------------- CLI entry point ---------------------------- #
if __name__ == "__main__":
    pipe = ImageVQAPipeline()
    pipe.forward()
