Audio Q&A Generation

2025-07-15

Step 1: Install Environment

Step 2: Import Relevant Packages

from dataflow.operators.core_audio import PromptedAQAGenerator
from dataflow.serving import LocalModelVLMServing_vllm
from dataflow.utils.storage import FileStorage

Step 3: Start the Local Model Service

Start the model locally through vLLM:

vlm_serving = LocalModelVLMServing_vllm(
    hf_model_name_or_path="Qwen/Qwen2-Audio-7B-Instruct", # set to your own model path
    vllm_tensor_parallel_size=2,
    vllm_max_tokens=8192,
    vllm_gpu_memory_utilization=0.7
)

Step 4: Prepare the Audio Data for Q&A Generation

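Create a JSONL file with one record per line. Each record lists the audio file paths under "audio" and the question or instruction under "conversation":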

{"audio": ["../example_data/audio_aqa_pipeline/test_1.wav"], "conversation": [{"from": "human", "value": "Transcribe the audio into Chinese." }]}
{"audio": ["../example_data/audio_aqa_pipeline/test_2.wav"], "conversation": [{"from": "human", "value": "Describe the sound in this audio clip." }]}
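For more than a handful of clips, the input file can be generated rather than typed by hand. The following is a minimal sketch using only the standard library; the helper name `write_aqa_jsonl` is hypothetical and not part of DataFlow, and the record layout simply mirrors the two sample lines above.

```python
import json
from pathlib import Path


def write_aqa_jsonl(path, items):
    """Write (audio_path, prompt) pairs as one JSON object per line.

    Hypothetical helper: the record layout copies the sample records,
    with a single audio path and a single human turn per record.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for audio_path, prompt in items:
            record = {
                "audio": [audio_path],
                "conversation": [{"from": "human", "value": prompt}],
            }
            # ensure_ascii=False keeps non-ASCII prompts readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

Calling `write_aqa_jsonl("../example_data/audio_aqa_pipeline/sample_data.jsonl", [("../example_data/audio_aqa_pipeline/test_1.wav", "Transcribe the audio into Chinese.")])` would produce a file in the same shape as the sample.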

Step 5: Provide the Data Path to FileStorage in the Following Format

storage = FileStorage(
    first_entry_file_name="../example_data/audio_aqa_pipeline/sample_data.jsonl",
    cache_path="./cache",
    file_name_prefix="audio_aqa_pipeline",
    cache_type="jsonl",
)
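Before loading the model, it can be worth checking that the file passed as first_entry_file_name is well formed, since a bad path would otherwise only surface after the model has started. The sketch below is an optional pre-flight check, not a DataFlow feature: `find_invalid_records` is a hypothetical helper, and it assumes relative audio paths are resolved against the current working directory.

```python
import json
from pathlib import Path


def find_invalid_records(jsonl_path, audio_key="audio"):
    """Return (line_number, reason) pairs for records that look malformed."""
    problems = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_no, f"invalid JSON: {exc.msg}"))
                continue
            audio = record.get(audio_key) if isinstance(record, dict) else None
            if not isinstance(audio, list) or not audio:
                problems.append((line_no, f"missing or empty '{audio_key}' list"))
                continue
            for audio_path in audio:
                # Assumption: relative paths are relative to the working directory
                if not isinstance(audio_path, str) or not Path(audio_path).is_file():
                    problems.append((line_no, f"file not found: {audio_path}"))
    return problems
```

An empty return value means every record parsed and every referenced audio file exists.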

Step 6: Initialize the PromptedAQAGenerator Operator

prompted_generator = PromptedAQAGenerator(
    vlm_serving=vlm_serving,
    system_prompt="You are a helpful assistant."
)

Step 7: Run the Operator

prompted_generator.run(
    storage=storage.step(),
    input_audio_key="audio",
    output_answer_key="answer",
)

Synthetic Data Example

{"audio":["..\/example_data\/audio_aqa_pipeline\/test_1.wav"],"conversation":[{"from":"human","value":"Transcribe the audio into Chinese."}],"answer":"The audio states: '二十三家全国品牌企业市场份额已达到百分之二十三点三一'"}
{"audio":["..\/example_data\/audio_aqa_pipeline\/test_2.wav"],"conversation":[{"from":"human","value":"Describe the sound in this audio clip."}],"answer":"The audio contains the sound of a machine turning on and off repeatedly."}
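The output records can be read back with the standard json module. The sketch below is illustrative only: `load_answers` is a hypothetical helper, the output file path is left as a parameter because it depends on how FileStorage names its cache files, and "answer" is the key set through output_answer_key in Step 7. Note that the escaped slashes (`\/`) in the output are valid JSON and decode to plain `/`.

```python
import json


def load_answers(jsonl_path, answer_key="answer"):
    """Return (audio_path, prompt, answer) tuples from an output JSONL file."""
    rows = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            # Assumes one audio path and one human turn per record, as in the samples
            audio_path = record["audio"][0]
            prompt = record["conversation"][0]["value"]
            rows.append((audio_path, prompt, record.get(answer_key)))
    return rows
```

Each tuple pairs the original prompt with the generated answer, which makes it straightforward to spot-check results or convert them to another format.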
