Case 9. Speech transcription

About 167 wordsLess than 1 minute

2025-08-22

This example demonstrates how to use the SpeechTranscriptor operator for speech-to-text transcription.

Speech Transcription

Step 1: Install the Dataflow Environment

pip install open-dataflow[vllm]

Step 2: Create a New Dataflow Working Directory

mkdir run_dataflow
cd run_dataflow

Step 3: Initialize Dataflow

dataflow init

After this step, you should see:

run_dataflow/gpu_pipelines/speechtranscription_pipeline.py

Step 4: Prepare the data to be translated.

self.storage = FileStorage(
    first_entry_file_name="../example_data/SpeechTranscription/pipeline_speechtranscription.jsonl", # your data path
    cache_path="./cache",
    file_name_prefix="dataflow_cache_step",
    cache_type="jsonl",
)

Data format is as follows

{"raw_content": "../example_data/SpeechTranscription/audio/test.wav"}
{"raw_content": "https://raw.githubusercontent.com/FireRedTeam/FireRedASR/main/examples/wav/IT0011W0001.wav"}

Step 5: Launch serving

self.llm_serving = LocalModelLALMServing_vllm(
    hf_model_name_or_path='Qwen/Qwen2-Audio-7B-Instruct',   # your model path
    vllm_tensor_parallel_size=4,
    vllm_max_tokens=8192,
)

Step 6: Speech transcription operator

self.speech_transcriptor = SpeechTranscriptor(
    llm_serving = self.llm_serving,                  
    system_prompt="You are a professional translator; your task is to transcribe speech into text and then translate it into English."  # model system prompt
)

Step 7: Run the operator

self.speech_transcriptor.run(
    storage=self.storage.step(),
    input_key="raw_content"
)