Use Whisper for Speech Transcription or Translation
2025-07-15
Step 1: Prepare the DataFlow environment
conda create -n myvenv python=3.12
conda activate myvenv
pip install open-dataflow
pip install "open-dataflow[vllm]"
Step 2: Install the DataFlow audio module
pip install "open-dataflow[audio]"
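Before launching the model service, it can help to confirm that the environment actually sees your GPUs, since vLLM requires CUDA and the configuration in Step 3 uses a tensor parallel size of 2. A minimal sanity check, assuming torch was pulled in as a vLLM dependency:

import torch

# vLLM needs CUDA-capable GPUs; tensor parallel size 2 (used in Step 3) needs at least two
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())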
Step 3: Launch the local model service
Call the local model service as follows:
# Import path may vary slightly between DataFlow versions
from dataflow.serving import LocalModelLLMServing_vllm

llm_serving = LocalModelLLMServing_vllm(
    hf_model_name_or_path="./models/whisper-large-v3",  # set to your own model path
    vllm_tensor_parallel_size=2,
    vllm_max_tokens=None,
    vllm_gpu_memory_utilization=0.7
)
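If the model is not already on disk, it can be fetched from the Hugging Face Hub first. A minimal sketch using huggingface_hub (assumed to be available from the vLLM/transformers install); the target directory ./models/whisper-large-v3 simply matches the path used above:

from huggingface_hub import snapshot_download

# Download openai/whisper-large-v3 into the local path referenced by hf_model_name_or_path
snapshot_download(
    repo_id="openai/whisper-large-v3",
    local_dir="./models/whisper-large-v3"
)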
Step 4: Prepare the data to be transcribed or translated, filling in the audio paths in the format below.
{"audio": ["your_audio_path"]}
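As an illustration, a small script can build such a jsonl input file; the file name audio_input.jsonl and the .wav paths below are placeholders:

import json

# Hypothetical audio files; replace with your own paths
samples = [
    {"audio": ["./data/sample_0001.wav"]},
    {"audio": ["./data/sample_0002.wav"]},
]

# Write one JSON object per line (jsonl), matching the format shown above
with open("audio_input.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")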
Step 5: Set the data path in FileStorage using the format below.
# Import path may vary slightly between DataFlow versions
from dataflow.utils.storage import FileStorage

storage = FileStorage(
    first_entry_file_name="your_path",  # e.g. the jsonl file prepared in Step 4
    cache_path="./cache",
    file_name_prefix="whisper_transcription",
    cache_type="jsonl",
    media_key="audio",
    media_type="audio"
)
Step 6: Initialize the WhisperTranscriptionGenerator operator.
# WhisperTranscriptionGenerator comes from DataFlow's audio operators; the exact import path depends on your DataFlow version
generator = WhisperTranscriptionGenerator(llm_serving)
Step 7: Run the operator
Speech Transcription
generator.run(
    storage=storage.step(),
    task="transcribe",          # the current task is speech transcription
    language="mandarin",        # the language spoken in the audio (default: english)
    use_no_time_stamps=True,    # use timestamp-free output format (default: True)
    output_key="transcription"  # key under which the result is stored in the output dataframe
)
Speech Translation: translate the speech into English.
generator.run(
    storage=storage.step(),
    task="translate",           # the current task is speech translation
    language="mandarin",        # the language spoken in the audio (default: english)
    use_no_time_stamps=True,    # use timestamp-free output format (default: True)
    output_key="transcription"  # key under which the result is stored in the output dataframe
)
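Once a run finishes, the results can be read back from the cached jsonl output. A minimal sketch, assuming the output lands under ./cache with the whisper_transcription prefix (the exact file name suffix depends on how many steps have run):

import glob
import json

# Pick the most recent cached output file (the file name pattern is an assumption)
output_files = sorted(glob.glob("./cache/whisper_transcription*.jsonl"))
with open(output_files[-1], "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        print(record["audio"], "->", record.get("transcription"))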