Case 1: Translation, QA Generation, Abbreviation
2025-06-30
Step 1: Install the Dataflow Environment
pip install open-dataflow
Step 2: Create a New Dataflow Working Directory
mkdir run_dataflow
cd run_dataflow
Step 3: Initialize Dataflow
dataflow init
After this step, you should see:
run_dataflow/playground/generate_qa_api.py # (api LLM)
run_dataflow/playground/generate_qa_local.py # (local LLM)
Step 4: (API Translation Option) Set Your API Key and API URL
For Linux/macOS:
export DF_API_KEY="sk-xxxxx"
For Windows (PowerShell):
$env:DF_API_KEY = "sk-xxxxx"
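DataFlow is expected to read the key from the DF_API_KEY environment variable set above. As a quick sanity check (this helper is illustrative, not part of DataFlow), you can verify the variable is visible to the Python process before launching the pipeline:

```python
import os

def check_api_key(var_name="DF_API_KEY"):
    """Fail fast if the API key was not exported in this shell session."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the pipeline"
        )
    return key
```

Calling check_api_key() in your script surfaces a missing key immediately, instead of a failed HTTP request midway through a run.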
Then configure the api_url as shown below:
self.llm_serving = APILLMServing_request(
api_url="https://api.openai.com/v1/chat/completions",
model_name="gpt-4o",
max_workers=100
)
Step 4: (Local Model Translation Option) Configure the Local Model
For local models, use the following configuration:
self.llm_serving = LocalModelLLMServing_vllm(
hf_model_name_or_path="Qwen2.5-7B-Instruct", # set to your own model path
vllm_tensor_parallel_size=1,
vllm_max_tokens=8192,
)
Step 5: Prepare the Data to Be Translated
Create a .jsonl file with the following format:
{"raw_content": "This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here."}
Then specify the file path in your configuration:
self.storage = FileStorage(
first_entry_file_name="your path",
cache_path="./cache",
file_name_prefix="raw_content",
cache_type="jsonl",
)
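As a quick check before pointing FileStorage at your file, the input can be written and validated with the standard library alone. This is a minimal sketch using a hypothetical filename input.jsonl; the raw_content field matches the example record above:

```python
import json

records = [
    {"raw_content": "This paper presents work whose goal is to advance "
                    "the field of Machine Learning."},
]

# .jsonl means one JSON object per line, not a single JSON array.
with open("input.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Validate: every line must parse and carry the expected key.
with open("input.jsonl", encoding="utf-8") as f:
    parsed = [json.loads(line) for line in f]
assert all("raw_content" in rec for rec in parsed)
```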
Step 6: Prepare the Prompt for Translation
Use the following configuration for translation tasks:
self.prompt_generator = PromptedGenerator(
llm_serving=self.llm_serving,
system_prompt="Please translate to Chinese.", # System prompt for translation
)
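Conceptually, the system prompt set here is paired with each record's raw_content when the request is sent to an OpenAI-compatible chat-completions endpoint. The sketch below shows what such a payload looks like under the public chat-completions schema; the exact request DataFlow constructs internally may differ:

```python
def build_payload(system_prompt, raw_content, model_name="gpt-4o"):
    """Illustrative OpenAI-style chat payload: the system prompt sets the
    task, and the record's text becomes the user turn."""
    return {
        "model": model_name,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": raw_content},
        ],
    }

payload = build_payload(
    "Please translate to Chinese.",
    "Machine learning advances rapidly.",
)
```

Because the task lives entirely in the system prompt, swapping tasks never requires touching the rest of the pipeline.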
Answer Generation Task
This task is similar to machine translation. Simply replace the script with one of the following:
generate_qa_api.py
generate_qa_local.py
The key change is to replace the system_prompt with one suitable for answer generation:
self.prompt_generator = PromptedGenerator(
llm_serving=self.llm_serving,
system_prompt="Please solve this math problem.", # Prompt for solving math problems
)
Abbreviation Task
The abbreviation task follows the same structure. Just switch to the corresponding script:
abbreviation_qa_api.py
abbreviation_qa_local.py
And update the system_prompt to one designed for summarization:
self.prompt_generator = PromptedGenerator(
llm_serving=self.llm_serving,
system_prompt="Please rewrite the following paragraph into a concise summary that preserves the core meaning and key information:", # Prompt for abbreviation
)
Supporting Other Tasks
To support additional task types, simply adjust the system_prompt accordingly while keeping the rest of the workflow unchanged.
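The pattern across all three tasks can be summed up in one table: the pipeline is fixed and only the system prompt varies. A hypothetical helper illustrating the idea, using the exact prompt strings from this tutorial:

```python
# Only the system prompt changes between tasks; the serving, storage,
# and generator configuration stay the same.
TASK_PROMPTS = {
    "translation": "Please translate to Chinese.",
    "qa": "Please solve this math problem.",
    "abbreviation": ("Please rewrite the following paragraph into a concise "
                     "summary that preserves the core meaning and key "
                     "information:"),
}

def system_prompt_for(task):
    """Look up the system prompt for a task, with a clear error for
    tasks that have not been registered yet."""
    try:
        return TASK_PROMPTS[task]
    except KeyError:
        raise ValueError(f"Unknown task: {task!r}; add it to TASK_PROMPTS")
```

Adding a new task is then a one-line change: register its prompt in TASK_PROMPTS.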