Text2MultiHopQAGenerator


2025-10-09

📘 Overview

Text2MultiHopQAGenerator is a multi-hop question-answer pair generator operator designed to automatically produce questions and answers that require multi-step reasoning from a given text. This operator leverages a Large Language Model (LLM) to transform input text into a structured set of reasoning-based QA pairs. It is suitable for building complex QA datasets or evaluating a model’s reasoning ability.

init Function

__init__(self, llm_serving, seed=0, lang="en", prompt_template=None, num_q=5)

init Parameter Description

Parameter	Type	Default	Description
llm_serving	LLMServingABC	Required	The LLM service instance used for inference and generation.
seed	int	0	Random seed to ensure reproducibility of results.
lang	str	"en"	Specifies the output language, e.g., 'en' (English) or 'zh' (Chinese).
prompt_template	PromptABC	None	Prompt template object for constructing model input. When left as None, the built-in Text2MultiHopQAGeneratorPrompt() is used.
num_q	int	5	The maximum number of QA pairs to generate for each input text.

Prompt Template Description

Prompt Template Name	Main Purpose	Application Scenario	Feature Description
Text2MultiHopQAGeneratorPrompt	Generate multi-hop QA pairs from text	Scenarios requiring complex reasoning questions from long context	Built-in specialized template that guides the model to generate questions, reasoning steps, final answers, and supporting facts to ensure structured and logical outputs.

run Function

run(self, storage, input_key='cleaned_chunk', output_key='QA_pairs', output_meta_key='QA_metadata')

Parameters

Name	Type	Default	Description
storage	DataFlowStorage	Required	Data flow storage instance responsible for reading and writing data.
input_key	str	"cleaned_chunk"	Input column name corresponding to the context text field.
output_key	str	"QA_pairs"	Output column name corresponding to the generated multi-hop QA pairs list.
output_meta_key	str	"QA_metadata"	Output metadata column name corresponding to the generated metadata information.
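
Because input_key names the column that holds the context text, the input data needs one record per chunk with that column present. The following is a minimal sketch of preparing such a file. It assumes the storage instance reads JSON Lines; the helpers `write_chunks` and `read_rows` and the file name used in the usage note are hypothetical and not part of the operator.

```python
import json
from pathlib import Path


def write_chunks(chunks, path, input_key="cleaned_chunk"):
    """Write one JSON object per line, storing each text under `input_key`.

    Hypothetical helper for illustration; it is not part of the operator.
    Empty or whitespace-only chunks are skipped. Returns the number of
    records written.
    """
    written = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for chunk in chunks:
            text = chunk.strip()
            if not text:
                continue
            f.write(json.dumps({input_key: text}, ensure_ascii=False) + "\n")
            written += 1
    return written


def read_rows(path):
    """Read a JSON Lines file back into a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `write_chunks(["Mona Lisa was painted by Leonardo da Vinci. ..."], "chunks.jsonl")` produces a file whose rows each carry a `cleaned_chunk` field. If a different column name is used, the same name would be passed as input_key to run.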

🧠 Example Usage

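# Inside a pipeline class that already holds self.llm_serving and self.storage
self.knowledge_cleaning_step4 = Text2MultiHopQAGenerator(
    llm_serving=self.llm_serving,
    lang="en",
    num_q=5,  # upper bound on QA pairs per input text
)
self.knowledge_cleaning_step4.run(
    storage=self.storage.step(),
    # Defaults shown below; uncomment to override the column names.
    # input_key="cleaned_chunk",
    # output_key="QA_pairs",
    # output_meta_key="QA_metadata",
)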

🧾 Default Output Format

Field	Type	Description
cleaned_chunk	str	The processed original context text (the column named by input_key).
QA_pairs	List[Dict]	List of generated multi-hop QA pairs (the column named by output_key), each containing question, answer, reasoning steps, etc.
QA_metadata	Dict	Metadata (the column named by output_meta_key) containing source, timestamp, complexity, and other information.

Example Input:

{
"cleaned_chunk":"Mona Lisa was painted by Leonardo da Vinci. Leonardo da Vinci was born in the Republic of Florence. The Republic of Florence was a state in what is now Italy."
}

Example Output:

{
    "cleaned_chunk": "Mona Lisa was painted by Leonardo da Vinci. Leonardo da Vinci was born in the Republic of Florence. The Republic of Florence was a state in what is now Italy.",
    "QA_pairs": [
        {
            "question": "In which modern country was the painter of the Mona Lisa born?",
            "reasoning_steps": [
                {
                    "step": "Identify the painter of the Mona Lisa, which is Leonardo da Vinci."
                },
                {
                    "step": "Find the birthplace of Leonardo da Vinci, which is the Republic of Florence."
                },
                {
                    "step": "Determine the modern-day location of the Republic of Florence, which is Italy."
                }
            ],
            "answer": "Italy",
            "supporting_facts": [
                "Mona Lisa was painted by Leonardo da Vinci.",
                "Leonardo da Vinci was born in the Republic of Florence.",
                "The Republic of Florence was a state in what is now Italy."
            ],
            "type": "History"
        }
    ],
    "QA_metadata": {
        "source": "default_source",
        "timestamp": "2023-10-27T10:00:00Z",
        "complexity": 3
    }
}
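
Before using the generated pairs downstream, it can help to check each record against the shape shown in the example output. The following is a minimal sketch, not part of the operator: the helper `check_record` is hypothetical, the required field names are taken from the example output, and the rule that every supporting fact appears verbatim in the source text is an assumption that real model output may not always satisfy.

```python
REQUIRED_QA_FIELDS = ("question", "reasoning_steps", "answer", "supporting_facts")


def check_record(record, num_q=5, input_key="cleaned_chunk",
                 output_key="QA_pairs"):
    """Return a list of human-readable problems found in one output record.

    Hypothetical helper for illustration. An empty list means the record
    matches the shape of the documented example.
    """
    text = record.get(input_key, "")
    qa_pairs = record.get(output_key)
    if not isinstance(qa_pairs, list):
        return [f"{output_key} is missing or not a list"]

    problems = []
    # num_q is documented as the maximum number of pairs per input text.
    if len(qa_pairs) > num_q:
        problems.append(f"{len(qa_pairs)} QA pairs exceed num_q={num_q}")

    for i, qa in enumerate(qa_pairs):
        if not isinstance(qa, dict):
            problems.append(f"pair {i}: not a dict")
            continue
        for field in REQUIRED_QA_FIELDS:
            if field not in qa:
                problems.append(f"pair {i}: missing field {field!r}")
        # A multi-hop question should need at least two reasoning steps.
        if len(qa.get("reasoning_steps", [])) < 2:
            problems.append(f"pair {i}: fewer than two reasoning steps")
        # Assumption: supporting facts are quoted verbatim from the text.
        for fact in qa.get("supporting_facts", []):
            if fact not in text:
                problems.append(f"pair {i}: fact not found in text: {fact!r}")
    return problems
```

Calling `check_record` on the example output returns an empty list, since it has one pair, three reasoning steps, and supporting facts that all occur in `cleaned_chunk`. Records that fail a check can be logged or filtered out rather than discarded silently.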
