Text2QAGenerator
2025-10-09
📘 Overview
Text2QAGenerator is an operator that generates Question-Answer (QA) pairs from text documents. It works in two steps: it first generates a prompt tailored to the content of each document, then passes that prompt together with the original text to a Large Language Model (LLM) to produce a relevant question and answer.
Output Validation: After generation, the operator automatically filters out rows where the generated question or answer is empty (i.e., the QA generation failed). The number of dropped rows is logged as a warning for traceability.
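The validation step described above can be illustrated with a minimal sketch in plain Python. It is not the operator's actual code: the helper name `drop_failed_rows`, the logger name, and the choice to treat whitespace-only strings and `None` as empty are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("text2qa_sketch")  # hypothetical logger name


def drop_failed_rows(rows, question_key="generated_question",
                     answer_key="generated_answer"):
    """Keep only rows whose question and answer are both non-empty strings.

    Assumption: None and whitespace-only strings count as "empty".
    """
    def is_filled(value):
        return isinstance(value, str) and value.strip() != ""

    kept = [
        row for row in rows
        if is_filled(row.get(question_key)) and is_filled(row.get(answer_key))
    ]
    dropped = len(rows) - len(kept)
    if dropped:
        # Log the number of dropped rows as a warning for traceability.
        logger.warning("Dropped %d row(s) with an empty question or answer", dropped)
    return kept, dropped


rows = [
    {"text": "a", "generated_question": "Q?", "generated_answer": "A."},
    {"text": "b", "generated_question": "", "generated_answer": "A."},
    {"text": "c", "generated_question": "Q?", "generated_answer": None},
]
kept, dropped = drop_failed_rows(rows)
```

Here only the first row survives, and the count of removed rows is returned alongside the filtered list so a caller can report it.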
__init__
def __init__(self, llm_serving: LLMServingABC)

| Parameter | Type | Default | Description |
|---|---|---|---|
| llm_serving | LLMServingABC | Required | A Large Language Model serving instance used for generation. |
Prompt Template Descriptions
| Prompt Template Name | Primary Use | Applicable Scenarios | Feature Description |
|---|---|---|---|
run
def run(self, storage: DataFlowStorage, input_key: str = "text", input_question_num: int = 1, output_prompt_key: str = "generated_prompt", output_quesion_key: str = "generated_question", output_answer_key: str = "generated_answer")

| Parameter | Type | Default | Description |
|---|---|---|---|
| storage | DataFlowStorage | Required | The DataFlow storage instance for reading and writing data. |
| input_key | str | "text" | The column name in the input data that contains the document text. |
| input_question_num | int | 1 | The number of QA pairs to generate for each input document. |
| output_prompt_key | str | "generated_prompt" | The column name to store the intermediate generated prompt. |
| output_quesion_key | str | "generated_question" | The column name to store the generated question. |
| output_answer_key | str | "generated_answer" | The column name to store the generated answer. |
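To show how the key parameters above relate to the two-step flow, here is a rough, self-contained sketch. It does not import DataFlow or reproduce the operator's internals. `StubServing`, its `generate` method, the `STAGE1`/`STAGE2` markers, and the `Question:`/`Answer:` reply format are all hypothetical stand-ins, and `input_question_num` is not modeled.

```python
from typing import Dict, List


class StubServing:
    """Hypothetical stand-in for an LLM serving object; returns canned text."""

    def generate(self, prompts: List[str]) -> List[str]:
        replies = []
        for prompt in prompts:
            if prompt.startswith("STAGE1"):
                replies.append("Generate a question about the main subject of the text.")
            else:
                replies.append("Question: What is described?\nAnswer: A placeholder answer.")
        return replies


def run_sketch(rows: List[Dict[str, str]], serving: StubServing,
               input_key: str = "text",
               output_prompt_key: str = "generated_prompt",
               output_quesion_key: str = "generated_question",  # spelling follows run()
               output_answer_key: str = "generated_answer") -> List[Dict[str, str]]:
    texts = [row[input_key] for row in rows]
    # Step 1: ask the model for a prompt tailored to each document.
    prompts = serving.generate([f"STAGE1\n{text}" for text in texts])
    # Step 2: send the generated prompt together with the original text.
    replies = serving.generate(
        [f"STAGE2\n{prompt}\n\n{text}" for prompt, text in zip(prompts, texts)]
    )
    results = []
    for row, prompt, reply in zip(rows, prompts, replies):
        # Assumed reply format: "Question: ...\nAnswer: ..."
        question, _, answer = reply.partition("\nAnswer:")
        results.append({
            **row,
            output_prompt_key: prompt,
            output_quesion_key: question.replace("Question:", "", 1).strip(),
            output_answer_key: answer.strip(),
        })
    return results


records = run_sketch([{"text": "The Eiffel Tower is in Paris."}], StubServing())
```

Each output record keeps the input column and gains three columns named by the `output_*` parameters, which is why renaming a key in `run` changes only the column name and not the content.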
🧠 Example Usage
🧾 Default Output Format
Note: Rows where the generated question or answer is empty are automatically removed from the output. If any rows are dropped, a warning is logged with the count of dropped rows.
| Field | Type | Description |
|---|---|---|
| text | str | The input document text. |
| generated_prompt | str | The intermediate prompt generated by the LLM. |
| generated_question | str | The final question generated by the LLM (guaranteed non-empty). |
| generated_answer | str | The final answer generated by the LLM (guaranteed non-empty). |
Example Input:
{
"text": "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower. It was constructed from 1887 to 1889 as the centerpiece of the 1889 World's Fair."
}
Example Output:
{
"text": "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower. It was constructed from 1887 to 1889 as the centerpiece of the 1889 World's Fair.",
"generated_prompt": "You are a question generator. Given a text, you need to generate a question about the main subject or key information in the text.",
"generated_question": "Who is the Eiffel Tower named after?",
"generated_answer": "The Eiffel Tower is named after the engineer Gustave Eiffel, whose company was responsible for its design and construction."
}
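A downstream consumer may want to confirm that each record matches the default output format before using it. The following is a small sketch using only the standard library. The helper `check_record` is hypothetical, and it assumes the default column names were not overridden in `run` and that records arrive as JSON strings.

```python
import json

# Default column names from the output format table.
EXPECTED_FIELDS = ("text", "generated_prompt", "generated_question", "generated_answer")


def check_record(line: str) -> dict:
    """Parse one JSON record and verify the default output fields."""
    record = json.loads(line)
    for field in EXPECTED_FIELDS:
        if not isinstance(record.get(field), str):
            raise ValueError(f"field {field!r} is missing or not a string")
    # Question and answer should never be empty after the operator's filtering.
    for field in ("generated_question", "generated_answer"):
        if not record[field].strip():
            raise ValueError(f"field {field!r} is empty")
    return record


sample = json.dumps({
    "text": "The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
    "generated_prompt": "You are a question generator.",
    "generated_question": "Who is the Eiffel Tower named after?",
    "generated_answer": "The Eiffel Tower is named after the engineer Gustave Eiffel.",
})
record = check_record(sample)
```

A record that lacks any of the four fields, or has an empty question or answer, raises `ValueError` instead of being returned.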
