ReasoningQuestionGenerator
2025-10-09
📘 Overview
ReasoningQuestionGenerator is a question generation operator designed to create new questions based on existing ones. It utilizes a Large Language Model (LLM) to synthesize a specified number of new, diverse questions from each input question, leveraging different prompt templates (Math, General, DIY) to control the generation style.
__init__
def __init__(self,
    num_prompts: int = 1,
    llm_serving: LLMServingABC = None,
    prompt_template = MathQuestionSynthesisPrompt | GeneralQuestionSynthesisPrompt | DiyQuestionSynthesisPrompt | DIYPromptABC
):

__init__ Parameter Descriptions
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| num_prompts | int | 1 | The number of new questions to generate for each input question. Must be an integer between 1 and 5 (inclusive). |
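| llm_serving | LLMServingABC | None | The Large Language Model serving instance used for question generation. The signature defaults to None, but an instance must be supplied for the operator to generate questions. |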
| prompt_template | PromptABC | MathQuestionSynthesisPrompt | The prompt template object used to construct the generation prompts. Supports Math, General, and custom DIY templates. |
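The range constraint on num_prompts can be checked before the operator is constructed. The helper below is a minimal sketch: check_num_prompts is a hypothetical name, not part of the library, and this page does not say how the operator itself reacts to an out-of-range value.

```python
def check_num_prompts(num_prompts: int) -> int:
    """Return num_prompts unchanged if it is an int in [1, 5], else raise ValueError.

    Hypothetical helper for illustration; not part of the operator's API.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(num_prompts, bool) or not isinstance(num_prompts, int):
        raise ValueError(
            f"num_prompts must be an int, got {type(num_prompts).__name__}"
        )
    if not 1 <= num_prompts <= 5:
        raise ValueError(
            f"num_prompts must be between 1 and 5 (inclusive), got {num_prompts}"
        )
    return num_prompts
```

A call such as check_num_prompts(3) returns 3, while check_num_prompts(6) raises ValueError, so a misconfigured pipeline fails before any LLM request is sent.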
Prompt Template Descriptions
| Prompt Template Name | Primary Purpose | Applicable Scenarios | Feature Description |
|---|---|---|---|
| MathQuestionSynthesisPrompt | Math question generation | Math-related question augmentation | Generates new questions based on mathematical formulas and theorems, supports single-step and multi-step calculation problems. |
| GeneralQuestionSynthesisPrompt | General question generation | General knowledge question augmentation | Generates new questions based on general knowledge, not dependent on specific domain knowledge. |
| DiyQuestionSynthesisPrompt | Custom question generation | Custom question augmentation | Generates new questions based on user-defined question templates. |
run
def run(self,
    storage: DataFlowStorage,
    input_key: str,
    output_synth_or_input_flag: str = "Synth_or_Input"
):

Executes the main logic of the operator. It reads an input DataFrame from storage, generates new questions based on the input_key column, and writes the augmented DataFrame (containing both original and new questions) back to storage.
Parameters
| Name | Type | Default Value | Description |
|---|---|---|---|
| storage | DataFlowStorage | Required | The DataFlowStorage instance, responsible for reading input data and writing the output. |
| input_key | str | Required | The name of the input column that contains the original questions. |
| output_synth_or_input_flag | str | "Synth_or_Input" | The name of the output column that flags whether a row is an original input ('input') or a newly synthesized question ('synth'). |
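The row-level effect of run can be approximated in plain Python. This is a sketch under stated assumptions, not the operator's implementation: augment_rows and fake_llm are hypothetical names, a list of dicts stands in for the DataFrame, and original rows are assumed to come before the synthesized ones.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def augment_rows(
    rows: List[Row],
    input_key: str,
    generate: Callable[[str], str],
    num_prompts: int = 1,
    flag_key: str = "Synth_or_Input",
) -> List[Row]:
    """Return original rows flagged 'input' followed by new rows flagged 'synth'."""
    originals = [{**row, flag_key: "input"} for row in rows]
    synthesized: List[Row] = []
    for row in rows:
        for _ in range(num_prompts):
            # One generator call per new question
            new_question = generate(row[input_key])
            synthesized.append({input_key: new_question, flag_key: "synth"})
    # Assumption: originals first, synthesized rows appended after them
    return originals + synthesized


def fake_llm(question: str) -> str:
    """Stand-in for the LLM call; a real run would query llm_serving instead."""
    return f"Rephrased: {question}"


rows = [{"instruction": "What is 2 + 3?"}, {"instruction": "Solve x + 1 = 4."}]
result = augment_rows(rows, "instruction", fake_llm, num_prompts=2)
print(len(result))  # 2 originals + 2 * 2 synthesized rows
```

Under these assumptions the output holds at most len(rows) * (1 + num_prompts) rows. The real operator may return fewer if some generations fail or are filtered out.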
🧠 Example Usage
from dataflow.operators.reasoning import ReasoningQuestionGenerator
from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServingABC
from dataflow.serving import APILLMServing_request
from dataflow.prompts.reasoning.math import MathQuestionSynthesisPrompt

class ReasoningQuestionGeneratorTest():
    def __init__(self, llm_serving: LLMServingABC = None):
        # Input file and cache location for each step's output
        self.storage = FileStorage(
            first_entry_file_name="example.json",
            cache_path="./cache_local",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl",
        )

        # use API server as LLM serving
        self.llm_serving = APILLMServing_request(
            api_url="",  # set this to your API endpoint
            model_name="gpt-4o",
            max_workers=30
        )

        # num_prompts is omitted, so the default of 1 applies
        self.operator = ReasoningQuestionGenerator(
            llm_serving = self.llm_serving,
            prompt_template = MathQuestionSynthesisPrompt()
        )

    def forward(self):
        self.operator.run(
            storage = self.storage.step(),
            input_key = "instruction",
            output_synth_or_input_flag = "Synth_or_Input"
        )

if __name__ == "__main__":
    pl = ReasoningQuestionGeneratorTest()
    pl.forward()

🧾 Default Output Format
The operator adds one new row per generated question, plus a new column that distinguishes original rows from synthesized ones.
| Field | Type | Description |
|---|---|---|
| {input_key} | str | Contains both the original and the newly generated questions. |
| Synth_or_Input | str | Flags the source of the question: 'input' for original questions, 'synth' for generated ones. The column name is set by output_synth_or_input_flag; Synth_or_Input is the default. |
Example Input:
{
  "instruction": "What is the capital of France?"
}

Example Output (with num_prompts=1):
[
  {
    "instruction": "What is the capital of France?",
    "Synth_or_Input": "input"
  },
  {
    "instruction": "Identify the primary city that serves as the governmental seat of the French Republic.",
    "Synth_or_Input": "synth"
  }
]
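Downstream steps often need only the synthesized questions, or only the originals. The sketch below splits the example output on the flag column using the standard library. The inline JSON string stands in for whatever file the storage backend actually writes, and the variable names are illustrative.

```python
import json

# Inline copy of the example output; a real pipeline would read the cached file
output_text = """
[
  {"instruction": "What is the capital of France?", "Synth_or_Input": "input"},
  {"instruction": "Identify the primary city that serves as the governmental seat of the French Republic.", "Synth_or_Input": "synth"}
]
"""

rows = json.loads(output_text)

# Partition the rows on the flag column (default name: Synth_or_Input)
originals = [r["instruction"] for r in rows if r["Synth_or_Input"] == "input"]
synthesized = [r["instruction"] for r in rows if r["Synth_or_Input"] == "synth"]

print(f"{len(originals)} original, {len(synthesized)} synthesized")
```

If run was called with a custom output_synth_or_input_flag, use that column name in place of Synth_or_Input.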
