# ReasoningQuestionDifficultySampleEvaluator
2025-10-09
## 📘 Overview
ReasoningQuestionDifficultySampleEvaluator is a question difficulty evaluation operator. It analyzes the complexity of questions by calling a Large Language Model (LLM) and generates a difficulty score from 1 to 10 for each question.
## `__init__` function

```python
@prompt_restrict(
    MathQuestionDifficultyPrompt
)
@OPERATOR_REGISTRY.register()
class ReasoningQuestionDifficultySampleEvaluator(OperatorABC):
    def __init__(self, llm_serving: LLMServingABC = None):
```

### `__init__` Parameter Description
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| llm_serving | LLMServingABC | Required | Large language model service instance for executing inference and generation. |
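
As a minimal construction sketch, any `LLMServingABC` implementation can be passed in; the snippet below uses `APILLMServing_request` from the full example further down. The empty `api_url` is a placeholder to fill in with your own endpoint, and `max_workers=30` is illustrative rather than required:

```python
from dataflow.operators.reasoning import ReasoningQuestionDifficultySampleEvaluator
from dataflow.serving import APILLMServing_request

# Placeholder endpoint: point this at your own chat-completions API.
llm_serving = APILLMServing_request(
    api_url="",          # supply your serving endpoint
    model_name="gpt-4o",
    max_workers=30,
)
evaluator = ReasoningQuestionDifficultySampleEvaluator(llm_serving=llm_serving)
```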
### Prompt Template Descriptions
| Prompt Template Name | Primary Use | Applicable Scenarios | Features |
|---|---|---|---|
| MathQuestionDifficultyPrompt | Question difficulty evaluation | Evaluating the difficulty of user questions | Input question, output difficulty score from 1 to 10 |
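
The exact wording of `MathQuestionDifficultyPrompt` is defined in the DataFlow source; conceptually, it asks the model to rate a question and the operator extracts an integer score from the reply. The sketch below is a hypothetical illustration of that pattern, not the library's actual prompt text or parser:

```python
import re

def build_difficulty_prompt(question: str) -> str:
    # Illustrative prompt in the spirit of MathQuestionDifficultyPrompt;
    # the real template text lives inside DataFlow.
    return (
        "Rate the difficulty of the following question on a scale of 1 to 10, "
        "where 1 is trivial and 10 is extremely hard. "
        "Answer with a single integer.\n\n"
        f"Question: {question}"
    )

def parse_score(llm_reply: str) -> int:
    # Hypothetical parser: take the first integer in the reply and clamp it to [1, 10].
    match = re.search(r"\d+", llm_reply)
    return min(max(int(match.group()), 1), 10) if match else 1
```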
## `run` function

```python
def run(self, storage: DataFlowStorage, input_key: str, output_key: str = "difficulty_score")
```

### Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| storage | DataFlowStorage | Required | DataFlow storage instance for reading and writing data. |
| input_key | str | Required | Input column name corresponding to the question field. |
| output_key | str | "difficulty_score" | Output column name corresponding to the generated difficulty score field. |
## 🧠 Example Usage
```python
from dataflow.operators.reasoning import ReasoningQuestionDifficultySampleEvaluator
from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServingABC
from dataflow.serving import APILLMServing_request

class ReasoningQuestionDifficultySampleEvaluatorTest:
    def __init__(self, llm_serving: LLMServingABC = None):
        self.storage = FileStorage(
            first_entry_file_name="example.json",
            cache_path="./cache_local",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl",
        )
        # Use an API server as the LLM serving backend
        self.llm_serving = APILLMServing_request(
            api_url="",
            model_name="gpt-4o",
            max_workers=30
        )
        self.evaluator = ReasoningQuestionDifficultySampleEvaluator(llm_serving=self.llm_serving)

    def forward(self):
        self.evaluator.run(
            storage=self.storage.step(),
            input_key="instruction",
            output_key="difficulty_score",
        )

if __name__ == "__main__":
    pl = ReasoningQuestionDifficultySampleEvaluatorTest()
    pl.forward()
```

## 🧾 Default Output Format
| Field | Type | Description |
|---|---|---|
| difficulty_score | int | The difficulty score of the question, from 1 to 10. |
Example input:

```json
{
    "instruction": "Calculate 2 to the power of 5."
}
```

Example output:

```json
{
    "difficulty_score": 3
}
```
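
Because `FileStorage` caches results as JSONL, downstream steps can filter rows by the new column. A minimal sketch follows; the cache file path is hypothetical (in practice, use whatever file your `FileStorage` step actually wrote under `cache_path`):

```python
import json

# Hypothetical cache path; FileStorage derives file names from its prefix and step index.
with open("./cache_local/dataflow_cache_step_1.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]

# Keep only questions the LLM rated as moderately hard or harder.
hard_questions = [r for r in rows if r.get("difficulty_score", 0) >= 6]
print(f"{len(hard_questions)} / {len(rows)} questions scored >= 6")
```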
