ReasoningQuestionCategorySampleEvaluator

2025-10-09

📘 Overview

The ReasoningQuestionCategorySampleEvaluator is an operator that classifies user questions hierarchically, assigning each question a primary and a secondary category. It uses a Large Language Model (LLM) to analyze the semantics of the input question and outputs the corresponding category codes, which helps organize and route questions by subject matter.
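
As a rough illustration of the classification step, the sketch below turns a raw model reply into a primary and secondary category. It is not the operator's actual implementation: the function name `parse_category_response` is hypothetical, and it assumes the model replies with a JSON object containing `primary_category` and `secondary_category` keys, which this page does not confirm.

```python
import json
from typing import Optional, Tuple


def parse_category_response(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (primary, secondary) from an assumed JSON reply.

    Hypothetical helper: returns (None, None) when no JSON object can be parsed.
    """
    # Models often wrap JSON in extra prose, so take the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None, None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None, None
    # Assumed key names; missing keys simply yield None.
    return data.get("primary_category"), data.get("secondary_category")
```

Returning `None` values instead of raising keeps a single malformed reply from aborting a whole batch; a caller can then decide whether to retry or drop the row.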

`__init__` function

def __init__(self, llm_serving: LLMServingABC = None)

Parameter Name	Type	Default Value	Description
llm_serving	LLMServingABC	None	An instance of a large language model service, used for executing inference and generation.

Prompt Template Descriptions

Prompt Template Name	Main Purpose	Applicable Scenarios	Feature Description
MathQuestionCategoryPrompt	Multi-level question classification	Classifying user questions into primary and secondary categories	Takes input questions and outputs primary and secondary classifications

`run` function

def run(self, storage: DataFlowStorage, input_key: str = "instruction", output_key: str = "question_category")

Parameter Name	Type	Default Value	Description
storage	DataFlowStorage	Required	The data flow storage instance, responsible for reading and writing data.
input_key	str	"instruction"	The name of the input column containing the question text.
output_key	str	"question_category"	The base name for the output classification results. Note: this key is used for validation to prevent overwriting existing columns.
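
The validation mentioned for `output_key` can be pictured with a small sketch. The helper `check_output_columns` below is hypothetical and only illustrates the idea of refusing to overwrite existing columns; the operator's real checks may differ.

```python
from typing import Iterable, List


def check_output_columns(
    existing: Iterable[str], input_key: str, new_columns: List[str]
) -> None:
    """Raise if the input column is missing or an output column already exists."""
    existing_set = set(existing)
    if input_key not in existing_set:
        raise KeyError(f"Input column '{input_key}' not found in the data.")
    # Any overlap means the run would overwrite data already in the table.
    conflicts = sorted(col for col in new_columns if col in existing_set)
    if conflicts:
        raise ValueError(f"Output columns already exist: {conflicts}")
```

Failing before any LLM call is made avoids spending inference time on a run whose results could not be written safely.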

🧠 Example Usage

from dataflow.operators.reasoning import ReasoningQuestionCategorySampleEvaluator
from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServingABC
from dataflow.serving import APILLMServing_request

class ReasoningQuestionCategorySampleEvaluatorTest:
    def __init__(self, llm_serving: LLMServingABC = None):
        # Input file and cache location for intermediate results.
        self.storage = FileStorage(
            first_entry_file_name="example.json",
            cache_path="./cache_local",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl",
        )

        # Use the serving instance passed in; otherwise fall back to an API server.
        self.llm_serving = llm_serving or APILLMServing_request(
            api_url="",  # set this to your API endpoint
            model_name="gpt-4o",
            max_workers=30,
        )

        self.evaluator = ReasoningQuestionCategorySampleEvaluator(llm_serving=self.llm_serving)

    def forward(self):
        # Classify each question found in the "instruction" column.
        self.evaluator.run(
            storage=self.storage.step(),
            input_key="instruction",
            output_key="category",
        )

if __name__ == "__main__":
    pl = ReasoningQuestionCategorySampleEvaluatorTest()
    pl.forward()

🧾 Default Output Format

Field	Type	Description
instruction	str	The input question text.
primary_category	str	The identified primary category of the question.
secondary_category	str	The identified secondary category of the question.

Example Input:

{
    "instruction": "How do you calculate the volume of a sphere with a radius of 5 units?"
}

Example Output:

{
    "instruction": "How do you calculate the volume of a sphere with a radius of 5 units?",
    "primary_category": "Geometry",
    "secondary_category": "Volume and Surface Area"
}
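
Once rows in this format have been written, a quick way to inspect the distribution of categories is to tally them. The sketch below assumes the output is available as JSONL text with the `primary_category` and `secondary_category` fields shown above; the function name `count_categories` is hypothetical.

```python
import json
from collections import Counter


def count_categories(jsonl_text: str) -> Counter:
    """Count (primary, secondary) pairs across JSONL rows."""
    counts: Counter = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        row = json.loads(line)
        key = (row.get("primary_category"), row.get("secondary_category"))
        counts[key] += 1
    return counts
```

Calling `count_categories(text).most_common(5)` would list the five most frequent category pairs, which is useful for spotting skewed or unexpected classifications.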
