ReasoningPseudoAnswerGenerator
📘 Overview
ReasoningPseudoAnswerGenerator is a pseudo-answer generation operator. It calls a large language model (LLM) multiple times to produce several candidate answers for the same question, then determines the final answer by voting (i.e., choosing the most frequent answer). This approach is intended to improve answer accuracy and robustness.
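Conceptually, the voting step is a simple majority count over the candidate answers. The sketch below only illustrates that idea and assumes the candidates have already been normalized into comparable strings; the helper `pick_most_frequent` is hypothetical and not part of the operator's API.

```python
from collections import Counter

def pick_most_frequent(candidate_answers: list[str]) -> str:
    """Hypothetical helper: return the answer that occurs most often."""
    counts = Counter(candidate_answers)
    # most_common(1) returns [(answer, frequency)]; ties resolve by first occurrence
    return counts.most_common(1)[0][0]

# Three LLM calls on the same question might yield:
candidates = ["42", "41", "42"]
print(pick_most_frequent(candidates))  # -> 42
```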
__init__ function
```python
@prompt_restrict(
    MathAnswerGeneratorPrompt
)
@OPERATOR_REGISTRY.register()
class ReasoningPseudoAnswerGenerator(OperatorABC):
    '''
    Pseudo Answer Generator is a class that generates answers for given questions, then chooses the most frequent answer.
    '''
    def __init__(self, llm_serving: LLMServingABC = None, max_times: int = 3):
```
init Parameter Description
| Parameter | Type | Default | Description |
|---|---|---|---|
| llm_serving | LLMServingABC | None | Large language model serving instance used for inference and generation. |
| max_times | int | 3 | Maximum number of times to generate an answer for each question; the most frequent answer is selected by voting. |
Prompt Template Description
| Prompt template | Main purpose | Use case | Notes |
|---|---|---|---|
| MathAnswerGeneratorPrompt | Generate answers to questions | Pseudo-answer generation | Takes a question as input and produces an answer. |
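For intuition, a math answer-generation prompt usually wraps the question with instructions to reason step by step and to mark the final answer clearly. The snippet below is a hypothetical layout for illustration only; it is not the actual text of MathAnswerGeneratorPrompt.

```python
# Hypothetical prompt layout (not the real MathAnswerGeneratorPrompt content)
ANSWER_PROMPT_TEMPLATE = (
    "Solve the following math problem step by step, then state the final "
    "answer on its own line, prefixed with 'Answer:'.\n\n"
    "Problem: {question}\n"
)

prompt = ANSWER_PROMPT_TEMPLATE.format(question="What is 12 * 7?")
```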
run function
```python
def run(
    self,
    storage: DataFlowStorage,
    input_key: str = "instruction",
    output_key_answer: str = "pseudo_answers",
    output_key_answer_value: str = "pseudo_answer_value",
    output_key_solutions: str = "pseudo_solutions",
    output_key_correct_solution_example: str = "pseudo_correct_solution_example",
)
```
Executes the operator's main logic: reads the input DataFrame from storage, calls the LLM multiple times to generate candidate answers, selects the final answer by voting, and writes all related results back to storage (see the illustrative output row after the parameter table below).
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| storage | DataFlowStorage | Required | Data flow storage instance responsible for reading and writing data. |
| input_key | str | "instruction" | Input column name containing the question field. |
| output_key_answer | str | "pseudo_answers" | Output column name holding the list of all candidate answers from the repeated generations. |
| output_key_answer_value | str | "pseudo_answer_value" | Output column name holding the final answer selected by voting. |
| output_key_solutions | str | "pseudo_solutions" | Output column name holding the list of reasoning traces that produced the final answer. |
| output_key_correct_solution_example | str | "pseudo_correct_solution_example" | Output column name holding one example reasoning trace that produced the final answer. |
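To make the column semantics concrete, the dictionary below shows what one output row could look like after run() with max_times=3. The values are assumed examples for illustration, not captured output, and the exact cell types depend on the implementation.

```python
# Assumed example of one output row (illustrative values, not real output)
row = {
    "instruction": "What is 12 * 7?",                         # input_key
    "pseudo_answers": ["84", "84", "74"],                     # all candidate answers from max_times calls
    "pseudo_answer_value": "84",                              # final answer chosen by majority vote
    "pseudo_solutions": ["12 * 7 = 84, so ...", "..."],       # reasoning traces that reached the final answer
    "pseudo_correct_solution_example": "12 * 7 = 84, so ...", # one example trace for the final answer
}
```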
🧠 Example Usage
```python
from dataflow.operators.reasoning import ReasoningPseudoAnswerGenerator
from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServingABC
from dataflow.serving import APILLMServing_request

class ReasoningPseudoAnswerGeneratorTest():
    def __init__(self, llm_serving: LLMServingABC = None):
        self.storage = FileStorage(
            first_entry_file_name="example.json",
            cache_path="./cache_local",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl",
        )
        # use API server as LLM serving
        self.llm_serving = APILLMServing_request(
            api_url="",
            model_name="gpt-4o",
            max_workers=30
        )
        self.operator = ReasoningPseudoAnswerGenerator(
            llm_serving=self.llm_serving,
            max_times=3
        )

    def forward(self):
        self.operator.run(
            storage=self.storage.step(),
            input_key="instruction",
            output_key_answer="pseudo_answers",
            output_key_answer_value="pseudo_answer_value",
            output_key_solutions="pseudo_solutions",
            output_key_correct_solution_example="pseudo_correct_solution_example",
        )

if __name__ == "__main__":
    pl = ReasoningPseudoAnswerGeneratorTest()
    pl.forward()
```
