ReasoningAnswerGroundTruthFilter

About 248 wordsLess than 1 minute

2025-10-09

ReasoningAnswerGroundTruthFilter is an operator designed to filter data by comparing a generated answer against a ground truth answer. It supports both exact string matching and mathematical equivalence verification. Rows where the answers match are retained, while non-matching rows are discarded.

`init`

def __init__(self, compare_method: Literal["math_verify", "exact"] = "math_verify"):

Parameter Name	Type	Default Value	Description
compare_method	str	"math_verify"	The comparison method. `"math_verify"` checks for mathematical equivalence, while `"exact"` performs a strict string comparison.

`run`

def run(self, storage: DataFlowStorage, input_test_answer_key: str = "generated_cot", input_gt_answer_key: str = "golden_answer"):

Parameter Name	Type	Default Value	Description
storage	DataFlowStorage	Required	The data flow storage instance for reading input data and writing the filtered output.
input_test_answer_key	str	"generated_cot"	The column name in the input data that contains the generated answer to be verified.
input_gt_answer_key	str	"golden_answer"	The column name in the input data that contains the ground truth answer for comparison.

🧠 Example Usage

🧾 Output Format

The operator filters the input DataFrame, keeping only the rows where the answer in input_test_answer_key matches the answer in input_gt_answer_key according to the specified compare_method. The output schema is identical to the input schema.

Example Input:

[
    {
        "instruction": "What is 15 plus 20?",
        "generated_cot": "The sum is 35.",
        "golden_answer": "35"
    },
    {
        "instruction": "What is the capital of France?",
        "generated_cot": "The capital is Lyon.",
        "golden_answer": "Paris"
    }
]

Example Output (with compare_method='math_verify'):

[
    {
        "instruction": "What is 15 plus 20?",
        "generated_cot": "The sum is 35.",
        "golden_answer": "35"
    }
]

eval

generate

eval

generate

eval

filter

generate

eval

filter

generate

generate

eval

filter

refine

generate

generate

generate

eval

filter

refine

generate

generate

eval

filter

generate

eval

filter

generate

eval

generate

filter

eval

filter

generate

refine

ReasoningAnswerGroundTruthFilter

__init__

run

🧠 Example Usage

🧾 Output Format

`init`

`run`