RMFilter
2025-10-09
📘 Overview
RMFilter is an operator that filters data by the scores a Reward Model (RM) assigns. It uses a pre-trained reward model, RMSampleEvaluator, to score text samples (e.g., instruction-response pairs) and keeps only the samples whose scores fall within the range [min_score, max_score]. This helps curate high-quality datasets by removing undesirable or low-quality responses.
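The retention rule can be sketched in a few lines. This is a minimal illustration, not RMFilter's actual implementation: `keep_in_range` is a hypothetical helper, and treating both bounds as inclusive is an assumption based on the closed-interval notation [min_score, max_score].

```python
def keep_in_range(scores, min_score=0.2, max_score=0.8):
    """Return the indices of scores inside [min_score, max_score].

    Hypothetical helper; inclusive bounds are an assumption.
    """
    return [i for i, s in enumerate(scores) if min_score <= s <= max_score]


# Illustrative scores, not real reward-model output.
scores = [0.05, 0.2, 0.55, 0.8, 0.93]
kept = keep_in_range(scores)
print(kept)  # [1, 2, 3]
```

Note that an upper bound below the maximum possible score means the highest-scoring samples are dropped as well as the lowest-scoring ones.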
__init__
def __init__(self, min_score: float = 0.2, max_score: float = 0.8, device='cuda', model_cache_dir='./dataflow_cache'):

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| min_score | float | 0.2 | The minimum reward score threshold for retaining a sample. |
| max_score | float | 0.8 | The maximum reward score threshold for retaining a sample. |
| device | str | 'cuda' | The device on which to run the reward model (e.g., 'cuda', 'cpu'). |
| model_cache_dir | str | './dataflow_cache' | The directory to cache the downloaded reward model files. |
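If the same script has to run on machines with and without a GPU, the device argument can be chosen at runtime. The sketch below assumes the reward model runs on PyTorch, which this page does not state; `pick_device` is a hypothetical helper and not part of DataFlow.

```python
def pick_device(preferred="cuda"):
    """Return `preferred`, falling back to 'cpu' when CUDA is unusable.

    Hypothetical helper; assumes a PyTorch-based reward model.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no CUDA runtime to query.
        return "cpu"
    if preferred.startswith("cuda") and not torch.cuda.is_available():
        return "cpu"
    return preferred


device = pick_device()
print(device)
```

The result could then be passed as `device=device` when constructing the filter.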
run
def run(self, storage: DataFlowStorage, input_instruction_key: str = 'instruction', input_output_key: str = 'output', output_key: str = 'RMScore'):

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| storage | DataFlowStorage | Required | The data flow storage instance used for reading and writing data. |
| input_instruction_key | str | "instruction" | The column name in the input data that contains the instruction text. |
| input_output_key | str | "output" | The column name in the input data that contains the response text to be scored. |
| output_key | str | "RMScore" | The column name where the generated reward score will be stored. |
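To make the role of the three key parameters concrete, here is a rough sketch of a scoring step over plain dictionaries. It is not DataFlow code: `add_scores` and `toy_scorer` are hypothetical, the scorer is a placeholder for a real reward model, and the column names `prompt`, `response`, and `reward` are made up to show non-default keys.

```python
def add_scores(rows, scorer, input_instruction_key="instruction",
               input_output_key="output", output_key="RMScore"):
    """Score each row and store the result under output_key.

    rows: list of dicts; scorer: callable (instruction, response) -> float.
    Hypothetical stand-in for the scoring step, not DataFlow code.
    """
    scored = []
    for row in rows:
        score = scorer(row[input_instruction_key], row[input_output_key])
        # Copy the row so the input data is left unchanged.
        scored.append({**row, output_key: score})
    return scored


def toy_scorer(instruction, response):
    # Placeholder only: a real reward model would be called here.
    return min(len(response) / 100.0, 1.0)


rows = [{"prompt": "Define GCD.", "response": "The greatest common divisor."}]
scored = add_scores(rows, toy_scorer, input_instruction_key="prompt",
                    input_output_key="response", output_key="reward")
print(scored[0])
```

The two input keys only select which columns are read; the output key only names the column that receives the score.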
🧠 Example Usage
from dataflow.operators.text_sft.filter import RMFilter
from dataflow.utils.storage import FileStorage

# Prepare storage with instruction-output pairs
storage = FileStorage(first_entry_file_name="sft_data.jsonl")

# Initialize and run the filter
rm_filter = RMFilter(
    min_score=0.2,
    max_score=0.8,
    device="cuda",
    model_cache_dir="./dataflow_cache",
)
rm_filter.run(
    storage.step(),
    input_instruction_key="instruction",
    input_output_key="output",
    output_key="RMScore",
)

🧾 Default Output Format
The operator adds a new column (RMScore by default) to the existing data. Only rows whose score falls within the range [min_score, max_score] are kept.
| Field | Type | Description |
|---|---|---|
| instruction | str | The input instruction text. |
| output | str | The input response text. |
| RMScore | float | The score assigned by the reward model. |
Example Input:
{
"instruction": "How can we use Python to calculate the GCD (greatest common divisor) of five numbers and express each number in terms of the GCD?",
"output": "Yes, that's correct! The function you've provided takes in five numbers as arguments and returns the GCD of those numbers along with each number expressed in terms of the GCD. This is a useful tool for simplifying fractions or finding the common factor between multiple numbers. Great job!"
}

Example Output (if it passes the filter):
{
"instruction": "How can we use Python to calculate the GCD (greatest common divisor) of five numbers and express each number in terms of the GCD?",
"output": "Yes, that's correct! The function you've provided takes in five numbers as arguments and returns the GCD of those numbers along with each number expressed in terms of the GCD. This is a useful tool for simplifying fractions or finding the common factor between multiple numbers. Great job!",
"RMScore": 0.7027474046
}
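A quick sanity check on filtered output might look like the sketch below. It assumes the results are available as JSON Lines (one record per line), which this page does not confirm; `out_of_range` is a hypothetical helper, and the two records are illustrative, with the first reusing the score from the example above.

```python
import json


def out_of_range(jsonl_lines, output_key="RMScore",
                 min_score=0.2, max_score=0.8):
    """Return 1-based line numbers whose score is missing or out of range."""
    bad = []
    for lineno, line in enumerate(jsonl_lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        score = record.get(output_key)
        if score is None or not (min_score <= score <= max_score):
            bad.append(lineno)
    return bad


# Illustrative records only; the second one violates the default range.
lines = [
    json.dumps({"instruction": "q1", "output": "a1", "RMScore": 0.7027474046}),
    json.dumps({"instruction": "q2", "output": "a2", "RMScore": 0.95}),
]
print(out_of_range(lines))  # [2]
```

An empty list would indicate that every record carries a score inside the configured range.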
