RMSampleEvaluator

2025-10-09

📘 Overview

The RMSampleEvaluator is an operator that scores text quality with a reward model (OpenAssistant/reward-model-deberta-v3-large-v2) trained on human preference data. It takes an instruction-response pair as input and outputs a scalar reward score; a higher score indicates a better response. This makes it useful for evaluating generated text against human preferences.
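
To make the scoring step concrete, here is a minimal sketch of how a single instruction-response pair could be scored with the same reward model through the Hugging Face `transformers` library. It is an illustration under stated assumptions, not necessarily how RMSampleEvaluator is implemented internally: the helper names `load_reward_model` and `score_pair` are hypothetical, and the sketch assumes the model's single output logit is used directly as the reward.

```python
REWARD_MODEL = "OpenAssistant/reward-model-deberta-v3-large-v2"


def load_reward_model(device="cpu", cache_dir="./dataflow_cache"):
    """Load the tokenizer and the sequence-classification reward model."""
    # Imported lazily so score_pair() can be used with any compatible objects.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REWARD_MODEL, cache_dir=cache_dir)
    model = AutoModelForSequenceClassification.from_pretrained(
        REWARD_MODEL, cache_dir=cache_dir
    )
    model.to(device)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def score_pair(tokenizer, model, instruction, response, device="cpu"):
    """Return the scalar reward for one instruction-response pair."""
    # Encode the instruction and the response together as a text pair.
    inputs = tokenizer(instruction, response, return_tensors="pt")
    inputs = inputs.to(device)
    # Assumption: the raw output logit is the reward (higher is better).
    logits = model(**inputs).logits
    return logits[0].item()


if __name__ == "__main__":
    tok, rm = load_reward_model(device="cpu")
    print(score_pair(tok, rm, "What is 2 + 2?", "2 + 2 equals 4."))
```

When scoring many pairs, wrapping the model call in `torch.no_grad()` avoids building gradient state that inference does not need.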

`__init__`

def __init__(self, device='cuda', model_cache_dir='./dataflow_cache')

Parameter	Type	Default	Description
device	str	`'cuda'`	The device to run the model on (e.g., 'cuda', 'cpu').
model_cache_dir	str	`'./dataflow_cache'`	The directory to cache the downloaded Hugging Face model.

`run`

def run(self, storage: DataFlowStorage, input_instruction_key: str = 'instruction', input_output_key: str = 'output', output_key: str = 'RMScore')

Parameter	Type	Default	Description
storage	DataFlowStorage	Required	The DataFlowStorage instance for reading and writing data.
input_instruction_key	str	`'instruction'`	The column name in the input dataframe for the instruction text.
input_output_key	str	`'output'`	The column name in the input dataframe for the response text.
output_key	str	`'RMScore'`	The column name in the output dataframe for the generated reward score.

📝 Prompt Template Descriptions

🧠 Example Usage

from dataflow.operators.text_sft.eval import RMSampleEvaluator
from dataflow.utils.storage import FileStorage

# Prepare storage with instruction-output pairs
storage = FileStorage(first_entry_file_name="sft_data.jsonl")

# Initialize and run the evaluator
evaluator = RMSampleEvaluator(
    device="cuda",
    model_cache_dir="./dataflow_cache",
)
evaluator.run(
    storage.step(),
    input_instruction_key="instruction",
    input_output_key="output",
    output_key="RMScore",
)

🧾 Default Output Format

Field	Type	Description
instruction	str	The input instruction text.
output	str	The input response text.
RMScore	float	The reward score assigned by the reward model; a higher score indicates better quality.

Example Input:

{
  "instruction": "Can you provide a list of healthy habits to maintain a healthy lifestyle? Please format your response as an HTML page with bullet points.",
  "output": "Here's an HTML page with bullet points for healthy habits:\n<html>\n  <body>\n    <h3>Healthy Habits:</h3>\n    <ul>\n      <li>Eating a balanced diet with plenty of fruits and vegetables.</li>\n      <li>Engaging in regular physical activity, such as walking, running, or cycling.</li>\n      <li>Getting enough sleep each night, ideally 7-8 hours.</li>\n      <li>Staying hydrated by drinking plenty of water throughout the day.</li>\n      <li>Limiting alcohol consumption and avoiding smoking.</li>\n      <li>Managing stress through relaxation techniques like meditation or yoga.</li>\n      <li>Regularly visiting a healthcare provider for check-ups and preventative care.</li>\n    </ul>\n  </body>\n</html>"
}

Example Output:

{
  "instruction": "Can you provide a list of healthy habits to maintain a healthy lifestyle? Please format your response as an HTML page with bullet points.",
  "output": "Here's an HTML page with bullet points for healthy habits:\n<html>\n  <body>\n    <h3>Healthy Habits:</h3>\n    <ul>\n      <li>Eating a balanced diet with plenty of fruits and vegetables.</li>\n      <li>Engaging in regular physical activity, such as walking, running, or cycling.</li>\n      <li>Getting enough sleep each night, ideally 7-8 hours.</li>\n      <li>Staying hydrated by drinking plenty of water throughout the day.</li>\n      <li>Limiting alcohol consumption and avoiding smoking.</li>\n      <li>Managing stress through relaxation techniques like meditation or yoga.</li>\n      <li>Regularly visiting a healthcare provider for check-ups and preventative care.</li>\n    </ul>\n  </body>\n</html>",
  "RMScore": 5.2253570557
}
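
Because a higher RMScore indicates better quality, a common follow-up is to rank or threshold the scored records. The sketch below uses only the standard library and assumes the scored data is available as JSON Lines with the default `RMScore` column. The helper names `read_jsonl` and `rank_by_score`, the cutoff of `0.0`, and every score except the one from the example above are illustrative assumptions, not part of the operator.

```python
import json


def read_jsonl(path):
    """Read one JSON object per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def rank_by_score(records, score_key="RMScore", min_score=None):
    """Sort records from highest to lowest score, optionally dropping low ones."""
    if min_score is not None:
        records = [r for r in records if r[score_key] >= min_score]
    return sorted(records, key=lambda r: r[score_key], reverse=True)


# Illustrative records; only the first score comes from the example output.
records = [
    {"instruction": "q1", "output": "a1", "RMScore": 5.2253570557},
    {"instruction": "q2", "output": "a2", "RMScore": -1.3},
    {"instruction": "q3", "output": "a3", "RMScore": 0.8},
]

# Keep records scoring at least 0.0 (an arbitrary cutoff), best first.
kept = rank_by_score(records, min_score=0.0)
print([r["instruction"] for r in kept])  # ['q1', 'q3']
```

A suitable threshold depends on the dataset, so it is worth inspecting the score distribution before fixing a cutoff.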
