RMSampleEvaluator
2025-10-09
📘 Overview
The RMSampleEvaluator is an operator that scores text quality with a reward model (OpenAssistant/reward-model-deberta-v3-large-v2) trained on human preference data. It takes an instruction-response pair as input and outputs a single real-valued reward score; a higher score indicates a response that better matches human preferences. This makes it useful for ranking or filtering instruction-tuning (SFT) data by human-aligned quality.
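For intuition, the sketch below shows one way to compute this kind of score directly with Hugging Face `transformers`, following the usage pattern on the model's card: the instruction and response are encoded as a text pair, and the model's single output logit is the reward. It is not RMSampleEvaluator's actual implementation; the helper name `reward_scores`, the batching, and the truncation settings are assumptions made for illustration.

```python
from typing import List, Sequence

MODEL_NAME = "OpenAssistant/reward-model-deberta-v3-large-v2"


def reward_scores(
    instructions: Sequence[str],
    responses: Sequence[str],
    device: str = "cpu",
    cache_dir: str = "./dataflow_cache",
    batch_size: int = 8,
) -> List[float]:
    """Hypothetical helper: score (instruction, response) pairs with the reward model.

    Batching and truncation are illustrative assumptions, not the
    operator's documented behavior.
    """
    if len(instructions) != len(responses):
        raise ValueError("instructions and responses must have the same length")
    if not instructions:
        return []

    # Imported lazily so this module loads even without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, cache_dir=cache_dir)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, cache_dir=cache_dir
    ).to(device)
    model.eval()

    scores: List[float] = []
    for start in range(0, len(instructions), batch_size):
        batch_instr = list(instructions[start:start + batch_size])
        batch_resp = list(responses[start:start + batch_size])
        # Encode each pair as (text, text_pair), truncated to the model's max length.
        inputs = tokenizer(
            batch_instr, batch_resp,
            padding=True, truncation=True, return_tensors="pt",
        ).to(device)
        with torch.no_grad():
            logits = model(**inputs).logits  # shape: (batch, 1)
        scores.extend(logits[:, 0].float().cpu().tolist())
    return scores
```

On first use, a call such as `reward_scores(["Explain nuclear fusion."], ["Fusion joins light nuclei into heavier ones."])` downloads the model into `cache_dir`, mirroring the role of the operator's `model_cache_dir` parameter.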
__init__
```python
def __init__(self, device='cuda', model_cache_dir='./dataflow_cache')
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| device | str | 'cuda' | The device to run the model on (e.g., 'cuda', 'cpu'). |
| model_cache_dir | str | './dataflow_cache' | The directory to cache the downloaded Hugging Face model. |
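Because `device` defaults to `'cuda'`, running on a machine without a GPU presumably requires passing `'cpu'` explicitly. A small helper like the hypothetical `pick_device` below (not part of DataFlow) chooses automatically using `torch.cuda.is_available()`:

```python
def pick_device() -> str:
    """Hypothetical helper: return 'cuda' when a CUDA GPU is visible, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without torch installed there is no way to probe for a GPU; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

It would then be passed at construction time, e.g. `RMSampleEvaluator(device=pick_device(), model_cache_dir="./dataflow_cache")`.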
run
```python
def run(self, storage: DataFlowStorage, input_instruction_key: str = 'instruction', input_output_key: str = 'output', output_key: str = 'RMScore')
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| storage | DataFlowStorage | Required | The DataFlowStorage instance for reading and writing data. |
| input_instruction_key | str | 'instruction' | The column name in the input dataframe for the instruction text. |
| input_output_key | str | 'output' | The column name in the input dataframe for the response text. |
| output_key | str | 'RMScore' | The column name in the output dataframe for the generated reward score. |
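The three key parameters only choose which columns are read and which one is written. The sketch below illustrates that contract on plain Python dicts standing in for dataframe rows, with a toy scorer in place of the reward model. `add_reward_column` is a hypothetical name, and it assumes other columns pass through unchanged; the real operator works on the dataframe held by `DataFlowStorage`.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def add_reward_column(
    rows: List[Row],
    scorer: Callable[[str, str], float],
    input_instruction_key: str = "instruction",
    input_output_key: str = "output",
    output_key: str = "RMScore",
) -> List[Row]:
    """Hypothetical sketch of the run() column contract on plain dict rows.

    Reads the instruction and response from the configured input columns,
    copies every existing field, and adds the score under output_key.
    """
    scored: List[Row] = []
    for row in rows:
        score = scorer(row[input_instruction_key], row[input_output_key])
        scored.append({**row, output_key: float(score)})
    return scored


# Non-default column names, with response length as a toy stand-in scorer.
rows = [
    {"prompt": "Say hi.", "completion": "Hi!"},
    {"prompt": "Say hello.", "completion": "Hello there."},
]
result = add_reward_column(
    rows,
    scorer=lambda instruction, response: len(response),
    input_instruction_key="prompt",
    input_output_key="completion",
)
print(result[0])  # {'prompt': 'Say hi.', 'completion': 'Hi!', 'RMScore': 3.0}
```

With the operator itself, the same mapping corresponds to calling `run` with `input_instruction_key="prompt"` and `input_output_key="completion"`.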
🧠 Example Usage
```python
from dataflow.operators.text_sft.eval import RMSampleEvaluator
from dataflow.utils.storage import FileStorage

# Prepare storage with instruction-output pairs
storage = FileStorage(first_entry_file_name="sft_data.jsonl")

# Initialize and run the evaluator
evaluator = RMSampleEvaluator(
    device="cuda",
    model_cache_dir="./dataflow_cache",
)
evaluator.run(
    storage.step(),
    input_instruction_key="instruction",
    input_output_key="output",
    output_key="RMScore",
)
```

🧾 Default Output Format
| Field | Type | Description |
|---|---|---|
| instruction | str | The input instruction text. |
| output | str | The input response text. |
| RMScore | float | The score the reward model assigns to the instruction-response pair; higher is better. |
Example Input:

```json
{
  "instruction": "Can you provide a list of healthy habits to maintain a healthy lifestyle? Please format your response as an HTML page with bullet points.",
  "output": "Here's an HTML page with bullet points for healthy habits:\n<html>\n <body>\n <h3>Healthy Habits:</h3>\n <ul>\n <li>Eating a balanced diet with plenty of fruits and vegetables.</li>\n <li>Engaging in regular physical activity, such as walking, running, or cycling.</li>\n <li>Getting enough sleep each night, ideally 7-8 hours.</li>\n <li>Staying hydrated by drinking plenty of water throughout the day.</li>\n <li>Limiting alcohol consumption and avoiding smoking.</li>\n <li>Managing stress through relaxation techniques like meditation or yoga.</li>\n <li>Regularly visiting a healthcare provider for check-ups and preventative care.</li>\n </ul>\n </body>\n</html>"
}
```

Example Output:

```json
{
  "instruction": "Can you provide a list of healthy habits to maintain a healthy lifestyle? Please format your response as an HTML page with bullet points.",
  "output": "Here's an HTML page with bullet points for healthy habits:\n<html>\n <body>\n <h3>Healthy Habits:</h3>\n <ul>\n <li>Eating a balanced diet with plenty of fruits and vegetables.</li>\n <li>Engaging in regular physical activity, such as walking, running, or cycling.</li>\n <li>Getting enough sleep each night, ideally 7-8 hours.</li>\n <li>Staying hydrated by drinking plenty of water throughout the day.</li>\n <li>Limiting alcohol consumption and avoiding smoking.</li>\n <li>Managing stress through relaxation techniques like meditation or yoga.</li>\n <li>Regularly visiting a healthcare provider for check-ups and preventative care.</li>\n </ul>\n </body>\n</html>",
  "RMScore": 5.2253570557
}
```
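The score appears to be a raw model output rather than a value on a fixed scale (the example above scores about 5.2), so a fixed cut-off tuned on one dataset may not transfer to another; ranking and keeping a top fraction is one common alternative. The sketch below uses hypothetical helpers `parse_jsonl` and `keep_top_fraction` and assumes the scored records are stored as JSON Lines with an `RMScore` field, as in the example output.

```python
import json
import math
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def parse_jsonl(lines: Iterable[str]) -> List[Record]:
    """Parse one JSON object per non-empty line (e.g. from an open file handle)."""
    return [json.loads(line) for line in lines if line.strip()]


def keep_top_fraction(records: List[Record], fraction: float, key: str = "RMScore") -> List[Record]:
    """Keep the highest-scoring `fraction` of records (at least one if any exist)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    ranked = sorted(records, key=lambda r: r[key], reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]
```

For example, `with open("scored.jsonl", encoding="utf-8") as f: top = keep_top_fraction(parse_jsonl(f), 0.3)` keeps roughly the best 30% of rows; `scored.jsonl` is a placeholder for wherever your storage wrote this step's output.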
