RMSampleEvaluator


2025-10-09

📘 Overview: RMSampleEvaluator

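Scores text quality with a reward model trained on human preference data (OpenAssistant/reward-model-deberta-v3-large-v2); a higher score indicates higher quality. The model takes an instruction and response text pair as input and outputs a scalar reward score that reflects human preference judgments about text quality. The score is the raw model output rather than a value normalized to 0-1; the example output further down this page, for instance, is about 5.23.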
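To make the scoring step concrete, here is a minimal sketch of how one instruction/response pair can be scored with this reward model through Hugging Face `transformers`, following the usage pattern shown on the model card. It is not the operator's internal implementation: `load_reward_model` and `score_pair` are hypothetical helper names, the model is kept on CPU for simplicity, and loading it assumes `transformers` and `torch` are installed and the weights can be downloaded or are already cached.

```python
MODEL_NAME = "OpenAssistant/reward-model-deberta-v3-large-v2"


def load_reward_model(cache_dir="./dataflow_cache"):
    """Load the tokenizer and model (hypothetical helper, not part of DataFlow)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, cache_dir=cache_dir)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, cache_dir=cache_dir
    )
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def score_pair(tokenizer, model, instruction, response):
    """Return the reward score for one (instruction, response) pair as a float."""
    # The tokenizer encodes the two texts as a single sequence pair.
    inputs = tokenizer(instruction, response, truncation=True, return_tensors="pt")
    # The classification head has one output, so logits has shape (1, 1).
    logits = model(**inputs).logits
    return float(logits[0][0])


# Usage (downloads the model on first call):
#   tokenizer, model = load_reward_model()
#   print(score_pair(tokenizer, model, "Say hi.", "Hi!"))
```

When scoring many pairs, wrapping the model call in `torch.no_grad()` avoids tracking gradients that are never used.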

__init__ function

def __init__(self, device='cuda', model_cache_dir='./dataflow_cache')

__init__ parameters

Parameter	Type	Default	Description
device	str	'cuda'	Device the model runs on, for example 'cuda' or 'cpu'.
model_cache_dir	str	'./dataflow_cache'	Directory used to cache the downloaded model.

Prompt templates

Prompt template name	Main purpose	Applicable scenarios	Features

run function

def run(self, storage: DataFlowStorage, input_instruction_key: str = 'instruction', input_output_key: str = 'output', output_key: str = 'RMScore')

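Parameters

Name	Type	Default	Description
storage	DataFlowStorage	Required	Data flow storage instance responsible for reading and writing data.
input_instruction_key	str	"instruction"	Name of the column that holds the input instruction text.
input_output_key	str	"output"	Name of the column that holds the input response text.
output_key	str	"RMScore"	Name of the column to which the reward model score is written.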
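The three key parameters only name columns. As a rough sketch of that mapping (not the operator's actual code), the function below reads the two input columns from each record and stores a score under the output column. Records are plain dictionaries here, and `add_scores`, `score_fn`, and `length_scorer` are hypothetical names; `length_scorer` is a toy stand-in for the reward model.

```python
def add_scores(
    records,
    score_fn,
    input_instruction_key="instruction",
    input_output_key="output",
    output_key="RMScore",
):
    """Return copies of the records with a score stored under output_key."""
    scored = []
    for record in records:
        score = score_fn(record[input_instruction_key], record[input_output_key])
        # Copy the record so the caller's data is left unchanged.
        scored.append({**record, output_key: score})
    return scored


def length_scorer(instruction, response):
    """Toy scorer for illustration only: the score is the response length."""
    return float(len(response))


rows = [{"instruction": "Say hi.", "output": "Hi!"}]
result = add_scores(rows, length_scorer)
```

Changing `output_key` in this sketch only changes the name of the column that receives the score; the input columns are left as they are.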

🧠 Example usage

from dataflow.operators.text_sft.eval import RMSampleEvaluator
from dataflow.utils.storage import FileStorage

# Prepare storage that contains instruction-output pairs
storage = FileStorage(first_entry_file_name="sft_data.jsonl")

# Initialize and run the evaluator
evaluator = RMSampleEvaluator(
    device="cuda",
    model_cache_dir="./dataflow_cache",
)
evaluator.run(
    storage.step(),
    input_instruction_key="instruction",
    input_output_key="output",
    output_key="RMScore",
)
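The example assumes that `sft_data.jsonl` already exists. As a minimal sketch, such a file can be written with the standard library, one JSON object per line, using the default `instruction` and `output` column names. The `write_jsonl` and `read_jsonl` helpers and the sample record are illustrative assumptions, not part of DataFlow.

```python
import json


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up sample record using the default column names.
sample_records = [
    {"instruction": "Name one healthy habit.", "output": "Drink enough water."},
]

# Usage:
#   write_jsonl("sft_data.jsonl", sample_records)
```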

🧾 Default output format

Field	Type	Description
instruction	str	The input instruction text.
output	str	The input response text.
RMScore	float	Reward score produced by the model; higher means better quality.

Example input:

{
  "instruction": "Can you provide a list of healthy habits to maintain a healthy lifestyle? Please format your response as an HTML page with bullet points.",
  "output": "Here's an HTML page with bullet points for healthy habits:\n<html>\n  <body>\n    <h3>Healthy Habits:</h3>\n    <ul>\n      <li>Eating a balanced diet with plenty of fruits and vegetables.</li>\n      <li>Engaging in regular physical activity, such as walking, running, or cycling.</li>\n      <li>Getting enough sleep each night, ideally 7-8 hours.</li>\n      <li>Staying hydrated by drinking plenty of water throughout the day.</li>\n      <li>Limiting alcohol consumption and avoiding smoking.</li>\n      <li>Managing stress through relaxation techniques like meditation or yoga.</li>\n      <li>Regularly visiting a healthcare provider for check-ups and preventative care.</li>\n    </ul>\n  </body>\n</html>"
}

Example output:

{
  "instruction": "Can you provide a list of healthy habits to maintain a healthy lifestyle? Please format your response as an HTML page with bullet points.",
  "output": "Here's an HTML page with bullet points for healthy habits:\n<html>\n  <body>\n    <h3>Healthy Habits:</h3>\n    <ul>\n      <li>Eating a balanced diet with plenty of fruits and vegetables.</li>\n      <li>Engaging in regular physical activity, such as walking, running, or cycling.</li>\n      <li>Getting enough sleep each night, ideally 7-8 hours.</li>\n      <li>Staying hydrated by drinking plenty of water throughout the day.</li>\n      <li>Limiting alcohol consumption and avoiding smoking.</li>\n      <li>Managing stress through relaxation techniques like meditation or yoga.</li>\n      <li>Regularly visiting a healthcare provider for check-ups and preventative care.</li>\n    </ul>\n  </body>\n</html>",
  "RMScore": 5.2253570557
}
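Once `RMScore` has been written, a common follow-up is to keep only the higher-scoring samples. The sketch below does this on plain dictionaries; the `filter_by_score` helper, the threshold of 0.0, and the second record's score are assumptions made up for illustration, not DataFlow defaults.

```python
def filter_by_score(records, min_score, score_key="RMScore"):
    """Keep records whose score is present and at least min_score."""
    return [
        record
        for record in records
        if record.get(score_key) is not None and record[score_key] >= min_score
    ]


records = [
    # Score taken from the example output above.
    {"instruction": "A", "output": "a", "RMScore": 5.2253570557},
    # Made-up low score, for illustration only.
    {"instruction": "B", "output": "b", "RMScore": -1.3},
]
kept = filter_by_score(records, min_score=0.0)
```

A suitable threshold depends on the dataset, so it is worth inspecting the score distribution before fixing one.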
