PairQualSampleEvaluator

About 361 wordsAbout 1 min

2025-10-09

📘 Overview

PairQualSampleEvaluator is a text quality scoring operator based on the BGE model, trained on GPT pairwise comparison data, supporting both English and Chinese.

`init` function

def __init__(self, model_cache_dir:str='./dataflow_cache', device="cuda", lang='en', max_length=512)

Parameter	Type	Default Value	Description
model_cache_dir	str	'./dataflow_cache'	The directory to cache the downloaded model.
device	str	"cuda"	The device to run the model on (e.g., "cuda" or "cpu").
lang	str	'en'	The language of the model to use. Supports 'en' or 'zh'.
max_length	int	512	The maximum sequence length for the tokenizer.

Prompt Template Descriptions

Prompt Template Name	Primary Use	Applicable Scenarios	Feature Description

`run` function

def run(self, storage: DataFlowStorage, input_key: str, output_key: str='PairQualScore')

Parameter	Type	Default Value	Description
storage	DataFlowStorage	Required	The data flow storage instance for reading and writing data.
input_key	str	Required	The name of the input column containing the text to be evaluated.
output_key	str	'PairQualScore'	The name of the output column where the quality scores will be stored.

🧠 Example Usage

from dataflow.operators.text_pt.eval import PairQualSampleEvaluator
from dataflow.utils.storage import FileStorage

# Prepare data and storage
storage = FileStorage(first_entry_file_name="pt_input.jsonl")

# Initialize and run the operator
pairqual_evaluator = PairQualSampleEvaluator(lang='en')
pairqual_evaluator.run(
    storage.step(),
    input_key='raw_content',
    output_key='PairQualScore'
)

🧾 Default Output Format

Field	Type	Description
...	...	Original columns from the input data.
PairQualScore	float	The calculated quality score.

Example Input:

{
  "raw_content": "AMICUS ANTHOLOGIES, PART ONE (1965-1972)\nFebruary 23, 2017 Alfred Eaker Leave a comment\nWith Dr. Terror's House of Horrors (1965, directed by Freddie Francis and written by Milton Subotsky) Amicus Productions (spearheaded by Subotsky and Max Rosenberg, who previously produced for Hammer and was a cousin to Doris Wishman) established itself as a vital competitor to Hammer Studios..."
}

Example Output:

{
  "raw_content": "AMICUS ANTHOLOGIES, PART ONE (1965-1972)\nFebruary 23, 2017 Alfred Eaker Leave a comment\nWith Dr. Terror's House of Horrors (1965, directed by Freddie Francis and written by Milton Subotsky) Amicus Productions (spearheaded by Subotsky and Max Rosenberg, who previously produced for Hammer and was a cousin to Doris Wishman) established itself as a vital competitor to Hammer Studios...",
  "PairQualScore": 3.2509903908
}

eval

generate

eval

generate

eval

filter

generate

eval

filter

generate

generate

eval

filter

refine

generate

generate

generate

eval

filter

refine

generate

generate

eval

filter

generate

eval

filter

generate

eval

generate

filter

eval

filter

generate

refine

PairQualSampleEvaluator

📘 Overview

__init__ function

Prompt Template Descriptions

run function

🧠 Example Usage

🧾 Default Output Format

`init` function

`run` function