PerspectiveSampleEvaluator
2025-10-09
📘 Overview
PerspectiveSampleEvaluator is an operator for detecting toxic content in text using Google's Perspective API. It evaluates text for toxicity, profanity, personal attacks, and other harmful content, and returns a toxicity score between 0 and 1.
Key Features:
- Detects multiple types of toxic content (toxicity, severe toxicity, obscenity, threats, insults, identity attacks, etc.)
- Returns normalized toxicity probability scores (0-1)
- Supports multiple languages
- Powered by Google's state-of-the-art toxic content detection models
Use Cases: Content moderation, social media monitoring, comment filtering, UGC (User Generated Content) quality control
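Under the hood, each evaluation boils down to a single request to the Perspective API. The sketch below illustrates the kind of call the operator wraps; the endpoint and response shape are standard Perspective API facts, while the helper name and error handling are illustrative assumptions, not the operator's actual internals.

```python
import requests

# Public Perspective API endpoint for comment analysis.
API_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def score_toxicity(text: str, api_key: str) -> float:
    """Illustrative helper: fetch a 0-1 TOXICITY probability for one text."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(API_URL, params={"key": api_key}, json=payload, timeout=10)
    resp.raise_for_status()
    # summaryScore.value is the normalized toxicity probability.
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```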
init
```python
def __init__(self)
```
This operator takes no parameters at initialization, but a Google Perspective API Key must be configured before use.
API Key Configuration
To use this operator, you need to:
- Obtain a Perspective API Key from the [Google Perspective API](https://perspectiveapi.com/)
- Configure the API Key via `PerspectiveAPIServing` (see the sketch below)
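A minimal configuration sketch, assuming the `api_key` constructor argument used in the full example below; reading the key from an environment variable is an illustrative practice, not a requirement of the library:

```python
import os

from dataflow.serving import PerspectiveAPIServing

# Keep the key out of source control; the environment-variable name is arbitrary.
api_serving = PerspectiveAPIServing(api_key=os.environ["PERSPECTIVE_API_KEY"])
```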
run
```python
def run(self, storage: DataFlowStorage, input_key: str, output_key: str = 'PerspectiveScore')
```
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| storage | DataFlowStorage | Required | DataFlow storage instance for reading and writing data. |
| input_key | str | Required | Input column name specifying the column containing the text to be evaluated. |
| output_key | str | 'PerspectiveScore' | Output column name for storing the toxicity score. |
🧠 Example Usage
```python
from dataflow.operators.general_text import PerspectiveSampleEvaluator
from dataflow.utils.storage import FileStorage
from dataflow.serving import PerspectiveAPIServing

class PerspectiveSampleEvaluatorTest:
    def __init__(self):
        # File-backed storage: reads the input JSONL and caches each step's output.
        self.storage = FileStorage(
            first_entry_file_name="./dataflow/example/GeneralTextPipeline/perspective_test_input.jsonl",
            cache_path="./cache",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl",
        )
        # Configure Perspective API access; replace with your own key.
        self.api_serving = PerspectiveAPIServing(api_key="YOUR_API_KEY")
        # The evaluator itself takes no initialization parameters.
        self.evaluator = PerspectiveSampleEvaluator()

    def forward(self):
        # Score the 'text' column and write results to 'PerspectiveScore'.
        self.evaluator.run(
            storage=self.storage.step(),
            input_key='text',
            output_key='PerspectiveScore'
        )

if __name__ == "__main__":
    test = PerspectiveSampleEvaluatorTest()
    test.forward()
```
🧾 Default Output Format
| Field | Type | Description |
|---|---|---|
| text | str | The original input text |
| PerspectiveScore | float | Toxicity score (0-1, higher indicates more toxic content) |
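After a step runs, the scored rows are written to the JSONL cache. A minimal post-processing sketch, assuming the cache layout configured above (the exact file name and the pandas-based loading are illustrative assumptions, not part of the operator):

```python
import pandas as pd

# Load a scored cache file produced by the pipeline (path is illustrative).
df = pd.read_json("./cache/dataflow_cache_step_1.jsonl", lines=True)

# Keep only rows the Perspective API considers safe enough to publish.
safe = df[df["PerspectiveScore"] < 0.3]
print(f"{len(safe)}/{len(df)} samples pass the 0.3 toxicity threshold")
```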
📋 Example Input
{"text": "Thank you for your help! I really appreciate your kindness and support."}
{"text": "I disagree with your opinion, but I respect your perspective on this matter."}
{"text": "You are an idiot and your ideas are completely stupid and worthless."}📤 Example Output
{"text": "Thank you for your help! I really appreciate your kindness and support.", "PerspectiveScore": 0.012}
{"text": "I disagree with your opinion, but I respect your perspective on this matter.", "PerspectiveScore": 0.089}
{"text": "You are an idiot and your ideas are completely stupid and worthless.", "PerspectiveScore": 0.952}📊 Result Analysis
Sample 1 (Friendly Text):
- PerspectiveScore: 0.012 (very low toxicity)
- Positive and appreciative language
- Safe for publication
Sample 2 (Neutral Text):
- PerspectiveScore: 0.089 (low toxicity)
- Respectful disagreement without offensive language
- Acceptable content
Sample 3 (Offensive Text):
- PerspectiveScore: 0.952 (very high toxicity)
- Contains personal attacks and derogatory language
- Should be filtered or flagged for moderation
Score Interpretation (see the routing sketch after this list):
- 0.0-0.3: Low toxicity, generally safe content
- 0.3-0.7: Medium toxicity, may require review
- 0.7-1.0: High toxicity, should be filtered or flagged
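These bands map directly onto a moderation decision. A minimal routing sketch (the thresholds follow the bands above; the function name and action labels are illustrative):

```python
def moderation_action(score: float) -> str:
    """Map a PerspectiveScore to a moderation decision (illustrative bands)."""
    if score < 0.3:
        return "publish"   # low toxicity: generally safe content
    if score < 0.7:
        return "review"    # medium toxicity: queue for human review
    return "flag"          # high toxicity: filter or flag for moderation

assert moderation_action(0.012) == "publish"
assert moderation_action(0.089) == "publish"
assert moderation_action(0.952) == "flag"
```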
Application Value: This operator is particularly useful in automated content moderation systems, where it helps identify and filter toxic content before publication, protecting community health and the user experience.

