CodeQualityScoreFilter
2025-10-09
📘 Overview
The CodeQualityScoreFilter is an operator that filters code samples based on quality scores generated by a Large Language Model (LLM). It uses the CodeQualitySampleEvaluator to assess code on multiple dimensions, including correctness, completeness, clarity, adherence to best practices, and efficiency. The operator automatically removes low-quality code samples from a dataset, retaining only those whose score falls within the range set by min_score and max_score.
This filter is useful for:
- Removing code with syntax errors or logical issues.
- Discarding incomplete or poorly structured code snippets.
- Filtering out code that does not follow standard best practices.
- Curating a high-quality dataset by keeping samples with scores within a specified range.
__init__ function
def __init__(self, llm_serving: LLMServingABC, min_score: int = 7, max_score: int = 10):

| Parameter | Type | Default | Description |
|---|---|---|---|
| llm_serving | LLMServingABC | Required | The LLM serving instance used for code quality evaluation. |
| min_score | int | 7 | The minimum quality score for a code sample to be kept. |
| max_score | int | 10 | The maximum quality score for a code sample to be kept. |
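The two score bounds act together as a range check. Below is a minimal sketch of that check. It assumes both bounds are inclusive, which this page does not state explicitly, and `in_score_range` is a hypothetical helper name, not part of the operator.

```python
def in_score_range(score: int, min_score: int = 7, max_score: int = 10) -> bool:
    """Return True when score lies in [min_score, max_score].

    Assumption: both bounds are inclusive. Defaults mirror the parameter table.
    """
    return min_score <= score <= max_score


# With the defaults, a score of 9 is kept and a score of 6 is discarded.
print(in_score_range(9))  # True
print(in_score_range(6))  # False
```

Raising min_score tightens the filter, while lowering max_score can be used to select only mid-quality samples, for example for error analysis.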
Prompt Template Descriptions
| Prompt Template Name | Primary Purpose | Applicable Scenarios | Feature Description |
|---|---|---|---|
run function
def run(self, storage: DataFlowStorage, input_key: str, output_key: str = 'quality_score_filter_label'):

| Parameter | Type | Default | Description |
|---|---|---|---|
| storage | DataFlowStorage | Required | The DataFlow storage instance for reading the input DataFrame and writing the filtered output. |
| input_key | str | Required | The name of the input column that contains the code to be evaluated. |
| output_key | str | 'quality_score_filter_label' | The name for the new column that stores the filter result (1 for kept, 0 for discarded). |
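The flow implied by these parameters can be sketched without the library. This is an illustration only, not the operator's implementation. `stub_evaluator` is a hypothetical stand-in for the LLM-backed CodeQualitySampleEvaluator, `run_filter_sketch` is a made-up name, rows are plain dictionaries rather than a DataFrame, and the inclusive score bounds are an assumption.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def stub_evaluator(code: str) -> Tuple[int, str]:
    """Hypothetical stand-in for the LLM evaluator: returns (score, feedback)."""
    if "return" in code:
        return 9, "Looks complete."
    return 3, "No return statement found."


def run_filter_sketch(
    rows: List[Row],
    input_key: str,
    output_key: str = "quality_score_filter_label",
    min_score: int = 7,
    max_score: int = 10,
    evaluate: Callable[[str], Tuple[int, str]] = stub_evaluator,
) -> List[Row]:
    """Score each row, attach score, feedback, and label, then keep passing rows."""
    kept: List[Row] = []
    for row in rows:
        score, feedback = evaluate(str(row[input_key]))
        passed = min_score <= score <= max_score  # assumed inclusive bounds
        labeled = {
            **row,  # original columns are preserved
            "quality_score": score,
            "quality_feedback": feedback,
            output_key: int(passed),
        }
        if passed:
            kept.append(labeled)
    return kept


rows = [
    {"code": "def factorial(n): return 1 if n == 0 else n * factorial(n-1)"},
    {"code": "def broken(n): pass"},
]
result = run_filter_sketch(rows, input_key="code")
print(len(result))  # 1
print(result[0]["quality_score_filter_label"])  # 1
```

Because only passing rows are returned, every row in the result carries a label of 1, which matches the output format described on this page.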
🧠 Example Usage
🧾 Default Output Format
The operator filters the input DataFrame and adds the following columns if they do not already exist. The final output is a DataFrame containing only the rows that pass the filter.
| Field | Type | Description |
|---|---|---|
| (existing_columns) | - | All original columns from the input data are preserved. |
| quality_score | int | The quality score (from 0 to 10) assigned to the code by the LLM. |
| quality_feedback | str | The detailed feedback from the LLM explaining the score. |
| output_key | int | A label indicating if the row passed the filter. A value of 1 means it passed and was kept. |
Example Input:
{
"instruction": "Write a python function to calculate the factorial of a number.",
"code": "def factorial(n): return 1 if n == 0 else n * factorial(n-1)"
}

Example Output (within the filtered DataFrame):
{
"instruction": "Write a python function to calculate the factorial of a number.",
"code": "def factorial(n): return 1 if n == 0 else n * factorial(n-1)",
"quality_score": 9,
"quality_feedback": "The code is correct, concise, and follows best practices for a simple recursive solution.",
"quality_score_filter_label": 1
}
