CodeInstructionGenerator


2025-11-10

CodeInstructionGenerator is an operator that randomly samples few-shot examples from a data pool and uses a large language model (LLM) to generate instructions of similar difficulty. This serves as the first step in a 'self-instruct' style data synthesis pipeline for the code domain.
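The sampling step described above can be illustrated with a small standalone sketch. It does not use the operator's actual code: the helper names `sample_few_shot` and `build_prompt`, the prompt wording, and the example pool are all hypothetical, chosen only to show the idea of drawing a few examples at random and combining them into one few-shot prompt.

```python
import random
from typing import List, Optional


def sample_few_shot(pool: List[str], num_few_shot: int = 3, seed: Optional[int] = None) -> List[str]:
    """Randomly pick up to `num_few_shot` distinct examples from the pool."""
    rng = random.Random(seed)
    # Never ask for more examples than the pool contains.
    k = min(num_few_shot, len(pool))
    return rng.sample(pool, k)


def build_prompt(examples: List[str]) -> str:
    """Join the sampled examples into one few-shot prompt (hypothetical wording)."""
    parts = [f"Example {i}:\n{ex}" for i, ex in enumerate(examples, start=1)]
    parts.append("Write one new instruction of similar difficulty:")
    return "\n\n".join(parts)


# Hypothetical data pool of existing instructions.
pool = [
    "Reverse a string.",
    "Sum a list of integers.",
    "Check if a number is prime.",
    "Merge two sorted lists.",
]
shots = sample_few_shot(pool, num_few_shot=3, seed=0)
prompt = build_prompt(shots)
print(prompt)
```

In the real operator the resulting prompt would be sent to the LLM serving instance; here it is only printed.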

`__init__`

class CodeInstructionGenerator(OperatorABC):
    def __init__(self, llm_serving: LLMServingABC, prompt_template=None, num_few_shot: int = 3, num_generate: int = 10):

Parameter	Type	Default	Description
llm_serving	LLMServingABC	Required	Large language model serving instance for executing inference.
prompt_template	PromptABC / str	`CodeInstructionGeneratePrompt()`	The prompt template object used to construct the input. Custom templates can be supplied as a string or as a `DiyCodePrompt`.
num_few_shot	int	3	The number of few-shot examples to sample.
num_generate	int	10	The number of similar instructions to generate.

Prompt Template Descriptions

Prompt Template Name	Primary Use	Applicable Scenarios	Feature Description
CodeInstructionGeneratePrompt	Generate new code instructions	Create new programming problems of similar style based on a few examples	Generates stylistically consistent instructions based on few-shot examples, maintaining similar difficulty and complexity, ensuring instructions are clear, specific, and solvable.

`run`

def run(self, storage: DataFlowStorage, input_key: str = "prompt", output_key: str = "generated_instruction")

Parameter	Type	Default	Description
storage	DataFlowStorage	Required	DataFlow storage instance for reading and writing data.
input_key	str	"prompt"	Input column name, corresponding to the example instruction field.
output_key	str	"generated_instruction"	Output column name, corresponding to the generated instruction field.
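To make the role of `input_key` and `output_key` concrete, here is a minimal sketch of a `run`-like step over plain Python dictionaries. It is not the operator's implementation: `run_sketch`, `fake_llm`, the `seed` argument, and the assumption that exactly `num_generate` rows are produced are all illustrative, and a list of dicts stands in for `DataFlowStorage`.

```python
import random
from typing import Callable, Dict, List


def run_sketch(
    rows: List[Dict[str, str]],
    generate: Callable[[str], str],
    input_key: str = "prompt",
    output_key: str = "generated_instruction",
    num_few_shot: int = 3,
    num_generate: int = 10,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Read examples from `input_key`, write generated text under `output_key`."""
    rng = random.Random(seed)
    # The data pool is every non-empty value found in the input column.
    pool = [row[input_key] for row in rows if row.get(input_key)]
    if not pool:
        return []
    outputs = []
    for _ in range(num_generate):
        shots = rng.sample(pool, min(num_few_shot, len(pool)))
        few_shot_prompt = "\n\n".join(shots)
        # `generate` stands in for a call to the LLM serving instance.
        outputs.append({output_key: generate(few_shot_prompt)})
    return outputs


def fake_llm(prompt: str) -> str:
    """Stub model: reports how many examples it was shown."""
    return "new instruction (" + str(prompt.count("\n\n") + 1) + " examples seen)"


rows = [{"prompt": "a"}, {"prompt": "b"}, {"prompt": "c"}, {"prompt": "d"}]
result = run_sketch(rows, fake_llm, num_generate=2)
print(result)
```

Changing `input_key` or `output_key` only changes which column is read and which column is written; the sampling and generation logic stays the same.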

🧠 Example Usage

🧾 Default Output Format

Field	Type	Description
prompt	str	The input instruction.
generated_instruction	str	The instruction generated by the model.

Example Input:

{"prompt": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n"}
{"prompt": "from typing import List\n\n\ndef separate_paren_groups(paren_string: str) -> List[str]:\n    \"\"\" Input to this function is a string containing multiple groups of nested parentheses. Your goal is to\n    separate those group into separate strings and return the list of those.\n    Separate groups are balanced (each open brace is properly closed) and not nested within each other\n    Ignore any spaces in the input string.\n    >>> separate_paren_groups('( ) (( )) (( )( ))')\n    ['()', '(())', '(()())']\n    \"\"\"\n"}
{"prompt": "\n\ndef truncate_number(number: float) -> float:\n    \"\"\" Given a positive floating point number, it can be decomposed into\n    and integer part (largest integer smaller than given number) and decimals\n    (leftover part always smaller than 1).\n\n    Return the decimal part of the number.\n    >>> truncate_number(3.5)\n    0.5\n    \"\"\"\n"}

Example Output:

{"generated_instruction": "from typing import List\n\n\ndef parse_nested_parens(paren_string: str) -> List[int]:\n    \"\"\" Input to this function is a string represented multiple groups for nested parentheses separated by spaces.\n    For each of the group, output the deepest level of nesting of parentheses.\n    E.g. (()()) has maximum two levels of nesting while ((())) has three.\n\n    >>> parse_nested_parens('(()()) ((())) () ((())()())')\n    [2, 3, 1, 3]\n    \"\"\"\n"}
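Because the generated instructions in this example are Python function stubs, one possible sanity check on an output record is to confirm that the text parses and defines a function. The sketch below is an optional post-processing idea and not part of the operator; `is_valid_stub` and the sample record are hypothetical.

```python
import ast
import json


def is_valid_stub(record_line: str, key: str = "generated_instruction") -> bool:
    """Return True if the JSONL record holds parseable Python with a top-level function."""
    record = json.loads(record_line)
    text = record.get(key)
    if not isinstance(text, str) or not text.strip():
        return False
    try:
        tree = ast.parse(text)
    except SyntaxError:
        return False
    # A docstring-only function body is still valid Python.
    return any(isinstance(node, ast.FunctionDef) for node in tree.body)


# Hypothetical output record in the same shape as the example above.
line = json.dumps({
    "generated_instruction": 'def add(a: int, b: int) -> int:\n    """ Return the sum of a and b.\n    >>> add(1, 2)\n    3\n    """\n'
})
print(is_valid_stub(line))
```

A check like this only verifies syntax; it says nothing about whether the instruction is clear or solvable.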
