SFTGeneratorSeed
2025-10-09
📘 Overview
SFTGeneratorSeed is a supervised fine-tuning (SFT) data generation operator. Given document content, it calls a large language model (LLM) to generate instruction-response pairs in SFT format. A user-supplied custom prompt controls the specific requirements of the generated content, so SFT datasets can be constructed automatically from raw documents.
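The operator's internal prompt template is not documented on this page. As a minimal sketch under that caveat, one plausible way to combine the custom prompt with a single document is shown below. `TEMPLATE` and `compose_prompt` are hypothetical names, and the request for a JSON reply with `instruction` and `output` keys is an assumption chosen to match the output fields listed later.

```python
# Hypothetical template: the real operator's internal prompt is not documented here.
TEMPLATE = (
    "{custom_prompt}\n\n"
    "Document:\n{document}\n\n"
    'Respond with a JSON object containing the keys "instruction" and "output".'
)


def compose_prompt(custom_prompt: str, document: str) -> str:
    """Combine the user's custom prompt with one document into a single LLM request."""
    # str.format only substitutes placeholders in TEMPLATE, so braces inside
    # the document text are passed through untouched.
    return TEMPLATE.format(custom_prompt=custom_prompt.strip(), document=document.strip())


if __name__ == "__main__":
    print(compose_prompt("Write one question about the text.", "Penicillin was discovered in 1928."))
```

Keeping the formatting instruction at the end of the prompt, after the document, is one common choice; the actual operator may order things differently.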
__init__
def __init__(self, llm_serving: LLMServingABC, custom_prompt: str)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| llm_serving | LLMServingABC | Required | The LLM serving instance used to run inference and generation. |
| custom_prompt | str | Required | User-defined prompt that guides the model toward question-answer pairs of a specific style or content. |
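For offline testing it can help to swap the real serving backend for a fake one. The sketch below assumes the serving object exposes a batch method named `generate_from_input(user_inputs, system_prompt)` that returns one string per input; that method name and signature are assumptions, as are the hypothetical classes `ServingLike` and `CannedServing`. Check the real `LLMServingABC` definition in your installed version before relying on them.

```python
import json
from abc import ABC, abstractmethod
from typing import List


class ServingLike(ABC):
    """Hypothetical stand-in for LLMServingABC; verify the real base class's method names."""

    @abstractmethod
    def generate_from_input(self, user_inputs: List[str], system_prompt: str = "") -> List[str]:
        """Return one completion per input prompt, in the same order."""


class CannedServing(ServingLike):
    """Offline fake that returns the same JSON answer for every prompt."""

    def __init__(self, instruction: str, output: str) -> None:
        self._reply = json.dumps({"instruction": instruction, "output": output})
        self.calls = 0  # number of batch calls received, handy for assertions

    def generate_from_input(self, user_inputs: List[str], system_prompt: str = "") -> List[str]:
        self.calls += 1
        return [self._reply for _ in user_inputs]
```

A fake like this makes pipeline tests deterministic and free of API cost; it says nothing about the quality of real model output.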
Prompt Template Descriptions
| Prompt Template Name | Primary Use | Applicable Scenarios | Feature Description |
|---|---|---|---|
run
def run(self, storage: DataFlowStorage, input_key: str = "raw_content")
Executes the main operator logic: reads a DataFrame of raw documents from storage, generates an instruction and an answer for each document, and writes the results back to storage.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| storage | DataFlowStorage | Required | The DataFlowStorage instance responsible for reading and writing data. |
| input_key | str | "raw_content" | Input column name, corresponding to the field containing raw document content. |
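The read, generate, write cycle that `run` describes can be sketched as follows. This is not the operator's implementation: a list of dicts stands in for the DataFrame, `InMemoryStorage`, `parse_pair` and `run_sketch` are hypothetical names, `generate` is any callable mapping a batch of prompts to replies, and dropping rows whose reply is not valid JSON is an assumed policy.

```python
import json
from typing import Callable, Dict, List, Optional


class InMemoryStorage:
    """Hypothetical stand-in for DataFlowStorage: rows are a list of dicts, not a DataFrame."""

    def __init__(self, rows: List[Dict[str, str]]) -> None:
        self.rows = rows

    def read(self) -> List[Dict[str, str]]:
        return [dict(r) for r in self.rows]

    def write(self, rows: List[Dict[str, str]]) -> None:
        self.rows = rows


def parse_pair(reply: str) -> Optional[Dict[str, str]]:
    """Extract instruction/output from a reply; return None if no valid JSON pair is found."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        obj = json.loads(reply[start : end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    if not isinstance(obj.get("instruction"), str) or not isinstance(obj.get("output"), str):
        return None
    return {"instruction": obj["instruction"], "output": obj["output"]}


def run_sketch(
    storage: InMemoryStorage,
    generate: Callable[[List[str]], List[str]],
    custom_prompt: str,
    input_key: str = "raw_content",
) -> int:
    rows = storage.read()
    # One prompt per document, sent to the model as a single batch.
    prompts = [f"{custom_prompt}\n\n{row[input_key]}" for row in rows]
    replies = generate(prompts)
    results = []
    for row, reply in zip(rows, replies):
        pair = parse_pair(reply)
        if pair is None:
            continue  # assumed policy: drop rows whose reply could not be parsed
        # Keep the source document next to the generated pair.
        results.append({**pair, input_key: row[input_key]})
    storage.write(results)
    return len(results)
```

Slicing from the first `{` to the last `}` tolerates chatty replies such as `Sure! {...}`, at the cost of failing on replies that contain several JSON objects.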
🧠 Example Usage
🧾 Default Output Format
| Field | Type | Description |
|---|---|---|
| instruction | str | The instruction or question generated by the model based on the original document. |
| output | str | The answer given by the model for the generated instruction. |
| raw_content | str | The original document content used to generate the question-answer pair. |
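Before feeding generated rows into training, it can be worth checking them against the output format above. A minimal sketch follows; the field names come from the table, while `schema_errors` is a hypothetical helper and treating whitespace-only strings as errors is an added policy choice, not something the operator is documented to enforce.

```python
from typing import Any, Dict, List

# Field names taken from the default output format table.
REQUIRED_FIELDS = ("instruction", "output", "raw_content")


def schema_errors(record: Dict[str, Any]) -> List[str]:
    """Return the problems found in one output record; an empty list means it conforms."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None:
            errors.append(f"missing field: {field}")
        elif not isinstance(value, str):
            errors.append(f"{field} is {type(value).__name__}, expected str")
        elif not value.strip():
            errors.append(f"{field} is empty")
    return errors
```

Returning a list of messages instead of raising on the first problem makes it easy to log every defect in a record at once.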
Example Input:
{
"raw_content": "Penicillin, also known as penicillium, is a type of antibiotic that contains penicillanic acid in its molecular structure, which can destroy the cell walls of bacteria and act as a bactericide during the bacterial cell reproduction period. In 1928, British bacteriologist Alexander Fleming discovered in the laboratory that a type of mold (penicillium) could secrete a substance to inhibit the growth of staphylococcus. He named this substance penicillin."
}
Example Output:
{
"instruction": "Who discovered penicillin in 1928, and what is its main function?",
"output": "Penicillin was discovered by British bacteriologist Alexander Fleming in 1928. Its main function is to destroy the cell walls of bacteria and act as a bactericide during the bacterial reproduction period.",
"raw_content": "Penicillin, also known as penicillium, is a type of antibiotic that contains penicillanic acid in its molecular structure, which can destroy the cell walls of bacteria and act as a bactericide during the bacterial cell reproduction period. In 1928, British bacteriologist Alexander Fleming discovered in the laboratory that a type of mold (penicillium) could secrete a substance to inhibit the growth of staphylococcus. He named this substance penicillin."
}
