Function Call Data Synthesis Operators
About 837 wordsAbout 3 min
2025-07-20
Overview
Function call data synthesis operators are designed to synthesize structured function call data from dialogues or real-world task descriptions. These operators cover scenario extraction and expansion, task generation and validation, function generation, and multi-agent multi-turn conversation generation.
All related operators are located in dataflow/operators/conversations/func_call_operators.py. The table below summarizes their applicable scenarios:
Name | Type | Description | Repo or Paper |
---|---|---|---|
ScenarioExtractor | Scenario Extraction | Extracts scenario descriptions from conversations using LLM. | Data Paper |
ScenarioExpander | Scenario Expansion | Generates alternative scenarios based on original ones using LLM. | |
AtomTaskGenerator | Task Generation | Generates atomic tasks from scenario descriptions using LLM. | |
SequentialTaskGenerator | Task Generation | Generates subsequent tasks and composes them into sequential tasks. | |
ParaSeqTaskGenerator | Task Generation | Generates parallel and subsequent tasks and combines them with the original task. | |
CompositionTaskFilter | Task Filtering | Validates compositional tasks and filters out incomplete ones using LLM. | |
FunctionGenerator | Function Generation | Generates function definitions for a given task composition and its subtasks. | |
MultiTurnConversationGenerator | Dialogue Generation | Generates multi-turn conversations with User, Assistant, and Tool agents based on tasks and functions. |
Operator Details
1. ScenarioExtractor ✨
Description:
Extracts concise task scenario descriptions from dialogue using an LLM.
Parameters:
__init__()
llm_serving
: LLM interface instance
run()
storage
: data storage interfaceinput_chat_key
: field name for conversation inputoutput_key
: output field name (default:"scenario"
)
Highlights:
- Strong contextual understanding
- Forms basis for downstream task generation
- Supports batch processing
2. ScenarioExpander ✨
Description:
Expands extracted task scenarios to generate varied alternatives via LLM.
Parameters:
__init__()
llm_serving
: LLM interface instance
run()
storage
: data storage interfaceinput_scenario_key
: field name of original scenariooutput_key
: output field name (default:"modified_scenario"
)
Highlights:
- Enhances scenario diversity
- Useful for data augmentation
3. AtomTaskGenerator ✨
Description:
Generates fine-grained atomic tasks from a given scenario.
Parameters:
__init__()
llm_serving
: LLM interface instance
run()
storage
: data storage interfaceinput_scenario_key
: field name for scenario inputoutput_key
: output field name (default:"atom_task"
)
Highlights:
- Atomic-level task granularity
- Task decomposition from scenario
4. SequentialTaskGenerator ✨
Description:
Creates follow-up tasks and combines them with atomic tasks into a sequential flow.
Parameters:
__init__()
llm_serving
: LLM interface instance
run()
storage
: data storage interfaceinput_task_key
: field name for atomic taskoutput_subsequent_task_key
: subsequent task field (default:"subsequent_task"
)output_composition_task_key
: composed task field (default:"composition_task"
)
Highlights:
- Supports multi-step task flow generation
- Clear structure and traceability
5. ParaSeqTaskGenerator ✨
Description:
Generates parallel and sequential extensions for an atomic task and composes them into a complex task.
Parameters:
__init__()
llm_serving
: LLM interface instance
run()
storage
: data storage interfaceinput_task_key
: atomic task fieldoutput_parallel_task_key
: parallel task field (default:"parallel_task"
)output_subsequent_task_key
: subsequent task field (default:"subsequent_task"
)output_composition_task_key
: composed task field (default:"composition_task"
)
Highlights:
- Multi-dimensional task modeling
- Captures concurrency and sequencing
6. CompositionTaskFilter ✨
Description:
Validates if a composed task is logically complete and executable. Filters invalid or incoherent compositions.
Parameters:
__init__()
llm_serving
: LLM interface instance
run()
storage
: data storage interfaceinput_composition_task_key
: composed task fieldinput_sub_tasks_keys
: list of subtask field namesoutput_key
: label field for executability (default:"runable_label"
)
Highlights:
- Logical and semantic validation
- Filters for downstream function generation
7. FunctionGenerator ✨
Description:
Generates structured function call specifications (name, parameters, doc) for a composed task and its subtasks.
Parameters:
__init__()
llm_serving
: LLM interface instance
run()
storage
: data storage interfaceinput_composition_task_key
: composed task fieldinput_sub_tasks_keys
: subtask field namesoutput_key
: output field for functions (default:"functions"
)
Highlights:
- LLM-based function synthesis
- Designed for tool/agent integration
- Structured JSON-like output
8. MultiTurnConversationGenerator ✨🚀
Description:
Simulates multi-turn conversations involving User, Assistant, and Tool agents to complete the composed task via function calls.
Parameters:
__init__()
llm_serving
: LLM interface instance
run()
storage
: data storage interfaceinput_task_key
: composed task fieldinput_sub_tasks_keys
: list of subtask fieldsinput_functions_key
: field name for function listoutput_conversations_key
: output field for conversations (default:"conversations"
)
Highlights:
- Multi-agent interactive generation
- Supports function call injection
- Up to 5 full interaction rounds
For code examples, refer to the Function Call Data Synthesis Pipeline or the GitHub source file.