Function Call Data Synthesis Operators
About 837 wordsAbout 3 min
2025-07-20
Overview
Function call data synthesis operators are designed to synthesize structured function call data from dialogues or real-world task descriptions. These operators cover scenario extraction and expansion, task generation and validation, function generation, and multi-agent multi-turn conversation generation.
All related operators are located in dataflow/operators/conversations/func_call_operators.py. The table below summarizes their applicable scenarios:
| Name | Type | Description | Repo or Paper |
|---|---|---|---|
| ScenarioExtractor | Scenario Extraction | Extracts scenario descriptions from conversations using LLM. | Data Paper |
| ScenarioExpander | Scenario Expansion | Generates alternative scenarios based on original ones using LLM. | |
| AtomTaskGenerator | Task Generation | Generates atomic tasks from scenario descriptions using LLM. | |
| SequentialTaskGenerator | Task Generation | Generates subsequent tasks and composes them into sequential tasks. | |
| ParaSeqTaskGenerator | Task Generation | Generates parallel and subsequent tasks and combines them with the original task. | |
| CompositionTaskFilter | Task Filtering | Validates compositional tasks and filters out incomplete ones using LLM. | |
| FunctionGenerator | Function Generation | Generates function definitions for a given task composition and its subtasks. | |
| MultiTurnConversationGenerator | Dialogue Generation | Generates multi-turn conversations with User, Assistant, and Tool agents based on tasks and functions. |
Operator Details
1. ScenarioExtractor ✨
Description:
Extracts concise task scenario descriptions from dialogue using an LLM.
Parameters:
__init__()llm_serving: LLM interface instance
run()storage: data storage interfaceinput_chat_key: field name for conversation inputoutput_key: output field name (default:"scenario")
Highlights:
- Strong contextual understanding
- Forms basis for downstream task generation
- Supports batch processing
2. ScenarioExpander ✨
Description:
Expands extracted task scenarios to generate varied alternatives via LLM.
Parameters:
__init__()llm_serving: LLM interface instance
run()storage: data storage interfaceinput_scenario_key: field name of original scenariooutput_key: output field name (default:"modified_scenario")
Highlights:
- Enhances scenario diversity
- Useful for data augmentation
3. AtomTaskGenerator ✨
Description:
Generates fine-grained atomic tasks from a given scenario.
Parameters:
__init__()llm_serving: LLM interface instance
run()storage: data storage interfaceinput_scenario_key: field name for scenario inputoutput_key: output field name (default:"atom_task")
Highlights:
- Atomic-level task granularity
- Task decomposition from scenario
4. SequentialTaskGenerator ✨
Description:
Creates follow-up tasks and combines them with atomic tasks into a sequential flow.
Parameters:
__init__()llm_serving: LLM interface instance
run()storage: data storage interfaceinput_task_key: field name for atomic taskoutput_subsequent_task_key: subsequent task field (default:"subsequent_task")output_composition_task_key: composed task field (default:"composition_task")
Highlights:
- Supports multi-step task flow generation
- Clear structure and traceability
5. ParaSeqTaskGenerator ✨
Description:
Generates parallel and sequential extensions for an atomic task and composes them into a complex task.
Parameters:
__init__()llm_serving: LLM interface instance
run()storage: data storage interfaceinput_task_key: atomic task fieldoutput_parallel_task_key: parallel task field (default:"parallel_task")output_subsequent_task_key: subsequent task field (default:"subsequent_task")output_composition_task_key: composed task field (default:"composition_task")
Highlights:
- Multi-dimensional task modeling
- Captures concurrency and sequencing
6. CompositionTaskFilter ✨
Description:
Validates if a composed task is logically complete and executable. Filters invalid or incoherent compositions.
Parameters:
__init__()llm_serving: LLM interface instance
run()storage: data storage interfaceinput_composition_task_key: composed task fieldinput_sub_tasks_keys: list of subtask field namesoutput_key: label field for executability (default:"runable_label")
Highlights:
- Logical and semantic validation
- Filters for downstream function generation
7. FunctionGenerator ✨
Description:
Generates structured function call specifications (name, parameters, doc) for a composed task and its subtasks.
Parameters:
__init__()llm_serving: LLM interface instance
run()storage: data storage interfaceinput_composition_task_key: composed task fieldinput_sub_tasks_keys: subtask field namesoutput_key: output field for functions (default:"functions")
Highlights:
- LLM-based function synthesis
- Designed for tool/agent integration
- Structured JSON-like output
8. MultiTurnConversationGenerator ✨🚀
Description:
Simulates multi-turn conversations involving User, Assistant, and Tool agents to complete the composed task via function calls.
Parameters:
__init__()llm_serving: LLM interface instance
run()storage: data storage interfaceinput_task_key: composed task fieldinput_sub_tasks_keys: list of subtask fieldsinput_functions_key: field name for function listoutput_conversations_key: output field for conversations (default:"conversations")
Highlights:
- Multi-agent interactive generation
- Supports function call injection
- Up to 5 full interaction rounds
For code examples, refer to the Function Call Data Synthesis Pipeline or the GitHub source file.

