Image Generation Pipeline (API Version)
About 986 wordsAbout 3 min
2026-02-15
1. Overview
The Image Generation Pipeline generates target images from user-provided text, providing image data for subsequent tasks such as image understanding and image editing.
This version uses cloud API models for text-to-image generation. Currently supported API models include:
- OpenAI format:
dall-e-2,dall-e-3,gpt-image-1 - Gemini format:
gemini-2.5-flash-image,gemini-3-pro-image-preview, etc.
For specific features and detailed usage of different models, please refer to the official API documentation of each model.
💡 Tip: If you want to use local GPU models for text-to-image generation, please see Image Generation Pipeline (GPU Version)
2. Quick Start
Step 1: Create a New DataFlow Working Directory
mkdir run_dataflow_mm
cd run_dataflow_mmStep 2: Configure API KEY and BASE URL
Configure the API KEY and BASE URL by setting environment variables:
# Set API key (required)
export DF_API_KEY=<your_api_key>
# Set API base URL (optional)
# If not set, default URLs will be used based on the selected API format:
# - Gemini format: https://generativelanguage.googleapis.com
# - OpenAI format: https://api.openai.com/v1
export DF_BASE_URL=<your_base_url> # optionalStep 3: Prepare Text Data
We use jsonl files to store text data, with one sample per line. Here is a simple example of input data:
{"conversations": [{"content": "a fox darting between snow-covered pines at dusk", "role": "user"}]}
{"conversations": [{"content": "a kite surfer riding emerald waves under a cloudy sky", "role": "user"}]}conversations contains a list of dialogues for image generation descriptions, and the content field is the text prompt.
Step 4: Run the Pipeline
- Basic Usage
python dataflow/statics/api_pipelines/text_to_image_generation_api_pipeline.py \
--first_entry_file_name <your_input_text_file_path>Generated files will be saved by default in the ./cache_local/text2image_api directory.
- Command-line Arguments
This pipeline supports the following command-line arguments:
| Parameter | Type | Default | Description |
|---|---|---|---|
--api_format | str | gemini | API format type, options: openai or gemini |
--model_name | str | gemini-3-pro-image-preview | Model name, options: dall-e-2, dall-e-3, gpt-image-1, gemini-2.5-flash-image, gemini-3-pro-image-preview, etc. |
--batch_size | int | 4 | Batch size, controls the number of samples processed each time |
--first_entry_file_name | str | None | Input data file path (jsonl format) |
--cache_path | str | ./cache_local/text2image_api | Cache path for storing intermediate results and final generated images |
3. Data Flow and Pipeline Logic
1. Input Data
The input data for this pipeline includes the following fields:
- conversations: Dialogue format data containing text prompts.
This input data is stored in jsonl files and managed and read through the FileStorage object:
self.storage = FileStorage(
first_entry_file_name="<your_jsonl_file_path>",
cache_path="./cache_local/text2image_api",
file_name_prefix="dataflow_cache_step",
cache_type="jsonl"
)2. Text-to-Image Generation (PromptedImageGenerator)
The core step of the pipeline is using the Prompted Image Generator (PromptedImageGenerator) combined with cloud API services to generate corresponding images for each text prompt.
Features:
- Generate images from text prompts using cloud API models
- Support multiple API formats (OpenAI, Gemini, etc.)
- Configurable batch size and generation parameters
- Automatically save generated images to specified paths
Input: Dialogue format data (containing text prompts)
Output: Generated image file paths
API Service Configuration:
self.serving = APIImageGenServing(
api_url=api_url, # API service address
image_io=ImageIO(save_path=image_save_path), # Image save path
Image_gen_task="text2image", # Task type: text-to-image
batch_size=4, # Batch size
api_format="gemini", # API format: gemini or openai
model_name="gemini-3-pro-image-preview", # Model name
api_key=api_key, # API key
)Operator Initialization:
self.text_to_image_generator = PromptedImageGenerator(
t2i_serving=self.serving, # Text-to-image service
save_interval=10 # Save interval
)Operator Execution:
self.text_to_image_generator.run(
storage=self.storage.step(),
input_conversation_key="conversations", # Input dialogue field
output_image_key="images", # Output image field
)3. Output Data
Finally, the output data generated by the pipeline will include the following:
- conversations: Original dialogue data (containing text prompts)
- images: List of generated image file paths
Output Data Example:
{"conversations":[{"content":"a fox darting between snow-covered pines at dusk","role":"user"}],"images":["./cache_local/text2image_api/sample0_condition0/sample0_condition0_0.png"]}4. Pipeline Example
Below is an example of a text-to-image generation pipeline using cloud API:
import os
import argparse
from pathlib import Path
from dataflow.operators.core_vision import PromptedImageGenerator
from dataflow.serving.api_image_gen_serving import APIImageGenServing
from dataflow.utils.storage import FileStorage
from dataflow.io import ImageIO
class ImageGenerationAPIPipeline():
"""
Text to Image Generation API Pipeline
Supported Models:
OpenAI format (api_format="openai"): dall-e-2, dall-e-3, gpt-image-1
Gemini format (api_format="gemini"): gemini-2.5-flash-image, gemini-3-pro-image-preview, etc.
"""
def __init__(
self,
api_format="gemini",
model_name="gemini-3-pro-image-preview",
batch_size=4,
first_entry_file_name=None,
cache_path="./cache_local/text2image_api",
):
current_file = Path(__file__).resolve()
project_root = current_file.parent.parent.parent.parent.parent
if first_entry_file_name is None:
data_file = project_root / "dataflow" / "example" / "image_gen" / "text2image" / "prompts.jsonl"
first_entry_file_name = str(data_file)
# -------- Storage Configuration --------
self.storage = FileStorage(
first_entry_file_name=first_entry_file_name,
cache_path=cache_path,
file_name_prefix="dataflow_cache_step",
cache_type="jsonl"
)
# -------- API Configuration --------
api_key = os.environ.get("DF_API_KEY")
api_url = os.environ.get("DF_BASE_URL")
if api_key is None:
raise ValueError("API key is required. Please set it via environment variable DF_API_KEY")
if api_url is None:
if api_format == "gemini":
api_url = "https://generativelanguage.googleapis.com"
else:
api_url = "https://api.openai.com/v1"
image_save_path = str(project_root / "cache_local" / "text2image_api")
# -------- Image Generation API Service --------
self.serving = APIImageGenServing(
api_url=api_url,
image_io=ImageIO(save_path=image_save_path),
Image_gen_task="text2image",
batch_size=batch_size,
api_format=api_format,
model_name=model_name,
api_key=api_key,
)
# -------- Text-to-Image Generation Operator --------
self.text_to_image_generator = PromptedImageGenerator(
t2i_serving=self.serving,
save_interval=10
)
def forward(self):
# Call PromptedImageGenerator to generate images
self.text_to_image_generator.run(
storage=self.storage.step(),
input_conversation_key="conversations",
output_image_key="images",
)
if __name__ == "__main__":
# -------- Command-line Argument Parsing --------
parser = argparse.ArgumentParser(description="Cloud API Image Generation Pipeline")
parser.add_argument('--api_format', choices=['openai', 'gemini'], default='gemini',
help='API format type: openai (OpenAI DALL-E) or gemini (Google Gemini)')
parser.add_argument('--model_name', type=str, default='gemini-3-pro-image-preview',
help='Model name')
parser.add_argument('--batch_size', type=int, default=4, help='Batch size')
parser.add_argument('--first_entry_file_name', type=str, default=None,
help='Input data file path (default uses example_data)')
parser.add_argument('--cache_path', type=str, default="./cache_local/text2image_api",
help='Cache path')
args = parser.parse_args()
if not os.environ.get("DF_API_KEY"):
parser.error("Environment variable DF_API_KEY is not set. Please use export DF_API_KEY=your_api_key to set it")
# -------- Pipeline Entry Point --------
model = ImageGenerationAPIPipeline(
api_format=args.api_format,
model_name=args.model_name,
batch_size=args.batch_size,
first_entry_file_name=args.first_entry_file_name,
cache_path=args.cache_path,
)
model.forward()
