Image Editing Pipeline (API Version)
2026-02-15
1. Overview
The Image Editing Pipeline produces edited images from existing images and editing instructions. It takes image files and editing instructions (text prompts) as input and outputs the edited images.
This version uses cloud API models for image editing. Currently supported API models include:
- OpenAI format: dall-e-2, gpt-image-1
- Gemini format: gemini-2.5-flash-image, gemini-3-pro-image-preview, etc.
💡 Tip: If you want to use local GPU models for image editing, please see Image Editing Pipeline (GPU Version)
Different models support different image editing capabilities, as follows:
gpt-image-1:
- Edit existing images, including replacing partial regions ("inpainting"), using masks, etc.
- Generate new images using other images as references
dall-e-2:
- Edit existing images
- Generate variants of existing images
Gemini series models (Nano banana, Nano banana pro):
- Edit existing images, including adding/removing/modifying elements, changing styles, adjusting color grading, etc.
- Multi-image input: Support uploading multiple images for editing
- Multi-round editing: Support iterative image modification using multi-turn conversations
Note: For specific features, limitations, and detailed usage of each model, please refer to the official API documentation of each model.
2. Quick Start
Step 1: Create a New DataFlow Working Directory
mkdir run_dataflow_mm
cd run_dataflow_mm
Step 2: Configure API KEY and BASE URL
Configure the API KEY and BASE URL by setting environment variables:
# Set API key (required)
export DF_API_KEY=<your_api_key>
# Set API base URL (optional)
# If not set, default URLs will be used based on the selected API format:
# - Gemini format: https://generativelanguage.googleapis.com
# - OpenAI format: https://api.openai.com/v1
export DF_BASE_URL=<your_base_url> # optional
Step 3: Prepare Image and Text Data
We use jsonl files to store image and text data, with one sample per line. Here is a simple example of input data:
{"images": "image.png", "conversations": [{"role": "user", "content": "Change the guitar to a piano."}]}
{"images": ["image.png", "human_inpaint.jpg"], "conversations": [{"role": "user", "content": "Change the color of the vase in the first picture to the background color of the second picture."}]}
{"images": "human_inpaint.jpg", "conversations": [{"role": "user", "content": "Complete the shadowed part into a generic person's portrait."}, {"role": "user", "content": "Update this graph to be in English."}]}
images is the path to the image to be edited; conversations contains a list of dialogues with image editing instructions, and the content field is the text prompt.
For Gemini series models, images supports strings (single image path) or lists (multiple image paths), and conversations supports multi-turn dialogues.
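The field layout described above can be checked before any API call is made. The following is a minimal sketch using only the Python standard library; the helper names `validate_sample` and `write_jsonl` are hypothetical and not part of DataFlow, and the checks only mirror the layout described in this step (they do not verify that the image files exist or that a given model accepts multiple images).

```python
import json
from pathlib import Path


def validate_sample(sample: dict) -> None:
    """Raise ValueError if a sample does not match the layout described above."""
    images = sample.get("images")
    # "images" is either a single path string or a list of path strings.
    if isinstance(images, str):
        paths = [images]
    elif isinstance(images, list) and images:
        paths = images
    else:
        raise ValueError("'images' must be a path string or a non-empty list of paths")
    if not all(isinstance(p, str) and p for p in paths):
        raise ValueError("every image path must be a non-empty string")

    conversations = sample.get("conversations")
    if not isinstance(conversations, list) or not conversations:
        raise ValueError("'conversations' must be a non-empty list")
    for turn in conversations:
        # Each turn carries the editing instruction in its "content" field.
        if not isinstance(turn, dict) or not isinstance(turn.get("content"), str):
            raise ValueError("each conversation turn needs a string 'content' field")


def write_jsonl(samples: list, path: str) -> int:
    """Validate every sample, then write one JSON object per line.

    Returns the number of samples written.
    """
    for sample in samples:
        validate_sample(sample)
    with Path(path).open("w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")
    return len(samples)
```

Calling `write_jsonl(samples, "prompts.jsonl")` with a list of dictionaries shaped like the examples above would produce a file that can be passed to the pipeline as the input data file. Validating before writing means a malformed sample fails locally instead of partway through a batch of API requests.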
Step 4: Run the Pipeline
- Basic Usage
python dataflow/statics/api_pipelines/image_editing_api_pipeline.py \
  --first_entry_file_name <your_input_data_file_path>
Generated files will be saved by default in the ./cache_local/image_edit_api directory.
- Command-line Arguments
This pipeline supports the following command-line arguments:
| Parameter | Type | Default | Description |
|---|---|---|---|
| --api_format | str | gemini | API format type, options: openai or gemini |
| --model_name | str | gemini-3-pro-image-preview | Model name, options: dall-e-2, gpt-image-1, gemini-2.5-flash-image, gemini-3-pro-image-preview, etc. |
| --batch_size | int | 4 | Batch size, controls the number of samples processed each time |
| --first_entry_file_name | str | None | Input data file path (jsonl format) |
| --cache_path | str | ./cache_local/image_edit_api | Cache path for storing intermediate results and final generated images |
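Because --api_format and --model_name are passed separately, a mismatch (for example gpt-image-1 combined with the default gemini format) is easy to introduce. The sketch below shows one possible pre-flight check, based on the model lists in the Overview and the default URLs from Step 2. `infer_api_format` and `default_base_url` are hypothetical helpers, not part of DataFlow, and treating every model name that starts with "gemini" as Gemini format is an assumption for models not listed in this document.

```python
# Models listed in the Overview as using the OpenAI format.
OPENAI_MODELS = {"dall-e-2", "gpt-image-1"}

# Default base URLs from Step 2, used when DF_BASE_URL is not set.
DEFAULT_BASE_URLS = {
    "gemini": "https://generativelanguage.googleapis.com",
    "openai": "https://api.openai.com/v1",
}


def infer_api_format(model_name: str) -> str:
    """Guess the API format for a model name (assumption-based, see lead-in)."""
    if model_name in OPENAI_MODELS:
        return "openai"
    # Assumption: any model whose name starts with "gemini" uses the Gemini format.
    if model_name.startswith("gemini"):
        return "gemini"
    raise ValueError(f"Cannot infer api_format for model: {model_name}")


def default_base_url(api_format: str) -> str:
    """Return the default base URL for a supported API format."""
    try:
        return DEFAULT_BASE_URLS[api_format]
    except KeyError:
        raise ValueError(f"Unsupported api_format: {api_format}") from None
```

A wrapper script could compare `infer_api_format(model_name)` with the value given for --api_format and stop early when they disagree, before any request is sent.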
3. Data Flow and Pipeline Logic
1. Input Data
The input data for this pipeline includes the following fields:
- images: Path to the image(s) to be edited, can be a string or a list, supporting image formats such as png, jpg, etc.
- conversations: Dialogue format data containing image editing instructions.
This input data is stored in jsonl files and managed and read through the FileStorage object:
self.storage = FileStorage(
    first_entry_file_name="<your_jsonl_file_path>",
    cache_path="./cache_local/image_edit_api",
    file_name_prefix="dataflow_cache_step",
    cache_type="jsonl"
)
2. Image Editing Generation (PromptedImageEditGenerator)
The core step of the pipeline uses the Prompted Image Edit Generator (PromptedImageEditGenerator), backed by a cloud API service, to generate edited images from input images and editing instructions.
Features:
- Generate edited images from images and editing instructions using cloud API models
- Support multiple API formats (OpenAI, Gemini, etc.)
- Support single or multiple image input (depending on model capabilities)
- Support multi-turn dialogue editing (Gemini series models)
- Configurable batch size and generation parameters
- Automatically save generated images to specified paths
Input: Image file paths and dialogue format data (containing editing instructions)
Output: Edited image file paths
API Service Configuration:
self.serving = APIImageGenServing(
    api_url=api_url,                                # API service address
    image_io=ImageIO(save_path=image_save_path),    # Image save path
    Image_gen_task="imageedit",                     # Task type: image editing
    batch_size=4,                                   # Batch size
    api_format="gemini",                            # API format: gemini or openai
    model_name="gemini-3-pro-image-preview",        # Model name
    api_key=api_key,                                # API key
)
Operator Initialization:
self.image_edit_generator = PromptedImageEditGenerator(
    image_edit_serving=self.serving,    # Image editing service
    save_interval=10                    # Save interval
)
Operator Execution:
self.image_edit_generator.run(
    storage=self.storage.step(),
    input_image_key="images",                   # Input image field
    input_conversation_key="conversations",     # Input dialogue field
    output_image_key="output_image",            # Output image field
)
3. Output Data
The output data generated by the pipeline includes the following fields:
- images: Original input image path(s)
- conversations: Original dialogue data (containing editing instructions)
- output_image: List of edited image file paths
Output Data Example:
{"images":"image.png","conversations":[{"role":"user","content":"Change the guitar to a piano."}],"output_image":["./cache_local/image_edit_api/sample_0/sample_0_0.png"]}
4. Pipeline Example
Below is an example of an image editing pipeline using cloud API models:
import os
import argparse
from pathlib import Path

from dataflow.operators.core_vision import PromptedImageEditGenerator
from dataflow.serving.api_image_gen_serving import APIImageGenServing
from dataflow.utils.storage import FileStorage
from dataflow.io import ImageIO


class ImageEditingAPIPipeline():
    """
    Image Editing API Pipeline
    Supported Models:
    OpenAI format (api_format="openai"): dall-e-2, gpt-image-1
    Gemini format (api_format="gemini"): gemini-2.5-flash-image, gemini-3-pro-image-preview, etc.
    """
    def __init__(
        self,
        api_format="gemini",
        model_name="gemini-3-pro-image-preview",
        batch_size=4,
        first_entry_file_name=None,
        cache_path="./cache_local/image_edit_api",
    ):
        current_file = Path(__file__).resolve()
        project_root = current_file.parent.parent.parent.parent.parent
        if first_entry_file_name is None:
            data_file = project_root / "dataflow" / "example" / "image_gen" / "image_edit" / "prompts_api.jsonl"
            first_entry_file_name = str(data_file)

        # -------- Storage Configuration --------
        self.storage = FileStorage(
            first_entry_file_name=first_entry_file_name,
            cache_path=cache_path,
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl"
        )

        if api_format not in ["gemini", "openai"]:
            raise ValueError(f"Unsupported api_format: {api_format}. Only 'gemini' and 'openai' are supported for image editing.")

        # -------- API Configuration --------
        api_key = os.environ.get("DF_API_KEY")
        api_url = os.environ.get("DF_BASE_URL")
        if api_key is None:
            raise ValueError("API key is required. Please set it via environment variable DF_API_KEY")
        if api_url is None:
            if api_format == "gemini":
                api_url = "https://generativelanguage.googleapis.com"
            else:  # openai
                api_url = "https://api.openai.com/v1"

        image_save_path = str(project_root / "cache_local" / "image_edit_api")

        # -------- Image Editing API Service --------
        self.serving = APIImageGenServing(
            api_url=api_url,
            image_io=ImageIO(save_path=image_save_path),
            Image_gen_task="imageedit",
            batch_size=batch_size,
            api_format=api_format,
            model_name=model_name,
            api_key=api_key,
        )

        # -------- Image Editing Generation Operator --------
        self.image_edit_generator = PromptedImageEditGenerator(
            image_edit_serving=self.serving,
            save_interval=10
        )

    def forward(self):
        # Call PromptedImageEditGenerator to generate edited images
        self.image_edit_generator.run(
            storage=self.storage.step(),
            input_image_key="images",
            input_conversation_key="conversations",
            output_image_key="output_image",
        )


if __name__ == "__main__":
    # -------- Command-line Argument Parsing --------
    parser = argparse.ArgumentParser(description="Cloud API Image Editing Pipeline")
    parser.add_argument('--api_format', choices=['gemini', 'openai'], default='gemini',
                        help='API format type: gemini (Google Gemini) or openai (OpenAI DALL-E 2 / gpt-image-1)')
    parser.add_argument('--model_name', type=str, default='gemini-3-pro-image-preview',
                        help='Model name')
    parser.add_argument('--batch_size', type=int, default=4, help='Batch size')
    parser.add_argument('--first_entry_file_name', type=str, default=None,
                        help='Input data file path (default uses example_data)')
    parser.add_argument('--cache_path', type=str, default="./cache_local/image_edit_api",
                        help='Cache path')
    args = parser.parse_args()

    if not os.environ.get("DF_API_KEY"):
        parser.error("Environment variable DF_API_KEY is not set. Please use export DF_API_KEY=your_api_key to set it")

    # -------- Pipeline Entry Point --------
    model = ImageEditingAPIPipeline(
        api_format=args.api_format,
        model_name=args.model_name,
        batch_size=args.batch_size,
        first_entry_file_name=args.first_entry_file_name,
        cache_path=args.cache_path,
    )
    model.forward()
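Once the pipeline has run, the records described in the Output Data section can be read back for inspection. The following is a minimal sketch, assuming the result file is a jsonl file whose records carry the images, conversations, and output_image fields shown earlier. `load_results` is a hypothetical helper, not part of DataFlow, and the result file path is left as an argument because its exact name under the cache directory is not specified in this document.

```python
import json
from pathlib import Path


def load_results(result_file: str) -> list:
    """Read a result jsonl file and summarize each record.

    Each summary holds the editing prompts, the output image paths,
    and the subset of output paths that do not exist on disk.
    """
    summaries = []
    with Path(result_file).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            prompts = [turn["content"] for turn in record.get("conversations", [])]
            outputs = record.get("output_image") or []
            summaries.append({
                "images": record.get("images"),
                "prompts": prompts,
                "outputs": outputs,
                # Paths listed in the record but missing on disk, e.g. failed requests.
                "missing": [p for p in outputs if not Path(p).exists()],
            })
    return summaries
```

Records with an empty "outputs" list or a non-empty "missing" list are candidates for a retry, which can be useful when some API requests in a batch fail.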
