Image Generation Pipeline (GPU Version)
2026-02-15
1. Overview
The Image Generation Pipeline generates target images from user-provided text, providing image data for subsequent tasks such as image understanding and image editing.
This version uses local GPU models for text-to-image generation, supporting local deployment of models such as FLUX.1-dev.
💡 Tip: If you want to use cloud API models for text-to-image generation, please see Image Generation Pipeline (API Version)
2. Quick Start
Step 1: Create a New DataFlow Working Directory
```bash
mkdir run_dataflow_mm
cd run_dataflow_mm
```
Step 2: Configure Model Path
Configure the model path in the pipeline code. Two methods are supported:
(1) Method 1: Use a Hugging Face model path (auto-download)
```python
hf_model_name_or_path = "black-forest-labs/FLUX.1-dev"
```
(2) Method 2: Use a local model path (already-downloaded model)
```python
hf_model_name_or_path = "/path/to/your/local/FLUX.1-dev"
```
Modify the hf_model_name_or_path parameter of LocalImageGenServing in text_to_image_generation_pipeline.py:
```python
self.serving = LocalImageGenServing(
    image_io=ImageIO(save_path=image_save_path),
    batch_size=4,
    hf_model_name_or_path="black-forest-labs/FLUX.1-dev",  # Model path
    hf_cache_dir="./cache_local",     # Hugging Face model cache directory
    hf_local_dir="./ckpt/models/",    # Local model storage directory
    diffuser_num_inference_steps=20,  # Inference steps; adjust to balance speed and quality
    diffuser_image_height=512,        # Generated image height
    diffuser_image_width=512,         # Generated image width
)
```
Step 3: Prepare Text Data
We use jsonl files to store text data, with one sample per line. Here is a simple example of input data:
```jsonl
{"conversations": [{"content": "a fox darting between snow-covered pines at dusk", "role": "user"}]}
{"conversations": [{"content": "a kite surfer riding emerald waves under a cloudy sky", "role": "user"}]}
```
The conversations field holds a list of dialogue turns describing the image to generate; the content field of each turn is the text prompt.
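If you need to build the input file programmatically, a minimal sketch is shown below. The file name and prompt strings are placeholders; only the record layout (a conversations list with content and role fields) matches the format above.

```python
import json

# Placeholder prompts; replace with your own image descriptions.
prompts = [
    "a fox darting between snow-covered pines at dusk",
    "a kite surfer riding emerald waves under a cloudy sky",
]

# Write one JSON object per line (jsonl), matching the expected schema.
with open("prompts.jsonl", "w", encoding="utf-8") as f:
    for text in prompts:
        record = {"conversations": [{"content": text, "role": "user"}]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```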
Step 4: Run the Pipeline
```bash
python dataflow/statics/gpu_pipelines/text_to_image_generation_pipeline.py
```
Generated files are saved by default under the ./cache_local/text2image_local directory.
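After the run finishes, you can sanity-check the outputs by scanning the cache directory. This is an optional sketch: it assumes the cache path and file prefix from the storage configuration, and the exact cache file names produced by FileStorage may differ.

```python
import glob
import json
import os

# Assumed cache location and prefix from the pipeline's FileStorage config.
cache_files = sorted(
    glob.glob("./cache_local/text2image_local/dataflow_cache_step*.jsonl")
)

# For each cached record, report whether every generated image file exists.
for path in cache_files:
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            for image_path in record.get("images", []):
                status = "ok" if os.path.exists(image_path) else "MISSING"
                print(status, image_path)
```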
3. Data Flow and Pipeline Logic
1. Input Data
The input data for this pipeline includes the following fields:
- conversations: Dialogue format data containing text prompts.
This input data is stored in jsonl files and managed and read through the FileStorage object:
```python
self.storage = FileStorage(
    first_entry_file_name="<your_jsonl_file_path>",
    cache_path="./cache_local/text2image_local",
    file_name_prefix="dataflow_cache_step",
    cache_type="jsonl"
)
```
2. Text-to-Image Generation (PromptedImageGenerator)
The core step of the pipeline uses the PromptedImageGenerator operator with a local GPU model to generate an image for each text prompt.
Features:
- Generate images from text prompts using local GPU models (e.g., FLUX.1-dev)
- Support configuration of inference steps, image dimensions, and other parameters
- Adjustable batch size to optimize GPU utilization
- Automatically save generated images to specified paths
Input: Dialogue format data (containing text prompts)
Output: Generated image file paths
Local GPU Service Configuration:
```python
self.serving = LocalImageGenServing(
    image_io=ImageIO(save_path=image_save_path),  # Image save path
    batch_size=4,                                 # Batch size
    hf_model_name_or_path="black-forest-labs/FLUX.1-dev",  # Model path
    hf_cache_dir="./cache_local",     # Hugging Face model cache directory
    hf_local_dir="./ckpt/models/",    # Local model storage directory
    diffuser_num_inference_steps=20,  # Diffusion model inference steps
    diffuser_image_height=512,        # Generated image height
    diffuser_image_width=512,         # Generated image width
)
```
Operator Initialization:
```python
self.text_to_image_generator = PromptedImageGenerator(
    t2i_serving=self.serving,  # Text-to-image service
    save_interval=10           # Save interval
)
```
Operator Execution:
```python
self.text_to_image_generator.run(
    storage=self.storage.step(),
    input_conversation_key="conversations",  # Input dialogue field
    output_image_key="images",               # Output image field
)
```
3. Output Data
Finally, the output data generated by the pipeline will include the following:
- conversations: Original dialogue data (containing text prompts)
- images: List of generated image file paths
Output Data Example:
```jsonl
{"conversations": [{"content": "a fox darting between snow-covered pines at dusk", "role": "user"}], "images": ["./cache_local/text2image_local/sample0_condition0/sample0_condition0_0.png"]}
```
4. Pipeline Example
Below is a complete text-to-image generation pipeline using a local FLUX model:
```python
from pathlib import Path

from dataflow.operators.core_vision import PromptedImageGenerator
from dataflow.serving.local_image_gen_serving import LocalImageGenServing
from dataflow.utils.storage import FileStorage
from dataflow.io import ImageIO


class ImageGenerationPipeline():
    def __init__(self):
        current_file = Path(__file__).resolve()
        project_root = current_file.parent.parent.parent.parent.parent
        prompts_file = project_root / "dataflow" / "example" / "image_gen" / "text2image" / "prompts.jsonl"

        # -------- Storage Configuration --------
        self.storage = FileStorage(
            first_entry_file_name=str(prompts_file),
            cache_path="./cache_local/text2image_local",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl"
        )

        image_save_path = str(project_root / "cache_local" / "text2image_local")

        # -------- Local GPU Image Generation Service --------
        self.serving = LocalImageGenServing(
            image_io=ImageIO(save_path=image_save_path),
            batch_size=4,
            hf_model_name_or_path="black-forest-labs/FLUX.1-dev",  # Or a local model path
            hf_cache_dir="./cache_local",
            hf_local_dir="./ckpt/models/",
            diffuser_num_inference_steps=20,
            diffuser_image_height=512,
            diffuser_image_width=512,
        )

        # -------- Text-to-Image Generation Operator --------
        self.text_to_image_generator = PromptedImageGenerator(
            t2i_serving=self.serving,
            save_interval=10
        )

    def forward(self):
        # Call PromptedImageGenerator to generate images
        self.text_to_image_generator.run(
            storage=self.storage.step(),
            input_conversation_key="conversations",
            output_image_key="images",
        )


if __name__ == "__main__":
    # -------- Pipeline Entry Point --------
    model = ImageGenerationPipeline()
    model.forward()
```
