Text‑to‑Image Generation and Image Editing
2025-07-15
To add image‑generation capabilities to DataFlow, we implement large‑scale image creation and editing using the latest diffusion‑based methods available in the diffusers library, and we evaluate the quality of the generated images with the Qwen‑VL model. A detailed walkthrough follows.
Text‑to‑Image Generation
Step 1 – Install the DataFlow environment
pip install open-dataflow
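A quick import check confirms the installation (a minimal sketch; it only assumes the package is importable as `dataflow`, which the snippets below rely on):

# Sanity check: the open-dataflow package installs the top-level `dataflow` module
import dataflow
print("DataFlow is installed")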
Step 2 – Use a local model for image generation
from dataflow.serving import LocalImageGenServing

# Inside your pipeline class's __init__: load a local FLUX.1-dev checkpoint for text-to-image generation
self.serving = LocalImageGenServing(
    hf_model_name_or_path="black-forest-labs/FLUX.1-dev",  # Hugging Face model ID or local path
    hf_cache_dir="~/.cache/huggingface",                   # where downloaded weights are cached
    hf_local_dir="./ckpt/models/",                         # local directory for the checkpoint
    device="cuda"
)
Step 3 – Prepare the text‑prompt data for generation
{"conversations": [{"content": "a fox darting between snow-covered pines at dusk", "role": "user"}], "images": [""]}
{"conversations": [{"content": "a kite surfer riding emerald waves under a cloudy sky", "role": "user"}], "images": [""]}
Specify the data path:
from dataflow.utils.storage import FileStorage

# Inside the same pipeline class: point FileStorage at your prompt file
self.storage = FileStorage(
    first_entry_file_name="your path",        # path to the JSONL prompt file above
    cache_path="./cache",                     # directory for intermediate results
    file_name_prefix="dataflow_cache_step",   # prefix for cached step files
    cache_type="jsonl",
    media_key="images",                       # JSON key holding image paths
    media_type="image"
)
Step 4 – Generate images
from dataflow.operators import Text2ImageGenerator

# Read prompts from "conversations" and write generated image paths to "images"
self.generator = Text2ImageGenerator(pipe=self.serving)
self.generator.run(
    storage=self.storage.step(),
    input_key="conversations",
    output_key="images"
)
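Putting steps 2–4 together, a minimal pipeline might look like the sketch below. It only rearranges the snippets above; the class name, the `forward` entry point, and the `prompts.jsonl` path are illustrative, not part of the DataFlow API:

from dataflow.operators import Text2ImageGenerator
from dataflow.serving import LocalImageGenServing
from dataflow.utils.storage import FileStorage

class Text2ImagePipeline:
    def __init__(self):
        # Serving, storage, and operator exactly as configured in steps 2-4
        self.serving = LocalImageGenServing(
            hf_model_name_or_path="black-forest-labs/FLUX.1-dev",
            hf_cache_dir="~/.cache/huggingface",
            hf_local_dir="./ckpt/models/",
            device="cuda"
        )
        self.storage = FileStorage(
            first_entry_file_name="prompts.jsonl",  # illustrative path
            cache_path="./cache",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl",
            media_key="images",
            media_type="image"
        )
        self.generator = Text2ImageGenerator(pipe=self.serving)

    def forward(self):
        self.generator.run(
            storage=self.storage.step(),
            input_key="conversations",
            output_key="images"
        )

if __name__ == "__main__":
    Text2ImagePipeline().forward()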
Image Editing
The workflow is almost identical to text‑to‑image generation; only minor tweaks are required.
Call a local model
from dataflow.serving import LocalImageGenServing

# Same serving class, switched to the FLUX.1-Kontext editing checkpoint and the image-editing task
self.serving = LocalImageGenServing(
    hf_model_name_or_path="black-forest-labs/FLUX.1-Kontext-dev",
    hf_cache_dir="~/.cache/huggingface",
    hf_local_dir="./ckpt/models/",
    device="cuda",
    Image_gen_task="imageedit",          # switch the serving task from generation to editing
    diffuser_model_name="FLUX-Kontext"
)
Prepare the data
{"conversations": [{"content": "Change the woman's clothes to a white dress.", "role": "user"}], "images": ["./dataflow/example/test_image_editing/images/image1.png"], "edited_images": [""]}
{"conversations": [{"content": "Change the vase to red.", "role": "user"}], "images": ["./dataflow/example/test_image_editing/images/image2.png"], "edited_images": [""]}
Run the editing pipeline
from dataflow.operators import ImageEditor

# Read the instruction and source image, write results to "edited_images"
self.generator = ImageEditor(pipe=self.serving)
self.generator.run(
    storage=self.storage.step(),
    input_key=["conversations", "images"],   # editing needs both the instruction and the source image
    output_key="edited_images",
    save_interval=10                         # example value: flush intermediate results every 10 items
)
Image Quality Assessment
We use a multimodal large model to score and filter generated images.
Data format
{"conversations": [{"content": "four cups were filled with hot coffee", "role": "user"}], "images": ["./dataflow/example/test_text_to_image_eval/images/four cups were filled with hot coffee_001005.png"]}
{"conversations": [{"content": "four balloons, one cup, four desks, two dogs and four microwaves", "role": "user"}], "images": ["./dataflow/example/test_text_to_image_eval/images/four balloons, one cup, four desks, two dogs and four microwaves_003032.png"]}
Evaluation script
from dataflow.prompts.EvalImageGenerationPrompt import EvalImageGenerationPrompt
from dataflow.serving import LocalModelLLMServing_vllm
from qwen_vl_utils import process_vision_info
from dataflow.operators.eval.image.image_evaluator import EvalImageGenerationGenerator
from dataflow.utils.storage import FileStorage

# Example Qwen-VL checkpoint used as the judge model; substitute your own path
model_path = "Qwen/Qwen2.5-VL-7B-Instruct"

model = LocalModelLLMServing_vllm(
    hf_model_name_or_path=model_path,
    vllm_tensor_parallel_size=2,   # number of GPUs for tensor parallelism
    vllm_temperature=0.7,
    vllm_top_p=0.9,
    vllm_max_tokens=512,
)

captionGenerator = EvalImageGenerationGenerator(
    model,
    EvalImageGenerationPrompt(),   # prompt template for scoring image-text alignment
    process_vision_info,           # Qwen-VL helper that packs images into the chat format
)

storage = FileStorage(
    first_entry_file_name="./dataflow/example/test_text_to_image_eval/prompts.jsonl",
    cache_path="./cache",
    cache_type="jsonl",
    media_key="images",
    media_type="image",
)

captionGenerator.run(
    storage=storage,
    output_key="caption",   # the judge model's evaluation is written under this key
)
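Once the run finishes, the evaluations land in the JSONL cache and can be inspected with plain Python (a sketch; the cache file name and the exact structure of the `caption` field depend on your FileStorage settings and the prompt template):

import json

# Illustrative path: adjust to the JSONL file FileStorage wrote under cache_path
cache_file = "./cache/eval_step.jsonl"

with open(cache_file, encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        prompt = rec["conversations"][0]["content"]
        # "caption" holds the judge model's output; whether it is free text or a
        # numeric score depends on EvalImageGenerationPrompt
        print(prompt, "->", rec.get("caption"))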