[Diffusers] bitsandbytes now works on Windows, so I tried it with FLUX.1-dev, the image generation AI everyone has been talking about lately

PC environment

Windows 11
CUDA 11.8
Python 3.12

Python environment setup

pip install torch==2.4.0+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install diffusers[torch]
pip install transformers protobuf sentencepiece bitsandbytes
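
Before running the script, it can help to confirm that the packages above actually resolved in the active environment. The following is a minimal sketch using only the standard library. The helper name `installed_versions` is made up for illustration and is not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version string, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names as passed to pip in the commands above.
    required = ["torch", "diffusers", "transformers", "protobuf",
                "sentencepiece", "bitsandbytes"]
    for name, ver in installed_versions(required).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any entry reported as NOT INSTALLED points at a pip command that needs to be rerun.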

Python script

VRAM usage stayed under 12 GB.
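
As a rough sanity check on why loading the T5 text encoder in 8-bit (as the script in this section does) saves memory: weight storage scales with parameter count times bits per weight. The sketch below assumes the encoder has about 4.7 billion parameters, which is an approximate figure not stated in this article. It ignores activations, quantization overhead, and the rest of the pipeline, so treat the numbers as estimates rather than measurements.

```python
def weight_gib(num_params, bits_per_param):
    """Approximate weight storage in GiB for a model with num_params weights."""
    return num_params * bits_per_param / 8 / 1024**3


# ASSUMPTION: ~4.7e9 parameters for the T5 text encoder (approximate).
T5_PARAMS = 4.7e9

fp16_gib = weight_gib(T5_PARAMS, 16)
int8_gib = weight_gib(T5_PARAMS, 8)

print(f"fp16: {fp16_gib:.1f} GiB, int8: {int8_gib:.1f} GiB")
```

Under that assumption, 8-bit loading roughly halves the text encoder's weight footprint compared with float16.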

import time
import torch
from diffusers import FluxPipeline
from transformers import T5EncoderModel, BitsAndBytesConfig

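# Start the timer before loading, so the reported time includes model loading.
start = time.perf_counter()

# Load the T5 text encoder in 8-bit via bitsandbytes.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)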

model_id = "black-forest-labs/FLUX.1-dev"

text_encoder_2 = T5EncoderModel.from_pretrained(
    model_id,
    subfolder="text_encoder_2",
    quantization_config=quantization_config,
    device_map="auto"
)

pipe = FluxPipeline.from_pretrained(
    model_id,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.float16,
    device_map="balanced"
)

# Swap in a different prompt for each run (see the results below).
prompt = "an insect robot preparing a delicious meal, anime style"
out = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=50,
    generator=torch.manual_seed(0)
).images[0]
out.save("dev_result.png")

end = time.perf_counter()
print(f"time: {(end - start):.2f}sec")
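
Because the timer starts before the models are loaded, the printed time mixes loading and generation. If you want the two separately, one option is a small timing helper like the sketch below. The `timed` helper and the `durations` dict are hypothetical names introduced here, and the `time.sleep` calls are placeholders standing in for the `from_pretrained` calls and the `pipe(...)` call.

```python
import time
from contextlib import contextmanager

durations = {}


@contextmanager
def timed(label):
    """Record the wall-clock seconds spent inside the with-block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[label] = time.perf_counter() - start


# Placeholder workloads; in the real script these would wrap
# the from_pretrained(...) calls and the pipe(...) call respectively.
with timed("load"):
    time.sleep(0.01)

with timed("generate"):
    time.sleep(0.02)

for label, seconds in durations.items():
    print(f"{label}: {seconds:.2f}sec")
```

Using a context manager keeps the timing code out of the way of the pipeline calls, and the `finally` clause records a duration even if the wrapped step raises.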

Results

Result ①

Prompt

A cat holding a sign that says hello world

Generation time

Includes model loading time.

time: 209.90sec

Result ②

Prompt

an insect robot preparing a delicious meal, anime style

Generation time

Includes model loading time.

time: 210.54sec

Result ③

Prompt

A photorealistic portrait of a young Japanese woman with long black hair and natural makeup, wearing a casual white blouse, sitting in a modern Tokyo cafe with soft window light

Generation time

Includes model loading time.

time: 210.67sec

Result ④

Prompt

A neon-drenched cityscape at night, with towering holographic billboards, flying vehicles zipping between skyscrapers, and crowds of diverse people with cybernetic enhancements walking on elevated walkways