【Diffusers】Trying Out Stable Diffusion 3.5 Large (Runs in Under 12 GB of VRAM)

PC Environment

I used the following PC:

Windows 11
RTX 3080 Laptop (VRAM 16GB)
CUDA 11.8
Python 3.12

Python Environment Setup

pip install torch==2.4.1+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install diffusers[torch]
pip install transformers protobuf sentencepiece bitsandbytes

The resulting package versions:

torch==2.4.1+cu118
diffusers==0.31.0
transformers==4.45.2
bitsandbytes==0.44.1
protobuf==5.28.3
sentencepiece==0.2.0
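To confirm the environment matches the versions above, a small stdlib-only helper (my own addition, not part of the original post) can report what is actually installed:

```python
from importlib.metadata import version, PackageNotFoundError

def report_versions(packages):
    """Return a mapping of package name -> installed version (or None if missing)."""
    out = {}
    for name in packages:
        try:
            out[name] = version(name)
        except PackageNotFoundError:
            out[name] = None
    return out

if __name__ == "__main__":
    versions = report_versions(
        ["torch", "diffusers", "transformers", "bitsandbytes", "protobuf", "sentencepiece"]
    )
    for name, ver in versions.items():
        print(f"{name}=={ver}")
```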

Without quantization

time: 1930.87 sec
GPU 0 - Used memory: 15.95/16.00 GB

With the transformer quantized

time: 129.71 sec
GPU 0 - Used memory: 10.84/16.00 GB

Quantizing the transformer is dramatically faster here, most likely because the unquantized run nearly saturates VRAM (15.95/16.00 GB) and spends most of its time shuttling weights between CPU and GPU.
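Putting the two runs side by side with the numbers reported above:

```python
# Compare the two benchmark runs reported above.
baseline_sec, quantized_sec = 1930.87, 129.71
baseline_gb, quantized_gb = 15.95, 10.84

speedup = baseline_sec / quantized_sec          # wall-clock improvement
saved_gb = baseline_gb - quantized_gb           # peak VRAM reduction
print(f"speedup: {speedup:.1f}x, VRAM saved: {saved_gb:.2f} GB")
```

Roughly a 15x speedup while using about 5 GB less VRAM.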

Python Scripts

Without quantization

import torch
from diffusers import StableDiffusion3Pipeline
from decorator import gpu_monitor, time_monitor

@gpu_monitor(interval=0.5)
@time_monitor
def main():
    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3.5-large",
        torch_dtype=torch.bfloat16
    )
    # Offload submodules to the CPU instead of moving the whole pipeline to the GPU
    # pipe = pipe.to("cuda")
    pipe.enable_model_cpu_offload()

    seed = 20241023
    image = pipe(
        "A capybara holding a sign that reads Hello World",
        num_inference_steps=28,
        guidance_scale=3.5,
        generator=torch.Generator().manual_seed(seed)
    ).images[0]
    image.save(f"capybara{seed}.jpg")

if __name__ == "__main__":
    main()

With the transformer quantized

import torch
from diffusers import StableDiffusion3Pipeline, BitsAndBytesConfig, SD3Transformer2DModel
from decorator import gpu_monitor, time_monitor

@gpu_monitor(interval=0.5)
@time_monitor
def main():
    model_id = "stabilityai/stable-diffusion-3.5-large"

    # Load only the transformer in 4-bit NF4; compute still runs in bfloat16
    nf4_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
    model_nf4 = SD3Transformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=nf4_config,
        torch_dtype=torch.bfloat16
    )

    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_id,
        transformer=model_nf4,
        torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()

    seed = 20241023
    image = pipe(
        "A capybara holding a sign that reads Hello World",
        num_inference_steps=28,
        guidance_scale=3.5,
        generator=torch.Generator().manual_seed(seed)
    ).images[0]
    image.save(f"capybara{seed}_2.jpg")

if __name__ == "__main__":
    main()
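The measured VRAM numbers line up with a back-of-the-envelope estimate. Assuming the SD3.5 Large transformer has roughly 8 billion parameters (my assumption; the post does not state the count), the weight memory works out as follows:

```python
# Rough weight-memory estimate for an ~8e9-parameter transformer
# (parameter count is an assumption, not measured from the checkpoint).
params = 8.0e9
bf16_gb = params * 2 / 2**30    # 2 bytes per weight in bfloat16
nf4_gb = params * 0.5 / 2**30   # 4 bits per weight, ignoring quantization overhead
print(f"bf16: {bf16_gb:.1f} GB, nf4: {nf4_gb:.1f} GB")
```

In bfloat16 the transformer alone is close to 15 GB, which explains why the unquantized run fills a 16 GB card; in NF4 it shrinks to under 4 GB, leaving comfortable headroom.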

Other

The benchmarks were run with the monitoring script described in this post:
touch-sp.hatenablog.com
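The actual `decorator` module lives at the link above; purely as an illustration, decorators with these names might be sketched like this (an assumed implementation of mine, using `torch.cuda.mem_get_info` for polling — not the author's code):

```python
import threading
import time
from functools import wraps

def time_monitor(func):
    """Print the wall-clock time of the wrapped function (sketch)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            print(f"time: {time.perf_counter() - start:.2f} sec")
    return wrapper

def gpu_monitor(interval=0.5):
    """Poll GPU 0 memory usage in a background thread while the function runs (sketch)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            import torch  # deferred so the decorator itself has no hard torch dependency
            if not torch.cuda.is_available():
                return func(*args, **kwargs)
            peak = 0.0
            stop = threading.Event()

            def poll():
                nonlocal peak
                while not stop.is_set():
                    free, total = torch.cuda.mem_get_info(0)
                    peak = max(peak, (total - free) / 2**30)
                    stop.wait(interval)

            thread = threading.Thread(target=poll, daemon=True)
            thread.start()
            try:
                return func(*args, **kwargs)
            finally:
                stop.set()
                thread.join()
                total_gb = torch.cuda.mem_get_info(0)[1] / 2**30
                print(f"GPU 0 - Used memory: {peak:.2f}/{total_gb:.2f} GB")
        return wrapper
    return decorator
```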