Trying out PixArt-alpha with Diffusers


Environment

Windows 11
CUDA 11.7
Python 3.10
pip install torch==2.0.1+cu117 torchvision==0.15.2+cu117 --index-url https://download.pytorch.org/whl/cu117
pip install diffusers[torch]
pip install transformers omegaconf sentencepiece beautifulsoup4 ftfy
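Before running the scripts, it can help to confirm which versions were actually installed, since Note 1 below depends on the diffusers version (0.23.0 or later). This is a minimal sketch that uses only the standard library's importlib.metadata. The package list mirrors the pip commands above, and the helper names `installed_versions` and `version_tuple` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands above
PACKAGES = ["torch", "torchvision", "diffusers", "transformers",
            "omegaconf", "sentencepiece", "beautifulsoup4", "ftfy"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def version_tuple(v):
    """Parse the leading numeric parts of e.g. '2.0.1+cu117' or '0.24.0.dev0'."""
    nums = []
    for piece in v.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        nums.append(int(digits) if digits else 0)
    return tuple(nums)


if __name__ == "__main__":
    versions = installed_versions(PACKAGES)
    for name, ver in versions.items():
        print(f"{name}: {ver or 'not installed'}")
    diffusers_ver = versions["diffusers"]
    if diffusers_ver and version_tuple(diffusers_ver) >= (0, 23, 0):
        print("diffusers >= 0.23.0: AutoPipelineForText2Image should work (see Note 1)")
```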

Python script

import torch
from diffusers import PixArtAlphaPipeline

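# "model/PixArt-XL-2-1024-MS" is a local copy of the checkpoint;
# the Hub repo ID "PixArt-alpha/PixArt-XL-2-1024-MS" can be passed instead
pipe = PixArtAlphaPipeline.from_pretrained(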
    "model/PixArt-XL-2-1024-MS",
    torch_dtype=torch.float16
).to("cuda")

prompt = "A small cactus with a happy face in the Sahara desert"

seed = 110000
for steps in range(20, 50, 10):
    generator = torch.manual_seed(seed)
    image = pipe(
        prompt,
        generator=generator,
        num_inference_steps=steps
    ).images[0]

    image.save(f"pixart_result_{steps}.png")

Results

From left to right: num_inference_steps = 20, 30, 40.

Note 1

Since diffusers 0.23.0, AutoPipelineForText2Image can be used in place of PixArtAlphaPipeline.
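Why the substitution works, as I understand it: a diffusers checkpoint folder contains a model_index.json whose `_class_name` field names the pipeline class it was saved with, and AutoPipelineForText2Image uses that to pick the matching text-to-image pipeline. The sketch below is a toy model of that lookup only. The mapping dicts and `resolve_text2image_class` are hypothetical and much simpler than the real diffusers internals, and the JSON is trimmed to the single field that matters here.

```python
import json

# Trimmed, illustrative model_index.json: only the field relevant to the lookup
MODEL_INDEX_JSON = '{"_class_name": "PixArtAlphaPipeline"}'

# Hypothetical, heavily simplified stand-ins for diffusers' internal task mappings
# (model family -> pipeline class name). Not the library's real tables.
TEXT2IMAGE = {
    "pixart-alpha": "PixArtAlphaPipeline",
    "stable-diffusion": "StableDiffusionPipeline",
}
IMG2IMG = {
    "stable-diffusion": "StableDiffusionImg2ImgPipeline",
}


def resolve_text2image_class(model_index_json):
    """Return the text-to-image pipeline class name for a checkpoint's model_index.json."""
    stored = json.loads(model_index_json)["_class_name"]
    # Find the model family the stored class belongs to, whatever task it was saved for
    for mapping in (TEXT2IMAGE, IMG2IMG):
        for family, class_name in mapping.items():
            if class_name == stored and family in TEXT2IMAGE:
                return TEXT2IMAGE[family]
    raise ValueError(f"no text-to-image pipeline registered for {stored}")


print(resolve_text2image_class(MODEL_INDEX_JSON))
```

In this toy version, a checkpoint saved as an img2img pipeline would still resolve to its family's text-to-image class, which is the idea behind the "auto" pipelines.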

import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "model/PixArt-XL-2-1024-MS",
    torch_dtype=torch.float16
).to("cuda")

prompt = "A small cactus with a happy face in the Sahara desert"

seed = 110000
for steps in range(20, 50, 10):
    generator = torch.manual_seed(seed)
    image = pipe(
        prompt,
        generator=generator,
        num_inference_steps=steps
    ).images[0]

    image.save(f"pixart_result_{steps}.png")

Note 2

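When generating images whose aspect ratio is not 1:1 (i.e. non-square images), use_resolution_binning is reported to help. It is enabled by default; the script below generates the same prompts with it turned off and on for comparison.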
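As I understand the diffusers documentation for `use_resolution_binning`, the requested height and width are first mapped to the closest supported resolution in a predefined aspect-ratio table (ASPECT_RATIO_1024_BIN for this 1024 model), the image is generated at that resolution, and the result is resized back to the requested size. The pure-Python sketch below illustrates only that arithmetic. The bucket values are a hypothetical subset, not the library's table, and the resize-to-cover-then-center-crop step is my assumption about how the final resize is done.

```python
# Hypothetical subset of aspect-ratio buckets, keyed by height / width.
# The real ASPECT_RATIO_1024_BIN table has many more entries and may differ.
BUCKETS = {
    0.68: (832, 1216),   # (height, width)
    0.78: (896, 1152),
    1.00: (1024, 1024),
    1.29: (1152, 896),
    1.46: (1216, 832),
}


def closest_bucket(height, width, buckets=BUCKETS):
    """Pick the bucket whose height/width ratio is nearest to the requested one."""
    ratio = height / width
    key = min(buckets, key=lambda r: abs(r - ratio))
    return buckets[key]


def resize_then_center_crop(gen_hw, target_hw):
    """Scale the generated (h, w) to cover the target, then center-crop it.

    Returns the resized (h, w) and the crop box as (left, top, right, bottom).
    """
    gh, gw = gen_hw
    th, tw = target_hw
    scale = max(th / gh, tw / gw)          # cover: both sides >= target
    rh, rw = round(gh * scale), round(gw * scale)
    top = (rh - th) // 2
    left = (rw - tw) // 2
    return (rh, rw), (left, top, left + tw, top + th)


# The script below requests height=1024, width=768 (ratio ~1.33)
bucket = closest_bucket(1024, 768)
print("generate at (h, w):", bucket)
print(resize_then_center_crop(bucket, (1024, 768)))
```

With these made-up buckets, a 1024x768 request would be generated at 1152x896 and then scaled and cropped back to 1024x768, so the model always works at a resolution close to the ones it was trained on.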

import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "model/PixArt-XL-2-1024-MS",
    torch_dtype=torch.float16
).to("cuda")

prompts = [
    "A small cactus with a happy face in the Sahara desert.",
    "Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.",
    "stars, water, brilliantly, gorgeous large scale scene, a little girl, in the style of dreamy realism, light gold and amber, blue and pink, brilliantly illuminated in the background."
]

seed = 100000

# Generate all three prompts in one batch (width 768 x height 1024) without resolution binning
generator = torch.manual_seed(seed)
no_resolution_binning = pipe(
    prompts,
    height=1024,
    width=768,
    generator=generator,
    use_resolution_binning=False
).images

# Same seed and prompts, this time with resolution binning (the default)
generator = torch.manual_seed(seed)
resolution_binning = pipe(
    prompts,
    height=1024,
    width=768,
    generator=generator,
    use_resolution_binning=True
).images

# Put each pair side by side: left = without binning, right = with binning
from diffusers.utils import make_image_grid
for i in range(len(prompts)):
    image = make_image_grid([no_resolution_binning[i], resolution_binning[i]], rows=1, cols=2)
    image.save(f"result{i}.png")



Left: without resolution_binning; right: with resolution_binning.
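The left/right order comes from make_image_grid, which pastes images row by row onto one canvas, so with rows=1, cols=2 the first list element lands on the left. The helper below, `grid_layout`, is a hypothetical pure-Python sketch of that placement arithmetic. It assumes all images share the size of the first one and that exactly rows * cols images are passed; sizes use PIL's (width, height) order.

```python
def grid_layout(n_images, rows, cols, size):
    """Return the canvas size and each image's top-left paste offset.

    size is the (width, height) shared by every image, in PIL order.
    """
    if n_images != rows * cols:
        raise ValueError("expected exactly rows * cols images")
    w, h = size
    canvas = (cols * w, rows * h)
    # Image i goes to column i % cols and row i // cols
    offsets = [(i % cols * w, i // cols * h) for i in range(n_images)]
    return canvas, offsets


# Two 768x1024 images in one row, as in the comparison above
print(grid_layout(2, rows=1, cols=2, size=(768, 1024)))
```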