[Diffusers] Combining Perturbed-Attention Guidance (PAG) with IP-Adapter

Introduction

For background on Perturbed-Attention Guidance (PAG), see this post:
touch-sp.hatenablog.com
This time I combined PAG with the multiple IP-Adapter setup described in this post:
touch-sp.hatenablog.com

Results

"Plus" combined with "Plus Face"


"Plus" combined with "FaceID"


"Plus", "Plus Face", and "FaceID" combined


From left to right:

  • Without PAG
  • pag_applied_layers=["mid"]
  • pag_applied_layers=["down.block_2"]
  • pag_applied_layers=["down.block_2", "up.block_1.attentions_0"]
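
The four columns above can be written down as a small run plan. This is only a sketch: `Run` and `build_runs` are hypothetical names, not part of Diffusers, and using `pag_scale=0.0` for the "Without PAG" column is my assumption (the post does not say how that image was produced; as far as I know, Diffusers treats a PAG scale of 0 as PAG disabled).

```python
from typing import List, NamedTuple, Optional


class Run(NamedTuple):
    """One column of the comparison (hypothetical helper type)."""
    label: str
    pag_applied_layers: Optional[List[str]]  # None means no PAG layers are set
    pag_scale: float


def build_runs(pag_scale: float = 5.0) -> List[Run]:
    """Return the four settings compared in this post, left to right."""
    layer_sets = [
        ["mid"],
        ["down.block_2"],
        ["down.block_2", "up.block_1.attentions_0"],
    ]
    # Assumption: a scale of 0.0 stands in for the PAG-free baseline.
    runs = [Run("without_pag", None, 0.0)]
    for layers in layer_sets:
        runs.append(Run("+".join(layers), layers, pag_scale))
    return runs


for run in build_runs():
    print(run.label, run.pag_applied_layers, run.pag_scale)
```

Each entry carries the layer list that would be passed to `set_pag_applied_layers` and the `pag_scale` that would be passed to the pipeline call.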

Python script

This is the script for combining all three adapters: "Plus", "Plus Face", and "FaceID".

import torch
from diffusers.utils import load_image
from diffusers import AutoPipelineForText2Image, DPMSolverMultistepScheduler

# Load the base model (an SDXL checkpoint) in fp16
pipeline = AutoPipelineForText2Image.from_pretrained(
    "modernDisneyXL_v3",
    torch_dtype=torch.float16,
    variant="fp16"
)
# Replace the scheduler with DPM-Solver++ (SDE variant) using Karras sigmas
pipeline.scheduler = DPMSolverMultistepScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    steps_offset=1,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True
)
pipeline.to("cuda")

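# Load the three IP-Adapters (Plus, Plus Face, FaceID) in one call.
# image_encoder_folder=None skips loading the image encoder, because
# precomputed image embeddings are passed to the pipeline below.
pipeline.load_ip_adapter(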
    ["IP-Adapter", "IP-Adapter", "IP-Adapter-FaceID"],
    subfolder=["sdxl_models", "sdxl_models", None],
    weight_name=[
        "ip-adapter-plus_sdxl_vit-h.safetensors",
        "ip-adapter-plus-face_sdxl_vit-h.safetensors",
        "ip-adapter-faceid_sdxl.bin"
    ],
    image_encoder_folder=None
)
pipeline.set_ip_adapter_scale([0.5, 0.5, 0.5])

# Load the precomputed image embeddings, one file per adapter
t1 = torch.load("xl_plus.ipadpt")
t2 = torch.load("xl_plusface.ipadpt")
t3 = torch.load("xl_faceid.ipadpt")

image_embeds = [t1[0], t2[0], t3[0]]

# Rebuild the pipeline as its PAG-enabled variant, reusing the loaded components
pipeline = AutoPipelineForText2Image.from_pipe(pipeline, enable_pag=True)

# Generate one image per PAG layer setting, with the same seed each time
for i, layer in enumerate([["mid"], ["down.block_2"], ["down.block_2", "up.block_1.attentions_0"]]):

    pipeline.set_pag_applied_layers(layer)

    image = pipeline(
        prompt="a woman",
        ip_adapter_image_embeds=image_embeds,
        negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
        num_inference_steps=50,
        num_images_per_prompt=1,
        guidance_scale=3.0,
        width=1024,
        height=1024,
        generator=torch.Generator(device="cpu").manual_seed(0),
        pag_scale=5.0
    ).images[0]

    image.save(f"with_pag_{i}.jpg")
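
The result images above are shown side by side. The post does not say how they were joined, so the following is only one possible way to do it with Pillow. `hconcat` is a hypothetical helper, and the file name for the PAG-free image in the commented usage is an assumption (the script above only saves `with_pag_0.jpg` to `with_pag_2.jpg`).

```python
from PIL import Image


def hconcat(images):
    """Paste images side by side, left to right, onto one white sheet."""
    if not images:
        raise ValueError("need at least one image")
    width = sum(im.width for im in images)
    height = max(im.height for im in images)
    sheet = Image.new("RGB", (width, height), "white")
    x = 0
    for im in images:
        sheet.paste(im, (x, 0))  # top-aligned; x advances by each image width
        x += im.width
    return sheet


# Usage (file names are assumptions, adjust to your own outputs):
# names = ["without_pag.jpg", "with_pag_0.jpg", "with_pag_1.jpg", "with_pag_2.jpg"]
# hconcat([Image.open(n) for n in names]).save("comparison.jpg")
```

Because every run uses the same seed and size, the four 1024x1024 outputs line up as a single 4096x1024 strip.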
