[Diffusers] Combining Perturbed-Attention Guidance (PAG) with ControlNet

Introduction

For an overview of Perturbed-Attention Guidance (PAG), see this post:
touch-sp.hatenablog.com
This time I combine PAG with "SDXL_Controlnet_Tile_Realistic", a ControlNet that restores blurry photos.

Input photo

Results

Top left: no PAG
Top right: pag_applied_layers=["mid"]
Bottom left: pag_applied_layers=["down.block_2"]
Bottom right: pag_applied_layers=["down.block_2", "up.block_1.attentions_0"]
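The identifiers above name parts of the SDXL UNet. As far as I can tell from the diffusers release used here, an identifier has up to three dot-separated parts (block type, block index, attention index), and PAG is applied to the self-attention (`attn1`) layers inside the matching UNet submodules. The helper below is a hypothetical re-implementation of that matching rule, not diffusers' own code: the function name `pag_layer_matches` and the assumed module-name layout are illustrations, and the identifier format may differ in other diffusers versions.

```python
def pag_layer_matches(layer_id: str, module_name: str) -> bool:
    """Return True if a PAG identifier such as "mid", "down.block_2" or
    "up.block_1.attentions_0" selects the given UNet attention module.

    Hypothetical sketch: assumes module names shaped like
    "down_blocks.2.attentions.0.transformer_blocks.0.attn1" or
    "mid_block.attentions.0.transformer_blocks.0.attn1".
    """
    parts = module_name.split(".")
    # PAG perturbs self-attention only, so cross-attention ("attn2") never matches
    if parts[-1] != "attn1":
        return False

    # "down_blocks" -> "down", "mid_block" -> "mid", "up_blocks" -> "up"
    block_type = parts[0].split("_")[0]
    if block_type == "mid":
        # mid_block has no block index: mid_block.attentions.{j}...
        block_index = None
        attn_index = f"attentions_{parts[2]}"
    else:
        block_index = f"block_{parts[1]}"
        attn_index = f"attentions_{parts[3]}"

    wanted = layer_id.split(".")
    if wanted[0] != block_type:
        return False
    # Each extra part of the identifier narrows the selection further
    if len(wanted) >= 2 and wanted[1] != block_index:
        return False
    if len(wanted) >= 3 and wanted[2] != attn_index:
        return False
    return True
```

Under this reading, "down.block_2" selects every self-attention layer in the third down block, while "up.block_1.attentions_0" selects only those in the first attention module of the second up block.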

Here is the original image side by side with the image generated with pag_applied_layers=["mid"].
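The script below saves each image separately, so the comparison figures have to be assembled afterwards. One way to do that is to tile the images as NumPy arrays. This is a sketch that assumes all images share the same height, width and channel count (for example 1024x1024 RGB); the helper name `tile_images` is hypothetical.

```python
import numpy as np


def tile_images(images, rows, cols):
    """Arrange equally sized H x W x C arrays into a rows x cols grid,
    filled left to right, top to bottom."""
    if len(images) != rows * cols:
        raise ValueError(f"expected {rows * cols} images, got {len(images)}")
    shape = images[0].shape
    if any(img.shape != shape for img in images):
        raise ValueError("all images must have the same shape")
    # Join the images of each row horizontally (axis=1 is the width axis) ...
    row_strips = [
        np.concatenate(images[r * cols:(r + 1) * cols], axis=1) for r in range(rows)
    ]
    # ... then stack the rows vertically (axis=0 is the height axis)
    return np.concatenate(row_strips, axis=0)
```

With Pillow, the script's outputs can be loaded with `np.asarray(Image.open(path).convert("RGB"))` in the order no_pag.png, with_pag_0.jpg, with_pag_1.jpg, with_pag_2.jpg; `tile_images(images, 2, 2)` then gives the top-left/top-right/bottom-left/bottom-right layout above, and `Image.fromarray(...)` converts the result back for saving. `tile_images([original, mid_result], 1, 2)` gives the side-by-side pair. For lists of PIL images, `diffusers.utils.make_image_grid(images, rows, cols)` does the same job.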

Python script

```python
import torch

from diffusers import ControlNetModel, DPMSolverMultistepScheduler, AutoPipelineForText2Image
from diffusers.utils import load_image

# model was downloaded from https://huggingface.co/OzzyGT/SDXL_Controlnet_Tile_Realistic
controlnet = ControlNetModel.from_pretrained(
    "controlnet/SDXL_Controlnet_Tile_Realistic",
    torch_dtype=torch.float16,
    variant="fp16"
)

# local SDXL checkpoint; passing controlnet= makes AutoPipeline return
# the ControlNet variant of the SDXL text-to-image pipeline
pipeline = AutoPipelineForText2Image.from_pretrained(
    "fudukiMix_v20",
    torch_dtype=torch.float16,
    variant="fp16",
    controlnet=controlnet,
).to("cuda")

# keep the existing scheduler config, but use DPMSolverMultistepScheduler with Karras sigmas
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config,
    use_karras_sigmas=True
)

# the blurry photo is the conditioning image for the Tile ControlNet
control_image = load_image("face1024.jpg")

prompt = "high quality image of a woman"
negative_prompt = "blurry, low quality"

# no PAG: classifier-free guidance only
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=7.5,
    num_inference_steps=25,
    image=control_image,
    controlnet_conditioning_scale=1.0,
    generator=generator
).images[0]

image.save("no_pag.png")

# with PAG: from_pipe builds the PAG version of the same pipeline
# from the already loaded components (nothing is reloaded)
pipeline = AutoPipelineForText2Image.from_pipe(pipeline, enable_pag=True)

# one image per layer setting, all with the same seed
for i, layer in enumerate([["mid"], ["down.block_2"], ["down.block_2", "up.block_1.attentions_0"]]):
    pipeline.set_pag_applied_layers(layer)
    generator = torch.Generator(device="cpu").manual_seed(0)
    image = pipeline(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=control_image,
        controlnet_conditioning_scale=1.0,
        num_inference_steps=25,
        num_images_per_prompt=1,
        guidance_scale=3.0,
        width=1024,
        height=1024,
        generator=generator,
        pag_scale=5.0
    ).images[0]

    image.save(f"with_pag_{i}.jpg")
```
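The PAG runs set guidance_scale=3.0 together with pag_scale=5.0. As I understand it, diffusers' PAG pipelines combine three noise predictions at each step when classifier-free guidance is also on: unconditional, text-conditioned, and text-conditioned with the selected self-attention layers perturbed. The sketch below shows only that combination step, with NumPy arrays standing in for the UNet outputs; the function name `combine_guidance` is hypothetical, and the ControlNet residuals (which feed into every UNet call) are left out.

```python
import numpy as np


def combine_guidance(eps_uncond, eps_text, eps_perturbed, guidance_scale, pag_scale):
    """Sketch of the CFG + PAG combination:
    uncond + guidance_scale * (text - uncond) + pag_scale * (text - perturbed)."""
    # Classifier-free guidance: move toward the text-conditioned prediction
    cfg_term = guidance_scale * (eps_text - eps_uncond)
    # PAG: move away from the prediction made with perturbed self-attention
    pag_term = pag_scale * (eps_text - eps_perturbed)
    return eps_uncond + cfg_term + pag_term
```

With pag_scale=0 the expression reduces to ordinary classifier-free guidance, as in the no-PAG run. Because three predictions are needed instead of two, each PAG step evaluates the UNet on a larger batch.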
