Introduction
For background on Perturbed-Attention Guidance (PAG), see this post: touch-sp.hatenablog.com
This time I combined PAG with the multiple IP-Adapter setup described in this post:
touch-sp.hatenablog.com
Results
Combination of "Plus" and "Plus Face"
Combination of "Plus" and "FaceID"
Combination of "Plus", "Plus Face", and "FaceID"
In each row, from left to right:
- No PAG
- pag_applied_layers=["mid"]
- pag_applied_layers=["down.block_2"]
- pag_applied_layers=["down.block_2", "up.block_1.attentions_0"]
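A side-by-side strip like the ones described above can be assembled with Pillow. This is only a minimal sketch, not necessarily how the comparison images in this post were made; the helper name make_strip is made up for illustration, and the images are assumed to be passed in left-to-right column order.

```python
from PIL import Image


def make_strip(images):
    """Paste images side by side, left to right, onto one white canvas."""
    width = sum(im.width for im in images)
    height = max(im.height for im in images)
    strip = Image.new("RGB", (width, height), "white")
    x = 0
    for im in images:
        # Each image is placed at the current horizontal offset, top-aligned.
        strip.paste(im, (x, 0))
        x += im.width
    return strip
```

For example, `make_strip([Image.open(p) for p in paths]).save("comparison.jpg")`, where `paths` lists the no-PAG image first, followed by the three PAG outputs. The output file name here is hypothetical.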
Python script
This is the script for combining all three adapters: "Plus", "Plus Face", and "FaceID".

    import torch
    from diffusers import AutoPipelineForText2Image, DPMSolverMultistepScheduler

    # Load the SDXL base model from a local folder.
    pipeline = AutoPipelineForText2Image.from_pretrained(
        "modernDisneyXL_v3",
        torch_dtype=torch.float16,
        variant="fp16"
    )

    pipeline.scheduler = DPMSolverMultistepScheduler(
        num_train_timesteps=1000,
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule="scaled_linear",
        steps_offset=1,
        algorithm_type="sde-dpmsolver++",
        use_karras_sigmas=True
    )
    pipeline.to("cuda")

    # Load three IP-Adapters. No image encoder is loaded because
    # precomputed image embeddings are used below.
    pipeline.load_ip_adapter(
        ["IP-Adapter", "IP-Adapter", "IP-Adapter-FaceID"],
        subfolder=["sdxl_models", "sdxl_models", None],
        weight_name=[
            "ip-adapter-plus_sdxl_vit-h.safetensors",
            "ip-adapter-plus-face_sdxl_vit-h.safetensors",
            "ip-adapter-faceid_sdxl.bin"
        ],
        image_encoder_folder=None
    )
    pipeline.set_ip_adapter_scale([0.5, 0.5, 0.5])

    # Precomputed embeddings, one file per adapter.
    t1 = torch.load("xl_plus.ipadpt")
    t2 = torch.load("xl_plusface.ipadpt")
    t3 = torch.load("xl_faceid.ipadpt")
    image_embeds = [t1[0], t2[0], t3[0]]

    # Rebuild the pipeline with PAG enabled, reusing the loaded components.
    pipeline = AutoPipelineForText2Image.from_pipe(pipeline, enable_pag=True)

    pag_layer_sets = [
        ["mid"],
        ["down.block_2"],
        ["down.block_2", "up.block_1.attentions_0"]
    ]
    for i, layer in enumerate(pag_layer_sets):
        pipeline.set_pag_applied_layers(layer)
        image = pipeline(
            prompt="a woman",
            ip_adapter_image_embeds=image_embeds,
            negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
            num_inference_steps=50,
            num_images_per_prompt=1,
            guidance_scale=3.0,
            width=1024,
            height=1024,
            # A fresh generator per run keeps the seed identical across layer settings.
            generator=torch.Generator(device="cpu").manual_seed(0),
            pag_scale=5.0
        ).images[0]
        image.save(f"with_pag_{i}.jpg")
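The script only produces the three PAG variants; the "no PAG" image in the leftmost column is not generated by it. The sketch below shows one way all four columns could be produced in a single pass. It is an illustration under assumptions, not the method actually used for the results above: run_comparison and make_kwargs are hypothetical names, base_pipe is assumed to be the pipeline object as it was before from_pipe(..., enable_pag=True), and pag_pipe the PAG-enabled one. make_kwargs should return a fresh dict of call arguments (including a newly seeded generator) each time it is called so that every run starts from the same noise.

```python
PAG_LAYER_SETS = [
    ["mid"],
    ["down.block_2"],
    ["down.block_2", "up.block_1.attentions_0"],
]


def run_comparison(base_pipe, pag_pipe, make_kwargs, pag_scale=5.0):
    """Return [(label, image), ...] in left-to-right column order."""
    results = []
    # Column 1: baseline without PAG (assumed to be the pre-PAG pipeline).
    results.append(("no_pag", base_pipe(**make_kwargs()).images[0]))
    # Columns 2-4: one run per set of PAG-applied layers.
    for layers in PAG_LAYER_SETS:
        pag_pipe.set_pag_applied_layers(layers)
        image = pag_pipe(**make_kwargs(), pag_scale=pag_scale).images[0]
        results.append(("+".join(layers), image))
    return results
```

To use it with the script above, keep a reference to the original pipeline before calling from_pipe, and have make_kwargs return the same prompt, embeddings, step count, and seed settings that the script passes to the pipeline, minus pag_scale.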