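Introduction
I only just introduced IP-Adapter-FaceID for SDXL the other day. touch-sp.hatenablog.com
Even so, a new model, IP-Adapter-FaceID-PlusV2, has already been released.
The pace of development is astonishing.
I tried it right away and compared it with the earlier model.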
Goal
The goal is to generate other photos of a person from a single face photo.
Results (without LoRA)
Input image
I prepared only one image. It was generated with "fudukiMix_v2.0".
How it was made is described here.
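Images generated this time
At this point I think it is fair to call this the same person.
Images generated with the original IP-Adapter-FaceID
With PlusV2, fidelity to the source image is clearly higher.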
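The comparison above is purely visual. If you want a rough number to go with it, one option is to compare face embeddings of the source image and a generated image. The following is only a minimal sketch: `cosine_similarity` is a hypothetical helper of my own, and it assumes you have already extracted the two embeddings the same way the script below does (`faces[0].normed_embedding`). I did not measure this for the images in this post.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors standing in for real face embeddings (assumption:
    # in practice these would come from insightface for each image).
    source_embedding = [0.6, 0.8, 0.0]
    generated_embedding = [0.8, 0.6, 0.0]
    print(cosine_similarity(source_embedding, generated_embedding))
```

A higher value means the two faces are closer in embedding space. Computing it for both the PlusV2 output and the original FaceID output would let you compare the two models on the same source image.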
Python script
import os

import cv2
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from insightface.app import FaceAnalysis
from insightface.utils import face_align
from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlusXL

image_size = 640  # image_size % 112 or image_size % 128 must be 0

# Detect the face, then extract the identity embedding and an aligned face crop
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("face.png")
faces = app.get(image)

faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=image_size)

# Local paths to the base model, the image encoder, and the IP-Adapter checkpoint
base_model_path = "model/fudukiMix_v20"
image_encoder_path = "CLIP-ViT-H-14-laion2B-s32B-b79K"
ip_ckpt = "IP-Adapter-FaceID/ip-adapter-faceid-plusv2_sdxl.bin"
device = "cuda"

noise_scheduler = DPMSolverMultistepScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    steps_offset=1,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True
)

pipe = StableDiffusionXLPipeline.from_pretrained(
    base_model_path,
    scheduler=noise_scheduler,
    torch_dtype=torch.float16,
    variant="fp16"
)

ip_model = IPAdapterFaceIDPlusXL(pipe, image_encoder_path, ip_ckpt, device)

prompt = "japanese woman, close-up, natural lighting, wavy hair, from side, white sweater, dyanmic posing, see-through curtain, bright room"
negative_prompt = "cleavage, illustration, 3d, 2d, painting, cartoons, sketch, watercolor, monotone, kimono, crossed eyes, strabismus"

save_folder = f"plusv2_{image_size}"
os.makedirs(save_folder, exist_ok=True)

# Two seeds, two samples per seed: four images in total
for i in range(2):
    seed = 20240111 + 2024 * i
    images = ip_model.generate(
        prompt=prompt,
        negative_prompt=negative_prompt,
        face_image=face_image,
        faceid_embeds=faceid_embeds,
        shortcut=True,
        s_scale=1.0,
        num_samples=2,
        width=1024,
        height=1024,
        num_inference_steps=40,
        guidance_scale=7.5,
        seed=seed
    )
    for j in range(2):
        images[j].save(os.path.join(save_folder, f"{seed}_{j}.png"))
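Loading the models takes a while, so it can be handy to sanity-check the settings first. The sketch below is not part of the original script: `check_image_size` and `planned_outputs` are hypothetical helpers that only mirror the `image_size` constraint from the comment and the seed and file-name scheme used in the loop.

```python
import os
from typing import List


def check_image_size(image_size: int) -> bool:
    """True when image_size satisfies the constraint noted in the script comment."""
    return image_size % 112 == 0 or image_size % 128 == 0


def planned_outputs(
    image_size: int,
    base_seed: int = 20240111,
    seed_step: int = 2024,
    runs: int = 2,
    num_samples: int = 2,
) -> List[str]:
    """List the file paths the generation loop would write, without generating anything."""
    folder = f"plusv2_{image_size}"
    paths = []
    for i in range(runs):
        seed = base_seed + seed_step * i
        for j in range(num_samples):
            paths.append(os.path.join(folder, f"{seed}_{j}.png"))
    return paths


if __name__ == "__main__":
    size = 640
    if not check_image_size(size):
        raise SystemExit(f"image_size={size} is not a multiple of 112 or 128")
    for path in planned_outputs(size):
        print(path)
```

With the values used in this post, the two seeds are 20240111 and 20242135, so four files are written in total.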
Does combining it with LoRA improve the results?
From left to right: "no LoRA" → "weight 0.2" → "weight 0.4" → "weight 0.6".
Python script
import os

import cv2
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
from insightface.app import FaceAnalysis
from insightface.utils import face_align
from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlusXL

image_size = 640  # image_size % 112 or image_size % 128 must be 0

# Detect the face, then extract the identity embedding and an aligned face crop
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("face.png")
faces = app.get(image)

faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=image_size)

# Local paths to the base model, the image encoder, and the IP-Adapter checkpoint
base_model_path = "model/fudukiMix_v20"
image_encoder_path = "CLIP-ViT-H-14-laion2B-s32B-b79K"
ip_ckpt = "IP-Adapter-FaceID/ip-adapter-faceid-plusv2_sdxl.bin"
device = "cuda"

noise_scheduler = DPMSolverMultistepScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    steps_offset=1,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True
)

pipe = StableDiffusionXLPipeline.from_pretrained(
    base_model_path,
    scheduler=noise_scheduler,
    torch_dtype=torch.float16,
    variant="fp16"
)

# Load the FaceID LoRA and set its weight (0.2, 0.4 and 0.6 were compared above).
# A forward slash is used in the path to avoid backslash escape issues.
pipe.load_lora_weights("IP-Adapter-FaceID/ip-adapter-faceid-plusv2_sdxl_lora.safetensors", adapter_name="lora")
pipe.set_adapters(["lora"], adapter_weights=[0.6])

ip_model = IPAdapterFaceIDPlusXL(pipe, image_encoder_path, ip_ckpt, device)

prompt = "japanese woman, close-up, natural lighting, wavy hair, from side, white sweater, dyanmic posing, see-through curtain, bright room"
negative_prompt = "cleavage, illustration, 3d, 2d, painting, cartoons, sketch, watercolor, monotone, kimono, crossed eyes, strabismus"

save_folder = f"plusv2_{image_size}"
os.makedirs(save_folder, exist_ok=True)

# Two seeds, two samples per seed: four images in total
for i in range(2):
    seed = 20240111 + 2024 * i
    images = ip_model.generate(
        prompt=prompt,
        negative_prompt=negative_prompt,
        face_image=face_image,
        faceid_embeds=faceid_embeds,
        shortcut=True,
        s_scale=1.0,
        num_samples=2,
        width=1024,
        height=1024,
        num_inference_steps=40,
        guidance_scale=7.5,
        seed=seed
    )
    for j in range(2):
        images[j].save(os.path.join(save_folder, f"{seed}_{j}.png"))
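One thing to watch out for: both scripts in this post save into `plusv2_{image_size}` with the same seed-based file names, so running them one after another (or rerunning with a different LoRA weight) overwrites the earlier images. As a hedged sketch, something like the hypothetical `lora_run_plan` helper below could give each setting its own folder. The folder suffixes are my own invention, not something the scripts above use.

```python
from typing import List, Optional, Sequence, Tuple


def lora_run_plan(
    image_size: int,
    weights: Sequence[Optional[float]] = (None, 0.2, 0.4, 0.6),
) -> List[Tuple[Optional[float], str]]:
    """Pair each LoRA weight (None = no LoRA) with a distinct output folder name."""
    plan = []
    for weight in weights:
        suffix = "nolora" if weight is None else f"lora{weight:.1f}"
        plan.append((weight, f"plusv2_{image_size}_{suffix}"))
    return plan


if __name__ == "__main__":
    for weight, folder in lora_run_plan(640):
        # For each entry, rerun the script with adapter_weights=[weight]
        # (or without the LoRA lines when weight is None) and
        # save_folder set to this folder.
        print(weight, folder)
```

Keeping the seeds fixed across the folders makes it easy to line up the same seed side by side, as in the comparison image above.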
PC environment
Windows 11
CUDA 11.8
Python 3.11
Setting up the Python environment
pip install torch==2.0.1+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install git+https://github.com/huggingface/diffusers
pip install accelerate transformers einops
pip install git+https://github.com/tencent-ailab/IP-Adapter.git
pip install onnxruntime-gpu insightface
When using LoRA
pip install peft
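After installing, a quick check that everything is importable can save a failed run later. This is only a small sketch using the standard library: `missing_modules` is a hypothetical helper, and the module list is my reading of the import names behind the packages installed above.

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages installed above (peft is only needed for LoRA).
REQUIRED = [
    "torch", "diffusers", "accelerate", "transformers", "einops",
    "ip_adapter", "onnxruntime", "insightface", "cv2",
]
OPTIONAL_LORA = ["peft"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED + OPTIONAL_LORA)
    print("missing:", ", ".join(missing) if missing else "none")
```

This only confirms that the modules can be located. It does not verify that the CUDA build of PyTorch or the GPU provider of onnxruntime is actually usable.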