Introduction
Recently, ControlNet became usable with AnimateDiff in Diffusers, so I tried it out right away.
Environment
Windows 11
CUDA 11.8
Python 3.11
Python environment setup
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install git+https://github.com/huggingface/diffusers
pip install accelerate transformers
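If you want to confirm that this environment picked up CUDA correctly before running the script, a quick check like the following should print True (a minimal sketch; it only verifies the PyTorch install above):

import torch

print(torch.__version__)          # expected: 2.0.1+cu118
print(torch.cuda.is_available())  # should be True on a working CUDA 11.8 setup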
Python script
I am using OpenPose. The images used for ControlNet conditioning are downloaded automatically.

import torch
from diffusers import DiffusionPipeline, AutoencoderKL, ControlNetModel, MotionAdapter, DPMSolverMultistepScheduler
from PIL import Image
from torchvision.datasets.utils import download_url

# download gif file
fname = "sample.gif"
url = "https://user-images.githubusercontent.com/7365912/265043418-23291941-864d-495a-8ba8-d02e05756396.gif"
download_url(url, root='.', filename=fname)

# load the motion adapter, OpenPose ControlNet, and VAE (local paths)
adapter = MotionAdapter.from_pretrained("animatediff-motion-adapter-v1-5-2")
controlnet = ControlNetModel.from_pretrained(
    "controlnet/control_v11p_sd15_openpose",
    torch_dtype=torch.float16
)
vae = AutoencoderKL.from_single_file(
    "vae/vae-ft-mse-840000-ema-pruned.safetensors",
    torch_dtype=torch.float16
)

# build the AnimateDiff + ControlNet community pipeline
model_id = "model/yabalMixTrue25Dv5_ema"
pipe = DiffusionPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    controlnet=controlnet,
    vae=vae,
    custom_pipeline="pipeline_animatediff_controlnet",
    torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    beta_schedule="linear",
    clip_sample=False,
    timestep_spacing="linspace",
    steps_offset=1
)
pipe.enable_vae_slicing()

# split the GIF into per-frame conditioning images
conditioning_frames = []
gif_images = Image.open(fname)
for i in range(gif_images.n_frames):
    gif_images.seek(i)
    # <class 'PIL.GifImagePlugin.GifImageFile'> -> <class 'PIL.Image.Image'>
    image = gif_images.copy()
    conditioning_frames.append(image.crop((0, 0, 512, 512)))

prompt = "a girl, dancing, blue denim, white plain t-shirt, best quality, extremely detailed"
negative_prompt = "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"

# run generation and save the result as a GIF
seed = 222
result = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=len(conditioning_frames),  # default: 16
    width=512,
    height=512,
    conditioning_frames=conditioning_frames,
    num_inference_steps=25,
    generator=torch.manual_seed(seed)
).frames[0]

from diffusers.utils import export_to_gif
export_to_gif(result, f"result_{seed}.gif")
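The sample GIF is passed to the OpenPose ControlNet as-is, so it already contains ready-made pose images and no preprocessing is needed. If you want to condition on your own video instead, you have to extract the pose frames yourself first. Below is a minimal sketch, assuming the controlnet_aux package (pip install controlnet_aux) and a hypothetical input file my_video.gif, neither of which appears in the original script:

from PIL import Image
from controlnet_aux import OpenposeDetector

# load the OpenPose annotator (weights are downloaded on first use)
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

# convert every frame of your own GIF into an OpenPose skeleton image
src = Image.open("my_video.gif")  # hypothetical input file
conditioning_frames = []
for i in range(src.n_frames):
    src.seek(i)
    frame = src.copy().convert("RGB").resize((512, 512))
    conditioning_frames.append(openpose(frame))

The resulting conditioning_frames list can then be passed to the pipeline exactly as in the script above.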
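If the pipeline does not fit in VRAM even with VAE slicing, Diffusers also provides model CPU offloading via accelerate (installed above). This is not something the original post uses; it comes from the DiffusionPipeline base class and trades slower inference for a lower peak VRAM footprint:

# instead of .to("cuda"), move submodules to the GPU only while they are needed
pipe.enable_model_cpu_offload()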
Results
The resulting video is posted on my Google Blogger page.
support-touchsp.blogspot.com