Introduction
MMagic's ControlNet Animation uses "sd-controlnet-hed" by default. I modified part of the script so that DW Openpose can be used instead.
Results
The resulting videos are posted on Google Blogger.
support-touchsp.blogspot.com
Environment
Windows 11
CUDA 11.7
Python 3.10
Setting up the Python environment
    pip install torch==2.0.1+cu117 torchvision==0.15.2+cu117 --index-url https://download.pytorch.org/whl/cu117
    pip install accelerate==0.23.0
    pip install mmcv==2.0.1 -f https://download.openmmlab.com/mmcv/dist/cu117/torch2.0.0/index.html
    pip install openmim==0.3.9
    pip install mmdet==3.1.0
    pip install -U setuptools wheel
    pip install git+https://github.com/open-mmlab/mmpose@dev-1.x
    git clone https://github.com/open-mmlab/mmagic
    cd mmagic
    pip install -e .
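After running the install commands, it can help to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch using only the standard library. The helper name `find_mismatches` is made up for illustration, and it assumes the version string reported for torch and torchvision includes the `+cu117` suffix.

```python
from importlib import metadata

# Versions pinned by the pip commands in this article.
PINNED = {
    "torch": "2.0.1+cu117",
    "torchvision": "0.15.2+cu117",
    "accelerate": "0.23.0",
    "mmcv": "2.0.1",
    "openmim": "0.3.9",
    "mmdet": "3.1.0",
}


def find_mismatches(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    Hypothetical helper for illustration; not part of MMagic.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `find_mismatches(PINNED)` in the environment you just built should return an empty dict if everything matches.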
Rewriting the config file
You need to add "_scope_='mmagic'" to the model definition (this is the pitfall I got stuck on). My impression is that specifying DDIMScheduler as the scheduler gives better results.
    # config for model
    stable_diffusion_v15_url = 'pipeline/toonyou_beta6_ema'
    controlnet_hed_url = 'controlnet/control_v11p_sd15_openpose'
    #control_detector = 'lllyasviel/ControlNet'
    control_scheduler = 'DDIMScheduler'

    # method type : 'multi-frame rendering' or 'attention_injection'
    inference_method = 'attention_injection'

    model = dict(
        type='ControlStableDiffusionImg2Img',
        _scope_='mmagic',
        vae=dict(
            type='AutoencoderKL',
            from_pretrained=stable_diffusion_v15_url,
            subfolder='vae'),
        unet=dict(
            type='UNet2DConditionModel',
            subfolder='unet',
            from_pretrained=stable_diffusion_v15_url),
        text_encoder=dict(
            type='ClipWrapper',
            clip_type='huggingface',
            pretrained_model_name_or_path=stable_diffusion_v15_url,
            subfolder='text_encoder'),
        tokenizer=stable_diffusion_v15_url,
        controlnet=dict(
            type='ControlNetModel',
            from_pretrained=controlnet_hed_url),
        scheduler=dict(
            type='DDPMScheduler',
            from_pretrained=stable_diffusion_v15_url,
            subfolder='scheduler'),
        test_scheduler=dict(
            type='DDIMScheduler',
            from_pretrained=stable_diffusion_v15_url,
            subfolder='scheduler'),
        data_preprocessor=dict(type='DataPreprocessor'),
        init_cfg=dict(type='init_from_unet'),
        enable_xformers=False,
    )
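Since the missing "_scope_" entry is easy to overlook, a quick pre-flight check on the config may save a failed run. The sketch below is not how MMagic loads configs (it uses its own config loader); it simply executes the file as plain Python with `runpy`, which assumes the config has no `_base_` inheritance. The function name `check_animation_config` is hypothetical.

```python
import runpy


def check_animation_config(path):
    """Return a list of problems found in a config like the one above.

    Hypothetical helper: runs the config as plain Python, so it only
    works for self-contained config files.
    """
    cfg = runpy.run_path(path)
    model = cfg.get("model")
    if not isinstance(model, dict):
        return ["no top-level 'model' dict"]

    problems = []
    # The pitfall described in this article.
    if model.get("_scope_") != "mmagic":
        problems.append("model is missing _scope_='mmagic'")
    # The two method names listed in the config comment.
    allowed = ("multi-frame rendering", "attention_injection")
    if cfg.get("inference_method") not in allowed:
        problems.append("inference_method is not one of %s" % (allowed,))
    return problems
```

For example, `check_animation_config("my_config.py")` should return an empty list for the config shown in this section.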
Changes to controlnet_animation_inferencer.py
Edit "mmagic/mmagic/apis/inferencers/controlnet_animation_inferencer.py". There are two changes.
    # Change 1: import DWposeDetector instead of HEDdetector
    #from controlnet_aux import HEDdetector
    from controlnet_aux import DWposeDetector

    # Change 2: build the DW Openpose detector in place of the HED detector
    #self.hed = HEDdetector.from_pretrained(cfg.control_detector)
    self.hed = DWposeDetector(
        det_config="mmopenlab/yolox_l_8xb8-300e_coco.py",
        det_ckpt="mmopenlab/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth",
        pose_config="mmopenlab/dwpose-l_384x288.py",
        pose_ckpt="mmopenlab/dw-ll_ucoco_384.pth",
        device="cuda"
    )
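The four paths passed to DWposeDetector are relative, so they are resolved against the directory the script is run from. A small check like the following can confirm the files are in place before starting a long run. This is only a sketch; `missing_files` is a made-up helper name, and the file names are copied from the snippet above.

```python
from pathlib import Path

# File names taken from the DWposeDetector arguments above.
REQUIRED_FILES = [
    "yolox_l_8xb8-300e_coco.py",
    "yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth",
    "dwpose-l_384x288.py",
    "dw-ll_ucoco_384.pth",
]


def missing_files(folder="mmopenlab", names=REQUIRED_FILES):
    """Return the names from `names` that do not exist as files in `folder`."""
    root = Path(folder)
    return [name for name in names if not (root / name).is_file()]
```

Running `missing_files()` from the directory that contains "mmopenlab" should return an empty list once all four files are present.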
Downloading the required files
Download the following four files and save them in a folder named "mmopenlab".
https://github.com/patrickvonplaten/controlnet_aux/blob/master/src/controlnet_aux/dwpose/dwpose_config/dwpose-l_384x288.py
https://github.com/patrickvonplaten/controlnet_aux/blob/master/src/controlnet_aux/dwpose/yolox_config/yolox_l_8xb8-300e_coco.py
https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth
https://huggingface.co/wanghaofan/dw-ll_ucoco_384/resolve/main/dw-ll_ucoco_384.pth
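The downloads can also be scripted. One thing to watch for: the two github.com links are "/blob/" page URLs, which return an HTML page rather than the file itself if fetched directly, so the sketch below rewrites them to their raw.githubusercontent.com form first. The helper names `to_raw_url` and `download_all` are made up for illustration, and the script assumes the URLs listed in this article are still reachable.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit

# The four URLs listed in this article.
URLS = [
    "https://github.com/patrickvonplaten/controlnet_aux/blob/master/src/controlnet_aux/dwpose/dwpose_config/dwpose-l_384x288.py",
    "https://github.com/patrickvonplaten/controlnet_aux/blob/master/src/controlnet_aux/dwpose/yolox_config/yolox_l_8xb8-300e_coco.py",
    "https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth",
    "https://huggingface.co/wanghaofan/dw-ll_ucoco_384/resolve/main/dw-ll_ucoco_384.pth",
]


def to_raw_url(url):
    """Rewrite a github.com '/blob/' page URL to its raw file URL."""
    parts = urlsplit(url)
    if parts.netloc == "github.com" and "/blob/" in parts.path:
        owner_repo, rest = parts.path.split("/blob/", 1)
        return f"https://raw.githubusercontent.com{owner_repo}/{rest}"
    return url  # other hosts already serve the file directly


def download_all(urls=URLS, folder="mmopenlab"):
    """Download each URL into `folder`, skipping files that already exist."""
    target = Path(folder)
    target.mkdir(exist_ok=True)
    for url in urls:
        raw = to_raw_url(url)
        dest = target / Path(urlsplit(raw).path).name
        if not dest.exists():
            urllib.request.urlretrieve(raw, dest)
```

Calling `download_all()` from the directory where you run the inference script places the files under "mmopenlab", matching the relative paths used for DWposeDetector.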
Run script
    from mmagic.apis import MMagicInferencer

    # Build the ControlNet Animation inferencer from the modified config
    editor = MMagicInferencer(
        model_name="controlnet_animation",
        model_config="my_config.py"
    )

    prompt = "masterpiece, 1girl, dancing, best quality, extremely detailed"
    negative_prompt = (
        "longbody, lowres, bad anatomy, bad hands, missing fingers, "
        "extra digit, fewer digits, cropped, worst quality, low quality"
    )

    # Convert the input video and save the result
    editor.infer(
        video="fps30.mp4",
        prompt=prompt,
        negative_prompt=negative_prompt,
        save_path="output2.mp4",
        strength=0.5,
        num_inference_steps=35,
        image_width=768,
        image_height=768,
    )