[Video2Video] Enabling DW Openpose in MMagic's Controlnet Animation

Introduction

By default, MMagic's Controlnet Animation uses "sd-controlnet-hed".

I modified part of the script so that DW Openpose can be used instead.

Results

The resulting video is posted on Google Blogger.
support-touchsp.blogspot.com

Environment

Windows 11
CUDA 11.7
Python 3.10

Setting up the Python environment

pip install torch==2.0.1+cu117 torchvision==0.15.2+cu117 --index-url https://download.pytorch.org/whl/cu117
pip install accelerate==0.23.0
pip install mmcv==2.0.1 -f https://download.openmmlab.com/mmcv/dist/cu117/torch2.0.0/index.html
pip install openmim==0.3.9
pip install mmdet==3.1.0

pip install -U setuptools wheel
pip install git+https://github.com/open-mmlab/mmpose@dev-1.x

git clone https://github.com/open-mmlab/mmagic
cd mmagic
pip install -e .
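
To confirm that the pinned packages were actually installed, a minimal sketch like the one below may help. It is not part of the original setup: the helper name `find_mismatches` is made up for illustration, and it assumes the distribution names are the same as the names used in the pip commands above. A local suffix such as `+cu117` is ignored when comparing.

```python
from importlib import metadata

# Versions pinned by the pip commands above (distribution name -> version).
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "accelerate": "0.23.0",
    "mmcv": "2.0.1",
    "openmim": "0.3.9",
    "mmdet": "3.1.0",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {name: installed version or None} for packages that do not match."""
    problems = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        # Drop a local suffix like "+cu117" before comparing.
        if installed is None or installed.split("+")[0] != wanted:
            problems[name] = installed
    return problems
```

Calling `find_mismatches(EXPECTED)` inside the target environment returns an empty dict when every pinned version matches.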

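Rewriting the config file

You need to add "_scope_='mmagic'" to the model definition. (This is an easy place to get stuck.)

In my experience, specifying DDIMScheduler as the scheduler tends to give better results.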

# config for model
stable_diffusion_v15_url = 'pipeline/toonyou_beta6_ema'
controlnet_hed_url = 'controlnet/control_v11p_sd15_openpose'
#control_detector = 'lllyasviel/ControlNet'
control_scheduler = 'DDIMScheduler'

# method type : 'multi-frame rendering' or 'attention_injection'
inference_method = 'attention_injection'

model = dict(
    type='ControlStableDiffusionImg2Img',
    _scope_='mmagic',
    vae=dict(
        type='AutoencoderKL',
        from_pretrained=stable_diffusion_v15_url,
        subfolder='vae'),
    unet=dict(
        type='UNet2DConditionModel',
        subfolder='unet',
        from_pretrained=stable_diffusion_v15_url),
    text_encoder=dict(
        type='ClipWrapper',
        clip_type='huggingface',
        pretrained_model_name_or_path=stable_diffusion_v15_url,
        subfolder='text_encoder'),
    tokenizer=stable_diffusion_v15_url,
    controlnet=dict(
        type='ControlNetModel', from_pretrained=controlnet_hed_url),
    scheduler=dict(
        type='DDPMScheduler',
        from_pretrained=stable_diffusion_v15_url,
        subfolder='scheduler'),
    test_scheduler=dict(
        type='DDIMScheduler',
        from_pretrained=stable_diffusion_v15_url,
        subfolder='scheduler'),
    data_preprocessor=dict(type='DataPreprocessor'),
    init_cfg=dict(type='init_from_unet'),
    enable_xformers=False,
)
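
Because the missing "_scope_" entry is easy to overlook, a small pre-flight check can catch it before inference is launched. The following is only a sketch: the helper name `model_scope` is hypothetical, and it assumes the config is a plain Python file without `_base_` inheritance (as the one above is), so it can be executed with the standard library instead of being loaded through MMEngine.

```python
import runpy


def model_scope(config_path):
    """Execute a plain-Python config file and return model['_scope_'] (or None)."""
    namespace = runpy.run_path(config_path)  # runs the file, returns its globals
    model = namespace.get("model", {})
    return model.get("_scope_")
```

For the config above, `model_scope("my_config.py")` should return "mmagic"; a return value of None means the entry is missing.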

Changes to controlnet_animation_inferencer.py

Edit "mmagic/mmagic/apis/inferencers/controlnet_animation_inferencer.py".

There are two changes.

# Change 1: import DWposeDetector instead of HEDdetector
#from controlnet_aux import HEDdetector
from controlnet_aux import DWposeDetector

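# Change 2: create a DWposeDetector in place of the HED detector
# (the attribute name self.hed is kept, so no other code needs to change)
#self.hed = HEDdetector.from_pretrained(cfg.control_detector)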
self.hed = DWposeDetector(
    det_config="mmopenlab/yolox_l_8xb8-300e_coco.py",
    det_ckpt="mmopenlab/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth",
    pose_config="mmopenlab/dwpose-l_384x288.py",
    pose_ckpt="mmopenlab/dw-ll_ucoco_384.pth",
    device="cuda"
)

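Downloading the required files

Download the following four files and save them in the "mmopenlab" folder. For the two GitHub links, save the raw file (for example via the "Raw" button) rather than the HTML page.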

https://github.com/patrickvonplaten/controlnet_aux/blob/master/src/controlnet_aux/dwpose/dwpose_config/dwpose-l_384x288.py
https://github.com/patrickvonplaten/controlnet_aux/blob/master/src/controlnet_aux/dwpose/yolox_config/yolox_l_8xb8-300e_coco.py
https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth
https://huggingface.co/wanghaofan/dw-ll_ucoco_384/resolve/main/dw-ll_ucoco_384.pth
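
The download step can also be scripted. The sketch below is an illustration, not part of the original procedure: the helper names `to_raw_url` and `download_all` are made up, and it assumes the four URLs above are still reachable. GitHub "/blob/" links point to an HTML page, so they are rewritten to raw.githubusercontent.com before downloading.

```python
import os
import urllib.request

# The four files listed above.
URLS = [
    "https://github.com/patrickvonplaten/controlnet_aux/blob/master/src/controlnet_aux/dwpose/dwpose_config/dwpose-l_384x288.py",
    "https://github.com/patrickvonplaten/controlnet_aux/blob/master/src/controlnet_aux/dwpose/yolox_config/yolox_l_8xb8-300e_coco.py",
    "https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_l_8x8_300e_coco/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth",
    "https://huggingface.co/wanghaofan/dw-ll_ucoco_384/resolve/main/dw-ll_ucoco_384.pth",
]


def to_raw_url(url):
    """Turn a github.com '/blob/' page URL into a raw.githubusercontent.com URL."""
    prefix = "https://github.com/"
    if url.startswith(prefix) and "/blob/" in url:
        owner_repo, path = url[len(prefix):].split("/blob/", 1)
        return f"https://raw.githubusercontent.com/{owner_repo}/{path}"
    return url  # other hosts already serve the file directly


def download_all(urls=URLS, folder="mmopenlab"):
    """Download every file into `folder`, skipping files that already exist."""
    os.makedirs(folder, exist_ok=True)
    for url in urls:
        target = os.path.join(folder, url.rsplit("/", 1)[-1])
        if not os.path.exists(target):
            urllib.request.urlretrieve(to_raw_url(url), target)
        print(target)
```

Running `download_all()` from the directory that will hold the "mmopenlab" folder fetches the files. If a download is interrupted, delete the partial file before retrying, since existing files are skipped.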

Run script

from mmagic.apis import MMagicInferencer

editor = MMagicInferencer(
    model_name="controlnet_animation",
    model_config="my_config.py"
)

prompt = "masterpiece, 1girl, dancing, best quality, extremely detailed"

negative_prompt = (
    "longbody, lowres, bad anatomy, bad hands, missing fingers, "
    "extra digit, fewer digits, cropped, worst quality, low quality"
)

editor.infer(
    video="fps30.mp4",
    prompt=prompt,
    negative_prompt=negative_prompt,
    save_path="output2.mp4",
    strength=0.5,
    num_inference_steps=35,
    image_width=768,
    image_height=768,
)
