Introduction
ConsisID is a video generation model developed with the goal of preserving a person's identity in the generated video. I think of it, informally, as something like a video version of IP-Adapter FaceID.
touch-sp.hatenablog.com
PC used
CPU: Intel(R) Core(TM) i7-14700K
RAM: 96.0 GB
GPU: RTX 4090 (VRAM 24 GB)
OS: Ubuntu 24.04 on WSL2
Python 3.12
CUDA 12.4
Setting up the Python environment
Only onnxruntime-gpu is pinned to a specific version.

pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 xformers --index-url https://download.pytorch.org/whl/cu124
pip install git+https://github.com/huggingface/diffusers
pip install accelerate transformers sentencepiece
pip install onnxruntime-gpu==1.19.2 insightface
pip install consisid_eva_clip
pip install imageio-ffmpeg
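Before moving on, a quick sanity check (my own addition, not part of the original setup) can confirm that the GPU stack installed above is actually usable, i.e. that torch sees CUDA 12.4 and that onnxruntime-gpu exposes its CUDA provider:

import torch
import onnxruntime

print(torch.__version__, torch.version.cuda)   # expect 2.5.1+cu124 and 12.4
print(torch.cuda.is_available())               # expect True
print(torch.cuda.get_device_name(0))           # expect the RTX 4090
print(onnxruntime.get_available_providers())   # CUDAExecutionProvider should be listed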
apex could not be installed with pip, so I built it from source.

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
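As a quick check that the source build is the one being picked up, a simple import test like the following can be used (this snippet is my own addition; as far as I understand, the FusedLayerNorm import is what consisid_eva_clip tries to use when apex is available):

import apex
from apex.normalization import FusedLayerNorm  # fused op that consisid_eva_clip can use

print("apex imported from:", apex.__file__)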
A full list of the installed packages is given at the end of this article.
Python script

import torch
from diffusers import ConsisIDPipeline
from diffusers.pipelines.consisid.consisid_utils import prepare_face_models, process_face_embeddings_infer
from diffusers.utils import export_to_video
from decorator import gpu_monitor, time_monitor, print_memory

# model was downloaded from https://huggingface.co/BestWishYsh/ConsisID-preview


@gpu_monitor(interval=0.5)
@time_monitor
def main():
    face_helper_1, face_helper_2, face_clip_model, face_main_model, eva_transform_mean, eva_transform_std = prepare_face_models(
        "BestWishYsh/ConsisID-preview", device="cuda", dtype=torch.bfloat16
    )

    pipe = ConsisIDPipeline.from_pretrained(
        "BestWishYsh/ConsisID-preview", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()
    #pipe.enable_sequential_cpu_offload()
    #pipe.vae.enable_tiling()

    prompt = "The video captures a boy walking along a city street, filmed in black and white on a classic 35mm camera. His expression is thoughtful, his brow slightly furrowed as if he's lost in contemplation. The film grain adds a textured, timeless quality to the image, evoking a sense of nostalgia. Around him, the cityscape is filled with vintage buildings, cobblestone sidewalks, and softly blurred figures passing by, their outlines faint and indistinct. Streetlights cast a gentle glow, while shadows play across the boy's path, adding depth to the scene. The lighting highlights the boy's subtle smile, hinting at a fleeting moment of curiosity. The overall cinematic atmosphere, complete with classic film still aesthetics and dramatic contrasts, gives the scene an evocative and introspective feel."
    image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_input.png?download=true"

    id_cond, id_vit_hidden, image, face_kps = process_face_embeddings_infer(
        face_helper_1,
        face_clip_model,
        face_helper_2,
        eva_transform_mean,
        eva_transform_std,
        face_main_model,
        "cuda",
        torch.bfloat16,
        image,
        is_align_face=True,
    )

    video = pipe(
        image=image,
        prompt=prompt,
        num_inference_steps=50,
        guidance_scale=6.0,
        use_dynamic_cfg=False,
        id_vit_hidden=id_vit_hidden,
        id_cond=id_cond,
        kps_cond=face_kps,
        generator=torch.Generator("cuda").manual_seed(42),
    )

    export_to_video(video.frames[0], "output.mp4", fps=8)
    print_memory()


if __name__ == "__main__":
    main()
Results
The generated video is posted on Google Blogger.
support-touchsp.blogspot.com
Below are the results of running the script with several different memory settings.
Calling vae.enable_slicing() had almost no effect on memory usage.
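For reference, here is a short annotated sketch of the options toggled in the four runs below (the comments are my own summary of these diffusers options, not text from the original post):

# Only one of the two CPU-offload modes is enabled at a time.
pipe.enable_model_cpu_offload()       # move whole components to the GPU only while they are in use
pipe.enable_sequential_cpu_offload()  # offload at the submodule level: lowest VRAM, but slower
pipe.vae.enable_tiling()              # decode the video latents in tiles to reduce VAE memory
pipe.vae.enable_slicing()             # decode one batch element at a time (little effect here)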
Method 1

pipe.enable_model_cpu_offload()
#pipe.enable_sequential_cpu_offload()
#pipe.vae.enable_tiling()
max_memory=16.57 GB
max_reserved=25.87 GB
time: 447.39 sec
GPU 0 - Used memory: 23.85/23.99 GB
Method 2
pipe.enable_model_cpu_offload()
#pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
max_memory=16.04 GB
max_reserved=18.29 GB
time: 319.91 sec
GPU 0 - Used memory: 21.48/23.99 GB
Method 3

#pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
#pipe.vae.enable_tiling()
max_memory=16.27 GB
max_reserved=23.21 GB
time: 637.43 sec
GPU 0 - Used memory: 23.95/23.99 GB
Method 4
#pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
max_memory=5.29 GB
max_reserved=7.11 GB
time: 666.22 sec
GPU 0 - Used memory: 8.58/23.99 GB
Follow-up
The follow-up article is here:
touch-sp.hatenablog.com
Benchmark
For the benchmark measurements I used the script from this article:
touch-sp.hatenablog.com
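The script above imports gpu_monitor, time_monitor, and print_memory from a local decorator module that is not shown in this post (it belongs to the benchmark script linked above, not to the decorator package from PyPI). Purely as an illustration, and only as my own guess at its contents, a minimal sketch that produces the same kind of numbers as the results above could look like this:

# My own sketch of the helper decorators; the author's actual module is in the linked benchmark script.
import functools
import subprocess
import threading
import time

import torch


def time_monitor(func):
    """Print the wall-clock time of the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"time: {time.perf_counter() - start:.2f} sec")
        return result
    return wrapper


def gpu_monitor(interval=0.5):
    """Poll nvidia-smi every `interval` seconds and report the peak used memory on GPU 0."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            peak, total = 0.0, 0.0
            stop = threading.Event()

            def poll():
                nonlocal peak, total
                while not stop.is_set():
                    line = subprocess.run(
                        ["nvidia-smi",
                         "--query-gpu=memory.used,memory.total",
                         "--format=csv,noheader,nounits"],
                        capture_output=True, text=True,
                    ).stdout.splitlines()[0]
                    used, total = (float(x) for x in line.split(","))
                    peak = max(peak, used)
                    time.sleep(interval)

            thread = threading.Thread(target=poll, daemon=True)
            thread.start()
            try:
                return func(*args, **kwargs)
            finally:
                stop.set()
                thread.join()
                print(f"GPU 0 - Used memory: {peak / 1024:.2f}/{total / 1024:.2f} GB")
        return wrapper
    return decorator


def print_memory():
    """Print PyTorch's peak allocated and reserved GPU memory."""
    print(f"max_memory={torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
    print(f"max_reserved={torch.cuda.max_memory_reserved() / 1024**3:.2f} GB")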
Package versions

accelerate==1.3.0
albucore==0.0.23
albumentations==2.0.0
annotated-types==0.7.0
apex @ file:///mnt/wsl/PHYSICALDRIVE3p1/consisid/env/apex
asttokens==3.0.0
certifi==2024.12.14
charset-normalizer==3.4.1
coloredlogs==15.0.1
comm==0.2.2
consisid_eva_clip==1.0.2
contourpy==1.3.1
cycler==0.12.1
Cython==3.0.11
decorator==5.1.1
diffusers @ git+https://github.com/huggingface/diffusers@328e0d20a7b996f9bdb04180457eb08c1b42a76e
easydict==1.13
einops==0.8.0
executing==2.1.0
facexlib==0.3.0
filelock==3.13.1
filterpy==1.4.5
flatbuffers==24.12.23
fonttools==4.55.3
fsspec==2024.2.0
ftfy==6.3.1
huggingface-hub==0.27.1
humanfriendly==10.0
idna==3.10
imageio==2.37.0
imageio-ffmpeg==0.6.0
importlib_metadata==8.5.0
insightface==0.7.3
ipython==8.31.0
ipywidgets==8.1.5
jedi==0.19.2
Jinja2==3.1.3
joblib==1.4.2
jupyterlab_widgets==3.0.13
kiwisolver==1.4.8
lazy_loader==0.4
llvmlite==0.43.0
MarkupSafe==2.1.5
matplotlib==3.10.0
matplotlib-inline==0.1.7
mpmath==1.3.0
networkx==3.2.1
numba==0.60.0
numpy==2.0.2
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
onnx==1.17.0
onnxruntime-gpu==1.19.2
opencv-python==4.11.0.86
opencv-python-headless==4.11.0.86
packaging==24.2
parso==0.8.4
pexpect==4.9.0
pillow==10.2.0
prettytable==3.12.0
prompt_toolkit==3.0.49
protobuf==5.29.3
psutil==6.1.1
ptyprocess==0.7.0
pure_eval==0.2.3
pydantic==2.10.5
pydantic_core==2.27.2
pyfacer==0.0.5
Pygments==2.19.1
pyparsing==3.2.1
python-dateutil==2.9.0.post0
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
scikit-image==0.25.0
scikit-learn==1.6.1
scipy==1.15.1
sentencepiece==0.2.0
setuptools==75.8.0
simsimd==6.2.1
six==1.17.0
stack-data==0.6.3
stringzilla==3.11.3
sympy==1.13.1
threadpoolctl==3.5.0
tifffile==2025.1.10
timm==1.0.14
tokenizers==0.21.0
torch==2.5.1+cu124
torchvision==0.20.1+cu124
tqdm==4.67.1
traitlets==5.14.3
transformers==4.48.0
triton==3.1.0
typing_extensions==4.12.2
urllib3==2.3.0
validators==0.34.0
wcwidth==0.2.13
wheel==0.45.1
widgetsnbextension==4.0.13
xformers==0.0.29.post1
zipp==3.21.0