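Stability AI, the developer of Stable Diffusion, has released an image-to-video model called "Stable Video Diffusion"

Introduction

The newly released "Stable Video Diffusion" is an image-to-video model.

Give it a single image and it turns that image into a video.

Environment

I confirmed that it works in the following two environments.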

Windows 11

Windows 11
NVIDIA Driver 546.01
CUDA 11.8
Python 3.10

Setting up the Python environment takes a single line.

pip install -r https://raw.githubusercontent.com/dai-ichiro/myEnvironments/main/StableVideoDiffusion/requirements_win.txt

Ubuntu 22.04

Ubuntu 22.04 
NVIDIA Driver 545.29.02
CUDA 11.8
Python 3.10

Setting up the Python environment takes a single line.

pip install -r https://raw.githubusercontent.com/dai-ichiro/myEnvironments/main/StableVideoDiffusion/requirements.txt
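
After installing, a quick sanity check of the environment can save time before the first run. The following is a minimal sketch, assuming the requirements file installs PyTorch under the package name torch. The function name environment_report is hypothetical and only for illustration.

```python
import sys


def environment_report() -> dict:
    """Collect the Python version and, if PyTorch is importable, its CUDA status."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda_available": None,
        "cuda_version": None,
    }
    try:
        # Assumption: the requirements file installed PyTorch as "torch".
        import torch
    except (ImportError, OSError):
        # PyTorch is missing or failed to load; leave the torch fields as None.
        return report
    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["cuda_version"] = torch.version.cuda  # None on CPU-only builds
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

In the environments listed above, this should report Python 3.10 and a CUDA 11.8 build of PyTorch if everything installed as intended.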

Preparation

Cloning the repository

git clone https://github.com/Stability-AI/generative-models

Downloading the pretrained weights

Download "svd.safetensors" from here and place it in the "generative-models/checkpoints" folder. The "checkpoints" folder has to be created first.
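
The folder step can also be scripted. This is a minimal sketch that creates the "checkpoints" folder and reports whether the weights file is already in place. It does not download anything, and the helper name prepare_checkpoint_dir is hypothetical.

```python
from pathlib import Path


def prepare_checkpoint_dir(repo_root: str, filename: str = "svd.safetensors") -> bool:
    """Create <repo_root>/checkpoints if needed and return True if the weights file is there."""
    checkpoints = Path(repo_root) / "checkpoints"
    # exist_ok=True makes this safe to call repeatedly.
    checkpoints.mkdir(parents=True, exist_ok=True)
    return (checkpoints / filename).is_file()


if __name__ == "__main__":
    # Run from the directory that contains the cloned repository.
    if prepare_checkpoint_dir("generative-models"):
        print("svd.safetensors found")
    else:
        print("Place svd.safetensors in generative-models/checkpoints")
```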

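Running

It takes just the following one line, run from inside the cloned "generative-models" folder.

python scripts/sampling/simple_video_sample.py

If you get an OOM (out of memory) error, you need to lower the value of "decoding_t: int" on line 31 of "simple_video_sample.py". The default is 14, but I had to lower it to 3.

To use your own image, change "input_path: str" on line 23 of "simple_video_sample.py".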
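
Editing the script is not necessarily the only way to change these two values. If the script exposes the arguments of its sampling function as command-line flags (an assumption; look for a Fire call at the bottom of the script to confirm), they can be passed on the command line instead. The sketch below only builds and prints such a command. The helper build_command and the file name my_image.png are hypothetical.

```python
import shlex
import sys
from typing import List, Optional

SCRIPT = "scripts/sampling/simple_video_sample.py"


def build_command(input_path: Optional[str] = None,
                  decoding_t: Optional[int] = None) -> List[str]:
    """Build the argument list for the sampling script, overriding only what is given."""
    cmd = [sys.executable, SCRIPT]
    if input_path is not None:
        # Assumption: the script accepts --input_path as a flag.
        cmd.append(f"--input_path={input_path}")
    if decoding_t is not None:
        if decoding_t < 1:
            raise ValueError("decoding_t must be at least 1")
        # Assumption: the script accepts --decoding_t as a flag.
        # Lower values decode fewer frames at once and use less GPU memory.
        cmd.append(f"--decoding_t={decoding_t}")
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it; execute it from inside generative-models.
    print(shlex.join(build_command(input_path="my_image.png", decoding_t=3)))
```

If the flags are not accepted, editing the two lines in the script as described above still works.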

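The resolution of the generated video appears to match the resolution of the source image.

Something like the following warning is printed, but it seems safe to ignore.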

OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
