[Diffusers] "LTX Video" Has Been Updated to 0.9.1, So I Tried Generating a Video (Text2Video)

Introduction

The article on version 0.9.0 is here:
touch-sp.hatenablog.com
This time I use version 0.9.1.

PC environment

Windows 11
RTX 4090 (VRAM 24GB)
CUDA 12.4
Python 3.12

Setting up the Python environment

pip install torch==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
pip install diffusers[torch]
pip install transformers sentencepiece imageio imageio-ffmpeg

diffusers==0.32.1
imageio==2.36.1
imageio-ffmpeg==0.5.1
sentencepiece==0.2.0
torch==2.5.1+cu124
transformers==4.47.1
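To check whether a local environment matches the versions listed above, a small script like the one below can be run. This is a minimal sketch using only the standard library `importlib.metadata`; the `EXPECTED` dict just mirrors the list above, and `check_versions` is a hypothetical helper name, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Tuple

# Versions used in this article (copied from the list above).
EXPECTED: Dict[str, str] = {
    "diffusers": "0.32.1",
    "imageio": "2.36.1",
    "imageio-ffmpeg": "0.5.1",
    "sentencepiece": "0.2.0",
    "torch": "2.5.1+cu124",
    "transformers": "4.47.1",
}


def check_versions(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = version,
) -> List[Tuple[str, str, str]]:
    """Return (package, expected, installed) for every mismatch.

    A package that is not installed is reported with installed == "missing".
    """
    mismatches = []
    for name, want in expected.items():
        try:
            have = get_version(name)
        except PackageNotFoundError:
            have = "missing"
        if have != want:
            mismatches.append((name, want, have))
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    for name, want, have in problems:
        print(f"{name}: expected {want}, found {have}")
    if not problems:
        print("all versions match")
```

The `get_version` parameter exists only so the comparison logic can be exercised with a fake lookup; in normal use the default `importlib.metadata.version` is used.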

Python script

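import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video
# "decorator" is the author's own benchmarking module (see "Other" at the end),
# not the PyPI package of the same name. Remove the two decorators below if you don't have it.
from decorator import gpu_monitor, time_monitor

@gpu_monitor(interval=0.5)  # author's helper: monitors GPU memory usage (interval=0.5 s)
@time_monitor               # author's helper: reports the wall-clock time of main()
def main():
    # Load LTX Video 0.9.1 (Diffusers-format weights) in bfloat16
    pipe = LTXPipeline.from_pretrained(
        "a-r-r-o-w/LTX-Video-0.9.1-diffusers",
        torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")  # keep the whole pipeline on the GPU (no CPU offload)

    prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
    negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

    video = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        width=704,              # 22 * 32
        height=480,             # 15 * 32
        num_frames=161,         # 8 * 20 + 1 frames, about 6.7 s at 24 fps
        num_inference_steps=50,
        decode_timestep=0.03,   # VAE decode settings used with the 0.9.1 checkpoint
        decode_noise_scale=0.025
    ).frames[0]  # frames of the first (and only) generated video
    export_to_video(video, "output.mp4", fps=24)

    # Peak memory held by PyTorch tensors during the run
    print(f"torch.cuda.max_memory_allocated: {torch.cuda.max_memory_allocated()/ 1024**3:.2f} GB")

if __name__ == "__main__":
    main()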
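The values passed to the pipeline above fit the input constraints given in the LTX-Video README (width and height divisible by 32, frame count of the form 8k + 1), assuming those constraints still hold for 0.9.1: 704 = 22 × 32, 480 = 15 × 32 and 161 = 8 × 20 + 1. At fps=24, 161 frames give a clip of about 6.7 seconds. The helper below is a hypothetical sketch for checking such values before calling the pipeline; `VideoSettings`, `validate_ltx_settings` and `nearest_valid_frames` are my own names, not Diffusers APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoSettings:
    width: int
    height: int
    num_frames: int
    fps: int


def validate_ltx_settings(s: VideoSettings) -> float:
    """Check assumed LTX-Video input constraints; return clip length in seconds.

    Assumed constraints (from the LTX-Video README): width and height are
    multiples of 32, and num_frames equals 8 * k + 1.
    """
    if s.width % 32 or s.height % 32:
        raise ValueError(f"width/height must be multiples of 32, got {s.width}x{s.height}")
    if (s.num_frames - 1) % 8:
        raise ValueError(f"num_frames must be 8 * k + 1, got {s.num_frames}")
    return s.num_frames / s.fps


def nearest_valid_frames(n: int) -> int:
    """Round n down to the closest value of the form 8 * k + 1 (minimum 1)."""
    return max(1, (n - 1) // 8 * 8 + 1)


if __name__ == "__main__":
    # Settings from the script above
    seconds = validate_ltx_settings(VideoSettings(704, 480, 161, 24))
    print(f"{seconds:.2f} s")  # 161 / 24 = 6.708..., printed as 6.71 s
```

Rounding down in `nearest_valid_frames` is a design choice so the generated clip never gets longer (and heavier on VRAM) than requested.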

Results

torch.cuda.max_memory_allocated: 19.36 GB
time: 72.42 sec
GPU 0 - Used memory: 23.88/23.99 GB

The generated video is posted on the Google Blogger page below.
support-touchsp.blogspot.com

Supplementary notes

For reference, here are the results when using enable_model_cpu_offload() and enable_sequential_cpu_offload() instead of pipe.to("cuda").

enable_model_cpu_offload()

torch.cuda.max_memory_allocated: 8.90 GB
time: 54.13 sec
GPU 0 - Used memory: 13.29/23.99 GB

enable_sequential_cpu_offload()

torch.cuda.max_memory_allocated: 5.13 GB
time: 117.10 sec
GPU 0 - Used memory: 7.95/23.99 GB
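Switching between the three configurations comes down to replacing `pipe.to("cuda")` in the script with one of the offload calls; the offload methods handle device placement themselves, so `pipe.to("cuda")` should not be called in addition. Below is a minimal sketch of a hypothetical `apply_memory_mode` helper (my own name, not a Diffusers API). It only calls `to`, `enable_model_cpu_offload` and `enable_sequential_cpu_offload`, which exist on Diffusers pipelines, so it accepts any object with those methods.

```python
from typing import Any

MODES = ("gpu", "model_offload", "sequential_offload")


def apply_memory_mode(pipe: Any, mode: str) -> Any:
    """Configure device placement for a Diffusers pipeline.

    - "gpu": move the whole pipeline to CUDA (highest VRAM use).
    - "model_offload": keep components on the CPU and move each whole model
      (text encoder, transformer, VAE) to the GPU only while it runs.
    - "sequential_offload": offload at submodule granularity (lowest VRAM use).
    """
    if mode == "gpu":
        pipe.to("cuda")
    elif mode == "model_offload":
        pipe.enable_model_cpu_offload()
    elif mode == "sequential_offload":
        pipe.enable_sequential_cpu_offload()
    else:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return pipe


# Usage with the script above (requires diffusers and a CUDA GPU):
# pipe = LTXPipeline.from_pretrained("a-r-r-o-w/LTX-Video-0.9.1-diffusers",
#                                    torch_dtype=torch.bfloat16)
# apply_memory_mode(pipe, "model_offload")  # instead of pipe.to("cuda")
```

In the measurements above, model offload lowered peak allocated memory from 19.36 GB to 8.90 GB and on this machine also finished sooner (54.13 s vs 72.42 s), while sequential offload went down to 5.13 GB but took about 1.6× as long as the run without offload.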

Other

The benchmarks were taken with the script described here:
touch-sp.hatenablog.com
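The `decorator` module itself is described in the linked article and is not reproduced here. For readers who only want a comparable timing line, the following is a hypothetical stand-in for `time_monitor` (my own sketch, not the author's code): it wraps a function with `time.perf_counter()` and prints the elapsed time in the same `time: ... sec` format as the results above. A stand-in for `gpu_monitor` would need something like NVML to read GPU memory and is not attempted here.

```python
import functools
import time
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def time_monitor(func: F) -> F:
    """Print the wall-clock time of each call as 'time: X.XX sec'."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Printed even if func raises, so failed runs are timed too
            print(f"time: {time.perf_counter() - start:.2f} sec")

    return wrapper  # type: ignore[return-value]


if __name__ == "__main__":
    @time_monitor
    def main() -> None:
        time.sleep(0.1)

    main()
```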
