[Diffusers] Trying out video generation (Text2Video) with LTX Video

PC environment

Windows 11
CUDA 12.4
Python 3.12

Setting up the Python environment

pip install torch==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
pip install git+https://github.com/huggingface/diffusers
pip install transformers accelerate sentencepiece imageio imageio-ffmpeg
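After installing, a quick sanity check can confirm that every package above is present and that the CUDA build of PyTorch was picked up. This is a minimal sketch of my own, not part of the original setup. The helper name installed_versions is hypothetical, and the check relies only on importlib.metadata from the standard library, plus torch if it can be imported.

```python
# Hypothetical post-install check (not part of the original setup).
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands above.
PACKAGES = ["torch", "diffusers", "transformers", "accelerate",
            "sentencepiece", "imageio", "imageio-ffmpeg"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:15} {ver or 'NOT INSTALLED'}")
    try:
        import torch  # imported lazily so the package check still runs without torch
        print("torch CUDA build:", torch.version.cuda)
        print("CUDA available:  ", torch.cuda.is_available())
    except ImportError:
        print("torch could not be imported")
```

With the install command above, the torch version should carry a +cu124 suffix. If "CUDA available" prints False, the CPU-only wheel was probably installed instead.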

Python script

import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video
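# gpu_monitor / time_monitor come from the author's own decorator.py (see "Other notes");
# the unrelated PyPI package named "decorator" does not provide them.
# Remove this import and the two @ lines below if you don't have that file.
from decorator import gpu_monitor, time_monitor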

@gpu_monitor(interval=0.5)
@time_monitor
def main():
    # Load LTX-Video in bfloat16 and move the whole pipeline onto the GPU
    pipe = LTXPipeline.from_pretrained(
        "Lightricks/LTX-Video",
        torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")

    prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
    negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

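    # Generate one 704x480 clip of 161 frames with 50 denoising steps;
    # .frames[0] is the list of frames for the first (and only) prompt
    video = pipe(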
        prompt=prompt,
        negative_prompt=negative_prompt,
        width=704,
        height=480,
        num_frames=161,
        num_inference_steps=50,
    ).frames[0]
    export_to_video(video, "output.mp4", fps=24)  # 161 frames at 24 fps ≈ 6.7 s

if __name__ == "__main__":
    main()
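The generation settings are not arbitrary. The LTX-Video README states that the model works on resolutions divisible by 32 and frame counts of the form 8k + 1 (e.g. 257). Here 704 and 480 are both multiples of 32, and 161 = 8 × 20 + 1. The sketch below checks those constraints before starting a long run and also computes the resulting clip length. check_ltx_settings is a hypothetical helper, and the constraints are taken from the README as I understand it. The pipeline itself may pad, round, or reject other values rather than failing exactly like this.

```python
def check_ltx_settings(width: int, height: int, num_frames: int, fps: int) -> float:
    """Validate sizes against the assumed LTX-Video constraints; return clip length in seconds.

    Assumption (LTX-Video README): width and height divisible by 32,
    num_frames of the form 8k + 1.
    """
    problems = []
    if width % 32 or height % 32:
        problems.append(f"{width}x{height} is not a multiple of 32")
    if num_frames % 8 != 1:
        problems.append(f"num_frames={num_frames} is not of the form 8k + 1")
    if problems:
        raise ValueError("; ".join(problems))
    return num_frames / fps


if __name__ == "__main__":
    # The values used in the script above
    seconds = check_ltx_settings(width=704, height=480, num_frames=161, fps=24)
    print(f"clip length: {seconds:.2f} s")
```

For the script's settings this returns 161 / 24, about 6.7 s of video. A setting such as num_frames=160 would raise a ValueError instead of silently producing a differently sized clip.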

Results

time: 64.84 sec
GPU 0 - Used memory: 23.77/23.99 GB
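To put these two numbers in context, here is a small sketch that parses the result lines above and derives a few ratios. The function name summarize and the ratios it reports are my own choices for illustration, and it assumes the output format shown above. Because time_monitor wraps the whole main(), the 64.84 s presumably includes model loading and the MP4 export, not just the 50 denoising steps. The per-frame figure is therefore an upper bound on the pure inference cost.

```python
import re

# The two lines printed by the benchmark decorators in the run above.
LOG = """time: 64.84 sec
GPU 0 - Used memory: 23.77/23.99 GB"""


def summarize(log: str, num_frames: int, fps: int) -> dict:
    """Derive a few ratios from the 'time' and 'Used memory' lines (format assumed from above)."""
    seconds = float(re.search(r"time:\s*([\d.]+)\s*sec", log).group(1))
    used, total = (float(x) for x in
                   re.search(r"Used memory:\s*([\d.]+)/([\d.]+)\s*GB", log).groups())
    clip_seconds = num_frames / fps  # length of the exported MP4
    return {
        "sec_per_output_frame": seconds / num_frames,
        "wall_time_vs_clip_length": seconds / clip_seconds,
        "vram_utilization": used / total,
        "vram_headroom_gb": total - used,
    }


if __name__ == "__main__":
    s = summarize(LOG, num_frames=161, fps=24)
    print(f"{s['sec_per_output_frame']:.2f} s of wall time per output frame")
    print(f"{s['wall_time_vs_clip_length']:.1f}x the length of the clip")
    print(f"VRAM {s['vram_utilization']:.1%} used, {s['vram_headroom_gb']:.2f} GB left")
```

With these figures the run takes roughly 0.40 s per output frame, or about 9.7 times the 6.7 s clip length. Only about 0.22 GB of the 23.99 GB is left, so the run is right at the limit of the GPU's memory.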

The generated video is posted on my Google Blogger site below.
support-touchsp.blogspot.com

Other notes

The benchmark figures (time and GPU memory) were measured with the script described here.
touch-sp.hatenablog.com
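The decorator module imported in the script is the benchmarking helper from that post, not something installed by pip. If you want to run the script without it, the following is a minimal stand-in of my own that prints lines in the same format as the results. It is hypothetical, not the author's implementation. time_monitor times one call with time.perf_counter. gpu_monitor polls memory on a background thread every interval seconds and reports the highest value it saw. The default query assumes NVIDIA's NVML bindings (the nvidia-ml-py package, imported as pynvml) and converts bytes to GB using 1024**3.

```python
import functools
import threading
import time


def time_monitor(func):
    """Print the wall-clock time of one call as 'time: N sec'."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            print(f"time: {time.perf_counter() - start:.2f} sec")
    return wrapper


def _nvml_used_total_gb(index=0):
    """Return (used, total) memory of GPU `index` in GB (assumes pynvml is installed)."""
    import pynvml  # assumption: provided by the nvidia-ml-py package
    pynvml.nvmlInit()
    try:
        info = pynvml.nvmlDeviceGetMemoryInfo(pynvml.nvmlDeviceGetHandleByIndex(index))
        return info.used / 1024**3, info.total / 1024**3
    finally:
        pynvml.nvmlShutdown()


def gpu_monitor(interval=0.5, query=_nvml_used_total_gb):
    """Poll GPU memory every `interval` seconds while the function runs; print the peak."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            peak = [0.0, 0.0]  # [used_gb, total_gb] at the highest observed usage
            stop = threading.Event()

            def poll():
                while True:
                    used, total = query()
                    if used > peak[0]:
                        peak[0], peak[1] = used, total
                    if stop.wait(interval):  # True once the function has finished
                        return

            thread = threading.Thread(target=poll, daemon=True)
            thread.start()
            try:
                return func(*args, **kwargs)
            finally:
                stop.set()
                thread.join()
                print(f"GPU 0 - Used memory: {peak[0]:.2f}/{peak[1]:.2f} GB")
        return wrapper
    return decorator
```

Saved as decorator.py next to the script, it should shadow the PyPI package of the same name when the script is run directly, because the script's own directory comes first on sys.path. Since it only samples every interval seconds, a short-lived memory spike between samples can be missed.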