Introduction
I previously wrote an article about CogVideoX. With Diffusers, "Mochi 1 Preview" can be run with almost the same script as in that post.
touch-sp.hatenablog.com
PC Used
OS: Windows 11
CPU: Core(TM) i7-14700K
RAM: 96.0 GB
GPU: RTX 4090 (VRAM 24GB)
CUDA: 12.4
Python: 3.12
Python Environment Setup
pip install torch==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
pip install diffusers[torch]
pip install transformers sentencepiece imageio imageio-ffmpeg
pip install torchao
diffusers==0.32.1
imageio==2.36.1
imageio-ffmpeg==0.6.0
sentencepiece==0.2.0
torch==2.5.1+cu124
torchao==0.8.0
transformers==4.48.0
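Before running the script below, a quick sanity check that the CUDA build of PyTorch was actually installed can save time. This check snippet is my addition, not part of the original setup:

import torch

print(torch.__version__)              # expect 2.5.1+cu124
print(torch.version.cuda)             # expect 12.4
print(torch.cuda.is_available())      # expect True
print(torch.cuda.get_device_name(0))  # expect NVIDIA GeForce RTX 4090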
Python Script
import gc

import torch
from diffusers import MochiPipeline, AutoencoderKLMochi, MochiTransformer3DModel, TorchAoConfig
from diffusers.utils import export_to_video

from decorator import gpu_monitor, time_monitor


def flush():
    # Release Python garbage, clear the CUDA cache, and reset the peak-memory counter
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()


@gpu_monitor(interval=0.5)
@time_monitor
def main():
    # Stage 1: load only the text encoder and encode the prompt
    pipe = MochiPipeline.from_pretrained(
        "genmo/mochi-1-preview",
        transformer=None,
        vae=None,
        torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()

    prompt = "An aerial shot of a parade of elephants walking across the African savannah. The camera showcases the herd and the surrounding landscape."

    with torch.no_grad():
        prompt_embeds, prompt_attention_mask, negative_prompt_embeds, negative_prompt_attention_mask = (
            pipe.encode_prompt(prompt=prompt)
        )

    print("text_encoder:")
    print(f"torch.cuda.max_memory_allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")

    # Free the text encoder before loading the transformer and VAE
    del pipe
    flush()

    # Stage 2: load the transformer and VAE with int8 weight-only quantization
    quantization_config = TorchAoConfig("int8wo")

    transformer = MochiTransformer3DModel.from_pretrained(
        "genmo/mochi-1-preview",
        variant="bf16",
        subfolder="transformer",
        quantization_config=quantization_config,
        torch_dtype=torch.bfloat16
    )
    vae = AutoencoderKLMochi.from_pretrained(
        "genmo/mochi-1-preview",
        variant="bf16",
        subfolder="vae",
        quantization_config=quantization_config,
        torch_dtype=torch.bfloat16
    )
    pipe = MochiPipeline.from_pretrained(
        "genmo/mochi-1-preview",
        transformer=transformer,
        vae=vae,
        text_encoder=None,
        tokenizer=None,
        torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_tiling()

    # Generate the video from the precomputed prompt embeddings
    frames = pipe(
        prompt_embeds=prompt_embeds,
        prompt_attention_mask=prompt_attention_mask,
        negative_prompt_embeds=negative_prompt_embeds,
        negative_prompt_attention_mask=negative_prompt_attention_mask,
        num_frames=85,
        num_inference_steps=64,
        generator=torch.Generator().manual_seed(42)
    ).frames[0]
    export_to_video(frames, "mochi.mp4", fps=30)

    print("transformer and vae:")
    print(f"torch.cuda.max_memory_allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")


if __name__ == "__main__":
    main()
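A note on the design: the script runs in two stages. The text encoder is loaded alone to produce the prompt embeddings, then deleted and flushed before the TorchAO int8 weight-only ("int8wo") quantized transformer and VAE are loaded, so the two heaviest components never sit on the GPU at the same time and peak VRAM stays within the RTX 4090's 24GB. If you want to push memory lower still, diffusers' TorchAoConfig accepts other quantization schemes as well; a hypothetical variant I have not tested in this post (int4 may cost quality):

quantization_config = TorchAoConfig("int4wo")  # int4 weight-only; untested here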
Results
The following two errors appeared, but they can apparently be ignored without affecting execution.

import error: No module named 'triton'
Expected types for vae: ['AutoencoderKL'], got AutoencoderKLMochi.
text_encoder:
torch.cuda.max_memory_allocated: 8.94 GB
transformer and vae:
torch.cuda.max_memory_allocated: 13.99 GB
time: 916.85 sec
GPU 0 - Used memory: 19.37/23.99 GB
The generated video is posted on Google Blogger.
support-touchsp.blogspot.com
Miscellaneous
The benchmarking was done with the script described here.
touch-sp.hatenablog.com
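The gpu_monitor and time_monitor decorators imported at the top of the main script come from that benchmark post. If you just want something runnable, here is a minimal stand-in of my own (hypothetical; it only assumes the decorator names and the interval argument used above, not the original implementation):

# decorator.py -- minimal stand-in for the module imported by the script above
import functools
import threading
import time

import torch


def time_monitor(func):
    # Report wall-clock time of the wrapped call, matching the "time: ... sec" line
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print(f"time: {time.time() - start:.2f} sec")
        return result
    return wrapper


def gpu_monitor(interval=0.5):
    # Poll GPU memory on a background thread while the wrapped function runs,
    # then report the observed peak, e.g. "GPU 0 - Used memory: 19.37/23.99 GB"
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            stop = threading.Event()
            peak = [0]

            def poll():
                while not stop.is_set():
                    free, total = torch.cuda.mem_get_info(0)
                    peak[0] = max(peak[0], total - free)
                    time.sleep(interval)

            thread = threading.Thread(target=poll, daemon=True)
            thread.start()
            try:
                return func(*args, **kwargs)
            finally:
                stop.set()
                thread.join()
                _, total = torch.cuda.mem_get_info(0)
                print(f"GPU 0 - Used memory: {peak[0] / 1024**3:.2f}/{total / 1024**3:.2f} GB")
        return wrapper
    return decorator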