Setting up the Python environment
pip install torch==2.4.0+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install git+https://github.com/huggingface/diffusers
pip install git+https://github.com/huggingface/accelerate
pip install transformers sentencepiece opencv-python
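Because diffusers and accelerate are installed straight from GitHub, the versions you get depend on the day you run the commands. As a rough sketch (the helper name `installed_versions` is my own and not part of any of these libraries), the following uses only the standard library to list what actually ended up installed:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is missing from this environment
            found[name] = None
    return found


if __name__ == "__main__":
    names = ["torch", "diffusers", "accelerate", "transformers",
             "sentencepiece", "opencv-python"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

Recording this output alongside the script makes it easier to reproduce the result later.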
Python script
import torch
from diffusers import CogVideoXPipeline, CogVideoXDDIMScheduler
from diffusers.utils import export_to_video

# Adjacent string literals are concatenated as-is, so each one ends with a
# space to keep the sentences from running together.
prompt = (
    "A panda sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. "
    "Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
)

# Load the 2B model in half precision
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    torch_dtype=torch.float16
)

# Swap in the DDIM scheduler with "trailing" timestep spacing
pipe.scheduler = CogVideoXDDIMScheduler.from_config(
    pipe.scheduler.config,
    timestep_spacing="trailing"
)

# Reduce VRAM usage: offload idle sub-models to the CPU and decode the VAE in tiles
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

video = pipe(
    prompt=prompt,
    num_frames=48,
    guidance_scale=6,
    num_inference_steps=50,
    generator=torch.Generator().manual_seed(42)
).frames[0]

export_to_video(video, "output_tiling.mp4", fps=8)
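The script asks for 48 frames and writes them out at 8 fps, so the clip should be about 6 seconds long. A small sketch of that arithmetic follows; `clip_duration_seconds` is a hypothetical helper, and it assumes the pipeline returns exactly `num_frames` frames, so in practice it is safer to pass `len(video)`.

```python
def clip_duration_seconds(num_frames, fps):
    """Playback length in seconds of a clip with num_frames frames at fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


if __name__ == "__main__":
    # Values taken from the script above: num_frames=48, fps=8
    print(clip_duration_seconds(48, 8))
```

Lowering `fps` in `export_to_video` stretches the same frames over a longer, slower clip, while raising `num_frames` costs more generation time and memory.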
Results
The results are posted on Google Blogger: support-touchsp.blogspot.com
VRAM usage stayed at or below 8 GB.
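If you want to check the VRAM figure on your own machine, one option is PyTorch's peak-memory counter. This is only a sketch, and `peak_vram_gib` is a hypothetical helper. Note that `torch.cuda.max_memory_allocated()` counts only memory allocated by PyTorch tensors, so it will usually read lower than what tools such as nvidia-smi show; I am not assuming this is how the number above was measured.

```python
def peak_vram_gib():
    """Peak GPU memory allocated by PyTorch tensors, in GiB.

    Returns None when PyTorch or a CUDA device is unavailable.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # Bytes -> GiB
    return torch.cuda.max_memory_allocated() / 1024**3


if __name__ == "__main__":
    # In the real script, call torch.cuda.reset_peak_memory_stats() before
    # pipe(...) and peak_vram_gib() right after export_to_video(...).
    peak = peak_vram_gib()
    print("no CUDA device" if peak is None else f"peak VRAM: {peak:.2f} GiB")
```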
Addendum (January 16, 2025)
I wrote an article about "CogVideoX1.5-5B": touch-sp.hatenablog.com