[AnimateDiff] Trying Image2Video with Motion Module v3 and SparseCtrl

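Source image

The goal is to bring a single prepared image to life by adding motion to it.

The source image was generated with the following script.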

from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
import torch
from compel import Compel, DiffusersTextualInversionManager

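# Load the yayoiMix v2.5 checkpoint from a single .safetensors file in half precision
pipe = StableDiffusionPipeline.from_single_file(
    "safetensors/yayoiMix_v25.safetensors",
    load_safety_checker=False,
    extract_ema=True,
    torch_dtype=torch.float16
)
# Switch the sampler to Euler Ancestral
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
# Register the EasyNegative embedding so its token can be used in the negative prompt
pipe.load_textual_inversion("embeddings/easynegative.safetensors", token="EasyNegative")
pipe.to("cuda")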

prompt = "closeup face photo of japanese woman in black clothes, night city street, bokeh, fireworks in background"
negative_prompt = "EasyNegative, (Worst Quality)++, (low quality)+"

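# Compel converts the weighted prompt syntax ("++", "+") into embeddings
# and resolves the EasyNegative textual-inversion token
textual_inversion_manager = DiffusersTextualInversionManager(pipe)
compel_proc = Compel(
    tokenizer=pipe.tokenizer,
    text_encoder=pipe.text_encoder,
    textual_inversion_manager=textual_inversion_manager,
    truncate_long_prompts=False)

prompt_embeds = compel_proc([prompt])
negative_prompt_embeds = compel_proc([negative_prompt])
# With truncate_long_prompts=False the two tensors can differ in length; pad them to match
prompt_embeds, negative_prompt_embeds = compel_proc.pad_conditioning_tensors_to_same_length(
    [prompt_embeds, negative_prompt_embeds])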

seed = 2000000
generator = torch.manual_seed(seed)
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    generator=generator,
    num_inference_steps=25,
    width=768,
    height=768,
).images[0]
image.save("result.png")
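
The negative prompt relies on Compel's weighting syntax, where each trailing + raises the weight of a parenthesised term and each trailing - lowers it. The sketch below computes those weights, assuming the factors of 1.1 per + and 0.9 per - described in Compel's syntax documentation. compel_weight is a hypothetical helper that only handles a single term with a run of suffixes, not Compel's full grammar.

```python
import re

# Assumed factors from Compel's syntax docs: each "+" multiplies by 1.1, each "-" by 0.9
PLUS_FACTOR = 1.1
MINUS_FACTOR = 0.9


def compel_weight(term: str) -> float:
    """Return the weight implied by a "(text)++" or "(text)--" style term.

    Hypothetical illustration only: handles one parenthesised term with a
    run of "+" or "-" suffixes; anything else keeps the default weight 1.0.
    """
    match = re.fullmatch(r"\((.+)\)(\++|-+)?", term.strip())
    if match is None:
        return 1.0  # plain text such as "EasyNegative" is not reweighted
    suffix = match.group(2) or ""
    if suffix.startswith("+"):
        return PLUS_FACTOR ** len(suffix)
    return MINUS_FACTOR ** len(suffix)  # empty suffix gives 0.9 ** 0 == 1.0


for term in ["EasyNegative", "(Worst Quality)++", "(low quality)+"]:
    print(f"{term}: {compel_weight(term):.2f}")
```

With these assumptions, "(Worst Quality)++" ends up at about 1.21 and "(low quality)+" at 1.10, so the "worst quality" term is pushed away slightly harder.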

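YAML file

I prepared the following YAML file. It combines the v3 motion module with its adapter LoRA and passes the image generated above, placed at __assets__/result.png, to the SparseCtrl RGB model as the condition for frame 0.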

# animation-1
- domain_lora_scale: 1.0
  adapter_lora_path: "models/Motion_Module/v3_sd15_adapter.ckpt"
  dreambooth_path:   "models/DreamBooth_LoRA/realisticVisionV51_v51VAE.safetensors"

  inference_config: "configs/inference/inference-v3.yaml"
  motion_module:    "models/Motion_Module/v3_sd15_mm.ckpt"

  controlnet_config: "configs/inference/sparsectrl/latent_condition.yaml"
  controlnet_path:   "models/SparseCtrl/v3_sd15_sparsectrl_rgb.ckpt"

  seed: 13781920133800
  steps: 35
  guidance_scale: 8.5

  controlnet_image_indexs: [0]
  controlnet_images:
    - "__assets__/result.png"

  prompt:
    - "closeup face photo of japanese woman in black clothes, night city street, bokeh, fireworks in background"
  n_prompt:
    - "worst quality, low quality, letterboxed"

Run

python -m scripts.animate --config mysettings.yaml
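
The config references several checkpoints and the condition image by relative path, so a quick check before launching can save a failed run. This is a hypothetical pre-flight helper, not part of AnimateDiff; it assumes the command is run from the AnimateDiff repository root, so the relative paths resolve against the current directory.

```python
from pathlib import Path

# Keys in the YAML entry that hold a single file path (copied from the config above)
PATH_KEYS = [
    "adapter_lora_path",
    "dreambooth_path",
    "inference_config",
    "motion_module",
    "controlnet_config",
    "controlnet_path",
]


def missing_files(entry: dict, root: Path = Path(".")) -> list[str]:
    """Return the paths referenced by one YAML entry that are not files under root.

    Hypothetical pre-flight helper; AnimateDiff does not provide it.
    """
    paths = [entry[key] for key in PATH_KEYS if key in entry]
    paths += entry.get("controlnet_images", [])  # condition images are a list
    return [p for p in paths if not (root / p).is_file()]


# The entry from the YAML above, written out as a Python dict
entry = {
    "adapter_lora_path": "models/Motion_Module/v3_sd15_adapter.ckpt",
    "dreambooth_path": "models/DreamBooth_LoRA/realisticVisionV51_v51VAE.safetensors",
    "inference_config": "configs/inference/inference-v3.yaml",
    "motion_module": "models/Motion_Module/v3_sd15_mm.ckpt",
    "controlnet_config": "configs/inference/sparsectrl/latent_condition.yaml",
    "controlnet_path": "models/SparseCtrl/v3_sd15_sparsectrl_rgb.ckpt",
    "controlnet_images": ["__assets__/result.png"],
}

missing = missing_files(entry)
if missing:
    print("missing files:")
    for path in missing:
        print("  " + path)
else:
    print("all referenced files found")
```

Run from the same directory as the command above; once the models and the image are in place it should report that all referenced files were found.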

Result
