[Diffusers] Trying Out Ovis-Image (Compared with Other Models)

Results

I compared Ovis-Image with Z Image Turbo and Chroma.

Ovis-Image result

Z Image Turbo result

See this article.

Chroma result

See this article.

Python script

import torch
from diffusers import OvisImagePipeline

pipe = OvisImagePipeline.from_pretrained(
    "AIDC-AI/Ovis-Image-7B",
    torch_dtype=torch.bfloat16
)

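# Model CPU offload moves each component to the GPU only while it runs,
# so an explicit pipe.to("cuda") is not needed alongside it.
pipe.enable_model_cpu_offload()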

prompt = 'Ultra-realistic, high-quality photo of an anthropomorphic capybara with a tough, streetwise attitude, wearing a worn black leather jacket, dark sunglasses, and ripped jeans. The capybara is leaning casually against a gritty urban wall covered in vibrant graffiti. Behind it, in bold, dripping yellow spray paint, the word "HuggingFace" is scrawled in large street-art style letters. The scene is set in a dimly lit alleyway with moody lighting, scattered trash, and an edgy, rebellious vibe — like a character straight out of an underground comic book.'

image = pipe(
    prompt,
    negative_prompt="",
    num_inference_steps=50,
    guidance_scale=5.0
).images[0]

image.save("ovis_image.jpg")
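When comparing models on the same prompt, it can help to fix the random seed and keep the sampling settings in one place. The following is only a minimal sketch, not part of the original script: `RunSettings`, `call_kwargs`, `output_name`, `generate`, and the seed value are hypothetical names introduced for illustration, and it assumes the pipeline accepts a `generator` argument as Diffusers pipelines generally do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSettings:
    """Sampling settings from the script above, plus a hypothetical seed."""
    num_inference_steps: int = 50
    guidance_scale: float = 5.0
    negative_prompt: str = ""
    seed: int = 0  # assumption: the original script does not set a seed


def call_kwargs(settings: RunSettings, generator=None) -> dict:
    """Build the keyword arguments passed to pipe(prompt, **kwargs)."""
    kwargs = {
        "negative_prompt": settings.negative_prompt,
        "num_inference_steps": settings.num_inference_steps,
        "guidance_scale": settings.guidance_scale,
    }
    if generator is not None:
        kwargs["generator"] = generator
    return kwargs


def output_name(model_tag: str, settings: RunSettings) -> str:
    """Derive a file name that records which model and seed produced the image."""
    return f"{model_tag}_seed{settings.seed}.jpg"


def generate(pipe, prompt: str, settings: RunSettings):
    """Run an already loaded pipeline with a seeded generator."""
    import torch  # imported lazily so the helpers above need only the standard library

    # A CPU generator is used here because the pipeline relies on CPU offload.
    generator = torch.Generator(device="cpu").manual_seed(settings.seed)
    return pipe(prompt, **call_kwargs(settings, generator)).images[0]
```

With the `pipe` and `prompt` from the script above, `generate(pipe, prompt, RunSettings(seed=0)).save(output_name("ovis_image", RunSettings(seed=0)))` would write `ovis_image_seed0.jpg`.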

Environment setup

Here is my pyproject.toml. (The versions are pinned on purpose.)

With uv, running uv sync should be all it takes to set up the environment.

[project]
name = "ovis"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "accelerate==1.12.0",
    "diffusers==0.36.0",
    "hf-xet==1.2.0",
    "torch==2.9.1+cu126",
    "transformers==4.57.3",
]

[[tool.uv.index]]
name = "torch-cuda"
url = "https://download.pytorch.org/whl/cu126"
explicit = true

[tool.uv.sources]
torch = [{ index = "torch-cuda" }]
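To confirm that the environment actually matches the pins above, a small check like the following may be useful. It is a sketch under assumptions: `PINS` simply mirrors the dependency list in the pyproject.toml, `installed_version` and `find_mismatches` are hypothetical helper names, and it assumes the installed metadata reports the torch version with its `+cu126` local suffix.

```python
from importlib.metadata import PackageNotFoundError, version

# Mirrors the pinned dependencies in the pyproject.toml above.
PINS = {
    "accelerate": "1.12.0",
    "diffusers": "0.36.0",
    "hf-xet": "1.2.0",
    "torch": "2.9.1+cu126",
    "transformers": "4.57.3",
}


def installed_version(name: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins: dict, lookup=installed_version) -> dict:
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `find_mismatches(PINS)` inside the uv-managed environment should return an empty dict when every pin is satisfied; the `lookup` parameter exists only so the comparison can be exercised without the packages installed.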