[Waifu Diffusion v1.4] [img2img] Generating a new image that reproduces a source image (photo)

Published: January 4, 2023
Last updated: January 22, 2023

Introduction
Environment
Method
Python script

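Introduction

I spent some time experimenting with "waifu-diffusion-v1-4", and this post documents the process.

I generated the image on the right from the source image on the left.
The theme this time is to stay as faithful to the source image as possible.

The source image is used courtesy of Pakutaso.
It is this image.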

Environment

I run waifu-diffusion-v1-4 through diffusers in a local environment on this PC.

Windows 11
CUDA 11.6.2
Python 3.10.9
Git for Windows 2.39.0

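The environment can be set up with pip alone.
The package list looks like this because I also use the environment for models other than waifu-diffusion.
It may include packages that waifu-diffusion itself does not need.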

pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install git+https://github.com/huggingface/diffusers.git
pip install git+https://github.com/huggingface/transformers.git
pip install accelerate==0.15.0 scipy==1.10.0
pip install xformers==0.0.16rc425
pip install safetensors==0.2.8

Just in case, here are the pinned versions I confirmed to work.

pip install -e git+https://github.com/huggingface/diffusers.git@8d326e61cfbe5d76e25deca6093ecb22967d634e#egg=diffusers
pip install -e git+https://github.com/huggingface/transformers.git@4e730b387364c9f46b6b1b0c79fdaf0903c42257#egg=transformers

I downloaded the whole model in one go with git lfs.

git lfs install
git clone https://huggingface.co/hakurei/waifu-diffusion

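Method

Step 0

To generate images efficiently, I wrote a Python script.
It is included at the end of this post, and all of the following steps use it.

Step 1 (the "prompt")

You need to decide on a prompt (the "spell"). Tweaking the prompt is one way to adjust the generated image, but this time I keep it fixed to the following.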

prompt

masterpiece
best quality
high quality
absurdres
kawaii princess
white glowing skin
kawaii face with blush
blue eyes with eyelashes
sweater
turtleneck

negative prompt

worst quality
low quality
medium quality
deleted
lowres
comic
bad anatomy
bad hands
text
error
missing fingers
extra digit
fewer digits
cropped
jpeg artifacts
signature
watermark
username
blurry
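
The script at the end of this post reads these terms from prompt.txt and negative_prompt.txt, one term per line, and joins them with commas before passing them to the pipeline. Below is a minimal sketch of that step. The helper name `join_prompt_lines` is hypothetical, and unlike the original script it skips blank lines, which is my own assumption.

```python
def join_prompt_lines(lines):
    """Join one-term-per-line prompt text into a single comma-separated prompt."""
    terms = [line.strip() for line in lines]
    # Assumption: blank lines are dropped (the original script keeps them).
    return ','.join(term for term in terms if term)


# Lines as they would come back from f.readlines() on prompt.txt.
prompt = join_prompt_lines(['masterpiece\n', 'best quality\n', '\n', 'sweater\n'])
print(prompt)
```

Keeping one term per line makes it easy to add or remove a single term without touching the rest of the prompt.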

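Step 2 ("strength")

Let's look at the "strength" parameter.
It controls how much the source image is transformed. The larger the value, the more noise is added and the further the result drifts from the source image.
It is specified between 0 and 1, and the default appears to be 0.8.
Here are the results of generating images with several "strength" values.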

python img2img.py --image lady.jpg --negative_prompt --strength 0.2 0.4 0.6 0.8

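From left to right: 0.2 → 0.4 → 0.6 → 0.8.

With the default of 0.8, the pose and clothing came out completely different.

For this post's goal of reproducing the source image, I judged a value around 0.4 to be reasonable.
From here on it is fixed at 0.4.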

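Step 3 ("guidance_scale")

Let's look at the "guidance_scale" parameter.
It controls how strongly the prompt is reflected. The larger the value, the more closely the generated image follows the prompt, but image quality appears to suffer.
It is specified as 1 or greater, and the default appears to be 7.5.
Here are the results of generating images with several "guidance_scale" values.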

python img2img.py --image lady.jpg --negative_prompt --strength 0.4 --scale 3.5 5.5 7.5 9.5 11.5 13.5

From left to right: 3.5 → 5.5 → 7.5 → 9.5 → 11.5 → 13.5.

The higher the value, the more distorted the image becomes.

I judged 3.5 to be reasonable.
From here on it is fixed at 3.5.
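
For intuition about why large values distort the image: classifier-free guidance, as used by Stable Diffusion pipelines, combines an unconditional noise prediction and a prompt-conditioned one at every denoising step, and "guidance_scale" sets how far the result is pushed toward the prompt. The sketch below uses plain lists of floats in place of tensors, and the function name `apply_guidance` is hypothetical.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance on per-element noise predictions (lists of floats)."""
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


uncond = [0.0, 1.0]  # toy prediction without the prompt
text = [1.0, 3.0]    # toy prediction with the prompt

# A scale of 1 returns the prompt-conditioned prediction unchanged.
print(apply_guidance(uncond, text, 1.0))
# Larger scales extrapolate past it, away from the unconditional prediction.
print(apply_guidance(uncond, text, 3.5))
```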

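Step 4 (trying different "seed" values)

Next I vary the "seed" parameter and pick a result that looks good.
This time I started at 500, incremented by 1, generated 30 images, and chose the best-looking one.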

python img2img.py --image lady.jpg --negative_prompt --strength 0.4 --scale 3.5 --seed 500 --n_samples 30

Seed 501 looked best, so from here on it is fixed at 501.
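
The script turns its command-line options into a sweep: "--n_samples" walks the seed upward from "--seed", and every value given to "--scale", "--strength", and "--steps" is combined with every other. The following is a rough sketch of that expansion using itertools.product. The function name `expand_runs` is hypothetical; the script itself uses nested for loops.

```python
from itertools import product


def expand_runs(seed, n_samples, scales, strengths, steps_list):
    """List (seed, scale, strength, steps) tuples in the order the script visits them."""
    seeds = [seed + i for i in range(n_samples)]
    # product() varies the last argument fastest, like the innermost for loop.
    return list(product(seeds, scales, strengths, steps_list))


# Equivalent of: --strength 0.4 --scale 3.5 --seed 500 --n_samples 30
runs = expand_runs(500, 30, [3.5], [0.4], [50])
print(len(runs))
print(runs[0], runs[-1])
```

Because each run gets its own seed value, any single image from the sweep can be regenerated later by passing that seed with "--n_samples 1".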

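Step 5 ("num_inference_steps")

Let's look at the "num_inference_steps" parameter.
It sets the number of denoising steps. Larger values appear to give higher image quality, at the cost of longer generation time.
The default is 50.
It is affected by the "strength" parameter: the number of steps actually run is calculated as
"num_inference_steps" × "strength"
For example, with "strength" at 0.4 and the default of 50, only about 20 steps are actually run.
So when "strength" is set to a small value, "num_inference_steps" needs to be raised accordingly.
Here are the results of generating images with several "num_inference_steps" values.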

python img2img.py --image lady.jpg --negative_prompt --strength 0.4 --scale 3.5 --seed 501 --steps 50 100 200 400

From left to right: 50 → 100 → 200 → 400.

I chose the 400-step image as the final result.
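
The relationship between "num_inference_steps" and "strength" described in this step can be sketched as below. The function name `effective_steps` is hypothetical, and truncating to an integer and capping at "num_inference_steps" are my assumptions about how diffusers rounds the value; the exact count may differ slightly between diffusers versions.

```python
def effective_steps(num_inference_steps, strength):
    """Approximate number of denoising steps img2img actually runs."""
    # Assumption: the product is truncated to an int and capped at the requested steps.
    return min(int(num_inference_steps * strength), num_inference_steps)


# The values tried in this step, with strength fixed at 0.4.
for steps in (50, 100, 200, 400):
    print(steps, effective_steps(steps, 0.4))
```

Under these assumptions, the 400-step run at strength 0.4 performs about 160 denoising steps.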

Other settings

Other settings are also available, such as changing the scheduler.

python img2img.py --image lady.jpg --negative_prompt --strength 0.4 --scale 3.5 --seed 501 --steps 400 --scheduler multistepdpm
python img2img.py --image lady.jpg --negative_prompt --strength 0.4 --scale 3.5 --seed 501 --steps 400 --scheduler eulera
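
As an alternative to the match statement in the script below, the scheduler names could be looked up in a table, so that adding a scheduler is a one-line change. This is only a sketch. `SCHEDULER_CLASSES` and `swap_scheduler` are hypothetical names; the three class names are the ones the script imports from diffusers.

```python
# CLI name -> diffusers scheduler class name (same three choices as the script).
SCHEDULER_CLASSES = {
    'pndm': 'PNDMScheduler',
    'multistepdpm': 'DPMSolverMultistepScheduler',
    'eulera': 'EulerAncestralDiscreteScheduler',
}


def swap_scheduler(pipe, name):
    """Replace pipe.scheduler with the class registered under `name`."""
    import diffusers  # imported lazily so the table can be inspected without it

    scheduler_cls = getattr(diffusers, SCHEDULER_CLASSES[name])
    # Reuse the current scheduler's config, as the script does.
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return pipe
```

With a table like this, the argparse option could use choices=list(SCHEDULER_CLASSES) so that the accepted names and the lookup cannot drift apart.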

Python script

import os
import sys
import glob
import argparse
import datetime
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

parser = argparse.ArgumentParser()
parser.add_argument(
    '--seed',
    type=int,
    default=200,
    help='the seed (for reproducible sampling)',
)
parser.add_argument(
    '--n_samples',
    type=int,
    default=1,
    help='how many samples to produce for each given prompt',
)
parser.add_argument(
    '--scale',
    nargs='*',
    default=[7.5],    
    type=float,
    help='guidance_scale',
)
parser.add_argument(
    '--strength',
    nargs='*',
    default=[0.8],
    type=float,
    help='strength',
)
parser.add_argument(
    '--steps',
    nargs='*',
    default=[50],
    type=int,
    help='num_inference_steps',
)
parser.add_argument(
    '--negative_prompt',
    action="store_true",
    help='if enabled, use negative prompt',
)
parser.add_argument(
    '--image',
    type=str,
    required=True,  # the script cannot run without a source image
    help='original image'
)
parser.add_argument(
    '--scheduler',
    type=str,
    default='pndm',
    choices=['pndm', 'multistepdpm', 'eulera']
)
opt = parser.parse_args()

globresult = glob.glob('*')
dirlist =[]
for file_or_dir in globresult:
    if os.path.isdir(file_or_dir) and file_or_dir != 'results':
        dirlist.append(file_or_dir)

if len(dirlist) == 1:
    model_id = dirlist[0]
    print(f'model id: {model_id}')
else:
    print('Unable to identify model')
    sys.exit()

original_image = opt.image
init_image = Image.open(original_image).convert("RGB").resize((512, 512))

if os.path.isfile('prompt.txt'):
    print('reading prompts from prompt.txt')
    with open('prompt.txt', 'r') as f:
        #prompt = f.read().splitlines()
        prompt = f.readlines()
        prompt = [x.strip() for x in prompt]
        prompt = ','.join(prompt)
else:
    print('Unable to find prompt.txt')
    sys.exit()

if opt.negative_prompt and os.path.isfile('negative_prompt.txt'):
    print('reading negative prompts from negative_prompt.txt')
    with open('negative_prompt.txt', 'r') as f:
        #negative_prompt = f.read().splitlines()
        negative_prompt = f.readlines()
        negative_prompt = [x.strip() for x in negative_prompt]
        negative_prompt = ','.join(negative_prompt)
else:
    negative_prompt = None

print(f'prompt: {prompt}')
print(f'negative prompt: {negative_prompt}')

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
scheduler = opt.scheduler
match scheduler:
    case 'pndm':
        from diffusers import  PNDMScheduler
        pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)
    case 'multistepdpm':
        from diffusers import DPMSolverMultistepScheduler
        pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    case 'eulera':
        from diffusers import EulerAncestralDiscreteScheduler
        pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    case _:
        pass  # keep the scheduler that was loaded with the model
pipe.to("cuda")

def null_safety(images, **kwargs):
    return images, False
 
pipe.safety_checker = null_safety

os.makedirs('results', exist_ok=True)

now = datetime.datetime.today()
now_str = now.strftime('%m%d_%H%M')

scale_list = opt.scale
strength_list = opt.strength
steps_list = opt.steps

for i in range(opt.n_samples):
    seed  = opt.seed + i
    for scale in scale_list:
        for strength in strength_list:
            for steps in steps_list:
                generator = torch.Generator(device="cuda").manual_seed(seed)
                image = pipe(
                    prompt = prompt,
                    negative_prompt = negative_prompt,
                    image = init_image,
                    generator = generator,
                    guidance_scale = scale,
                    strength = strength,
                    num_inference_steps = steps,
                    num_images_per_prompt = 1).images[0]
                image.save(os.path.join('results', f'{now_str}_{scheduler}_seed{seed}_scale{scale}_strength{strength}_steps{steps}.png'))