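Introduction
I tried out the stylegan3-editing repository on GitHub.
It presents two approaches: one that uses InterFaceGAN and one that uses StyleCLIP.
In the previous post (part one), I tried the InterFaceGAN approach.
touch-sp.hatenablog.com
This time I try the StyleCLIP approach.
It lets you edit a photo by describing the change with a word (or a sentence).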
Environment setup
Ubuntu 20.04 on WSL2
CUDA 11.1.1
Python 3.8.10
I installed the following packages beforehand.
sudo apt install cmake
sudo apt install build-essential
Python environment setup
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install scipy
pip install matplotlib
pip install tqdm
pip install opencv-python
pip install scikit-learn
pip install imageio
pip install dataclasses
pip install pyrallis
pip install gdown
pip install dlib
pip install ftfy
pip install git+https://github.com/openai/CLIP.git
beautifulsoup4==4.11.1
certifi==2022.6.15
charset-normalizer==2.1.0
clip==1.0
cycler==0.11.0
dataclasses==0.6
dlib==19.24.0
filelock==3.7.1
fonttools==4.34.4
ftfy==6.1.1
gdown==4.5.1
idna==3.3
imageio==2.21.1
joblib==1.1.0
kiwisolver==1.4.4
matplotlib==3.5.2
mypy-extensions==0.4.3
numpy==1.23.1
opencv-python==4.6.0.66
packaging==21.3
Pillow==9.2.0
pkg_resources==0.0.0
pyparsing==3.0.9
pyrallis==0.3.1
PySocks==1.7.1
python-dateutil==2.8.2
PyYAML==6.0
regex==2022.7.25
requests==2.28.1
scikit-learn==1.1.2
scipy==1.9.0
six==1.16.0
soupsieve==2.3.2.post1
threadpoolctl==3.1.0
torch==1.10.0+cu111
torchvision==0.11.0+cu111
tqdm==4.64.0
typing-inspect==0.7.1
typing_extensions==4.3.0
urllib3==1.26.11
wcwidth==0.2.5
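To check whether another machine matches the versions listed above, the pinned `name==version` lines can be compared against what is actually installed. The following is only a minimal sketch using the standard library (`importlib.metadata`, available from Python 3.8); the function names `parse_pins`, `installed_version`, and `find_mismatches` are made up for this illustration and are not part of stylegan3-editing.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a dict; blank and comment lines are skipped."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '==' not in line:
            continue
        name, version = line.split('==', 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (expected, actual)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        actual = lookup(name)
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


if __name__ == '__main__':
    # A few of the pins from the list above, used here only as sample input.
    sample = """
    torch==1.10.0+cu111
    torchvision==0.11.0+cu111
    dlib==19.24.0
    """
    for name, (expected, actual) in find_mismatches(parse_pins(sample)).items():
        print(f'{name}: expected {expected}, found {actual}')
```

The `lookup` parameter is injectable so the comparison logic can be exercised without the packages being installed.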
How to run
After cloning the repository, create a new folder named "downloaded" and download "shape_predictor_68_face_landmarks.dat" into it.
git clone https://github.com/yuval-alaluf/stylegan3-editing.git
cd stylegan3-editing
mkdir downloaded
cd downloaded
wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
bzip2 -dk shape_predictor_68_face_landmarks.dat.bz2
cd ../
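If the `bzip2` command is not available, the decompression step can also be done from Python. This is a minimal sketch with the standard `bz2` module that, like `bzip2 -dk`, keeps the compressed file; the helper name `decompress_bz2` is hypothetical and the path in the usage part simply follows the folder layout described above.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src, dst=None):
    """Decompress a .bz2 file to dst and keep the original archive.

    If dst is omitted, the trailing '.bz2' suffix is stripped from src.
    """
    src = Path(src)
    dst = src.with_suffix('') if dst is None else Path(dst)
    # Stream the data so that the whole file is never held in memory at once.
    with bz2.open(src, 'rb') as fin, open(dst, 'wb') as fout:
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == '__main__':
    archive = Path('downloaded') / 'shape_predictor_68_face_landmarks.dat.bz2'
    if archive.exists():
        print(decompress_bz2(archive))
```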
After that, all you need to do is run the following script. The files it needs are downloaded automatically.
import argparse
import os

import dlib
import torch
from gdown import download
from torchvision import transforms

from utils.alignment_utils import align_face, crop_face, get_stylegan_transform
from utils.inference_utils import run_on_batch, load_encoder, get_average_image

# Command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument('--input', type=str, help='path of image file')
parser.add_argument('--neutral_text', type=str)
parser.add_argument('--target_text', type=str)
parser.add_argument('--alpha', type=float, default=0, help='min:-5, max:5, step:0.5')
parser.add_argument('--beta', type=float, default=0, help='min:-1, max:1, step:0.1')
args = parser.parse_args()

image_path = args.input
neutral_text = args.neutral_text
target_text = args.target_text
alpha = args.alpha
beta = args.beta

# Download the encoder checkpoint if it is not there yet
model_path = os.path.join('downloaded', 'restyle_pSp_ffhq.pt')
if not os.path.exists(model_path):
    download('https://drive.google.com/uc?id=12WZi2a9ORVg-j6d9x4eF-CKpLaURC2W-', model_path, quiet=False)

# Download the files used by the StyleCLIP global directions if they are not there yet
download_files = [
    ('delta_i_c.npy', '1HOUGvtumLFwjbwOZrTbIloAwBBzs2NBN'),
    ('s_stats', '1FVm_Eh7qmlykpnSBN1Iy533e_A2xM78z')
]
os.makedirs('editing/styleclip_global_directions/sg3-r-ffhq-1024', exist_ok=True)
for file_name, file_id in download_files:
    save_path = os.path.join('editing/styleclip_global_directions/sg3-r-ffhq-1024', file_name)
    if not os.path.exists(save_path):
        download(f'https://drive.google.com/uc?id={file_id}', save_path, quiet=False)

# Preprocessing applied to the aligned image before it is passed to the encoder
transform_fn = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])

# Detect the face, then create aligned and cropped versions of the input image
predictor = dlib.shape_predictor('downloaded/shape_predictor_68_face_landmarks.dat')
detector = dlib.get_frontal_face_detector()
aligned_image = align_face(image_path, detector, predictor)
cropped_image = crop_face(image_path, detector, predictor, random_shift=0)

# Save both images next to the input file (image_dir avoids shadowing the built-in dir)
image_dir, fname = os.path.split(image_path)
aligned_image_path = os.path.join(image_dir, 'aligned_' + fname)
cropped_image_path = os.path.join(image_dir, 'cropped_' + fname)
aligned_image.save(aligned_image_path)
cropped_image.save(cropped_image_path)

landmarks_transform = get_stylegan_transform(cropped_image_path, aligned_image_path, detector, predictor)[3]

# Load the encoder and invert the image into a latent code
net, opts = load_encoder(model_path)
opts.n_iters_per_batch = 3
opts.resize_outputs = False

transformed_image = transform_fn(aligned_image)
avg_image = get_average_image(net)

with torch.no_grad():
    result_batch, result_latents = run_on_batch(inputs=transformed_image.unsqueeze(0).cuda().float(),
                                                net=net,
                                                opts=opts,
                                                avg_image=avg_image,
                                                landmarks_transform=torch.from_numpy(landmarks_transform).cuda().float())

from editing.styleclip_global_directions import edit as styleclip_edit
from utils.common import tensor2im

styleclip_args = styleclip_edit.EditConfig()
global_direction_calculator = styleclip_edit.load_direction_calculator(net.decoder, styleclip_args)

# Use a single alpha and a single beta instead of a range of values
opts = styleclip_edit.EditConfig()
opts.alpha_min = alpha
opts.alpha_max = alpha
opts.num_alphas = 1
opts.beta_min = beta
opts.beta_max = beta
opts.num_betas = 1
opts.neutral_text = neutral_text
opts.target_text = target_text

# Take the last latent stored for the first (and only) image in the batch
input_latent = result_latents[0][-1]
input_transforms = torch.from_numpy(landmarks_transform).cpu().numpy()

edit_res, edit_latent = styleclip_edit.edit_image(latent=input_latent,
                                                  landmarks_transform=input_transforms,
                                                  stylegan_model=net.decoder,
                                                  global_direction_calculator=global_direction_calculator,
                                                  opts=opts,
                                                  image_name=None,
                                                  save=False)

# Save the edited image at 512x512
edited_im = tensor2im(edit_res[0]).resize((512, 512))
edited_im.save('result.jpg')
The help output below shows how to run it.
optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         path of image file
  --neutral_text NEUTRAL_TEXT
  --target_text TARGET_TEXT
  --alpha ALPHA         min:-5, max:5, step:0.5
  --beta BETA           min:-1, max:1, step:0.1
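The help text gives ranges for `--alpha` and `--beta`, but the script declares both with a plain `type=float`, so out-of-range values are not rejected by the parser. If you want argparse to enforce the ranges, one possible approach is a custom `type` callable. This is a sketch under assumptions: `bounded_float` and `build_parser` are names invented here, and marking the other three arguments as required is my own choice, not something the original script does.

```python
import argparse


def bounded_float(low, high):
    """Return an argparse `type` callable that only accepts floats in [low, high]."""
    def parse(text):
        try:
            value = float(text)
        except ValueError:
            raise argparse.ArgumentTypeError(f'{text!r} is not a number')
        # NaN fails this comparison as well, so it is rejected too.
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(f'{value} is outside [{low}, {high}]')
        return value
    return parse


def build_parser():
    """Same options as the script, with ranges taken from its help text."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', type=str, required=True, help='path of image file')
    parser.add_argument('--neutral_text', type=str, required=True)
    parser.add_argument('--target_text', type=str, required=True)
    parser.add_argument('--alpha', type=bounded_float(-5, 5), default=0,
                        help='min:-5, max:5, step:0.5')
    parser.add_argument('--beta', type=bounded_float(-1, 1), default=0,
                        help='min:-1, max:1, step:0.1')
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    '--input', 'face.jpg',
    '--neutral_text', 'hair',
    '--target_text', 'afro hair',
    '--alpha', '4',
    '--beta', '0.1',
])
print(args.alpha, args.beta)
```

With this parser, a value such as `--alpha 6` makes argparse exit with an error message instead of passing the value on to the edit.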
For example, run it like this. (The script file is named "exe2.py".)
python exe2.py \
    --input face.jpg \
    --neutral_text "hair" \
    --target_text "afro hair" \
    --alpha 4 \
    --beta 0.1

python exe2.py \
    --input face.jpg \
    --neutral_text "black hair" \
    --target_text "red hair" \
    --alpha 3.5 \
    --beta 0.2

python exe2.py \
    --input face.jpg \
    --neutral_text "a face" \
    --target_text "a laughing face" \
    --alpha 3.5 \
    --beta 0.2
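Finding good values for `--alpha` and `--beta` usually takes a few tries. The sketch below generates one command line per combination of values so that they can be run in turn. It only builds the commands and does not run them; the helper name `build_commands` is hypothetical, and the value lists are arbitrary examples inside the ranges given in the help text.

```python
import itertools
import shlex


def build_commands(image, neutral_text, target_text, alphas, betas, script='exe2.py'):
    """Return one argv list per (alpha, beta) combination."""
    commands = []
    for alpha, beta in itertools.product(alphas, betas):
        commands.append([
            'python', script,
            '--input', image,
            '--neutral_text', neutral_text,
            '--target_text', target_text,
            '--alpha', str(alpha),
            '--beta', str(beta),
        ])
    return commands


if __name__ == '__main__':
    for argv in build_commands('face.jpg', 'hair', 'afro hair',
                               alphas=[3, 3.5, 4], betas=[0.1, 0.2]):
        # shlex.join quotes arguments that contain spaces, such as "afro hair".
        print(shlex.join(argv))
```

Note that the script always writes its output to `result.jpg`, so the file has to be renamed or moved between runs to keep every result.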
Notes
For this post I used a face photo from the free stock photo site Pakutaso (ぱくたそ). This is the photo.