Zero-shot image classification in Japanese with Recruit's "japanese-clip-vit-b-32-roberta-base"

More than a year ago I tried zero-shot image classification with OpenAI's CLIP.
touch-sp.hatenablog.com
This time I tried zero-shot image classification in Japanese using "japanese-clip-vit-b-32-roberta-base", a model released by Recruit.
huggingface.co

Image used

The subject is the same Segway photo I used last time.

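Results

Below is the answer when the model was asked whether the image above is a 「車」 (car), 「自転車」 (bicycle), 「バイク」 (motorcycle), or 「セグウェイ」 (Segway). The first line of the output means "This is a Segway". The numbers are cosine similarities between the image and each label, not probabilities, so they do not sum to 1; the label with the highest score is taken as the answer.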

これはセグウェイです
車: 0.16078947484493256
自転車: 0.238014355301857
バイク: 0.2504829466342926
セグウェイ: 0.43661999702453613
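
If probability-like values are preferred, the similarities above can be passed through a softmax. The following is a minimal sketch, not part of the original script. The scale factor of 100 is an assumption borrowed from OpenAI's CLIP examples and is not read from this model, and the helper name "softmax" is introduced here for illustration only.

```python
import math

# Cosine similarities copied from the results above.
labels = ["車", "自転車", "バイク", "セグウェイ"]
similarities = [0.16078947484493256, 0.238014355301857,
                0.2504829466342926, 0.43661999702453613]

# Assumption: a scale of 100, as in OpenAI's CLIP examples; not read from this model.
LOGIT_SCALE = 100.0


def softmax(values, scale=1.0):
    """Return softmax probabilities of the values multiplied by scale."""
    scaled = [v * scale for v in values]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax(similarities, LOGIT_SCALE)
for label, p in zip(labels, probabilities):
    print(f"{label}: {p:.6f}")
```

The ranking of the labels is unchanged by the softmax; only the scale of the numbers differs, and a larger scale factor makes the top label stand out more sharply.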

Python script

import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel, CLIPImageProcessor
from diffusers.utils import load_image

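# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "recruit-jp/japanese-clip-vit-b-32-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The model ships custom code, so trust_remote_code=True is required
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).to(device)

# Image preprocessing settings are loaded from the LAION CLIP ViT-B/32 checkpoint
image_processor = CLIPImageProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")

# Download the test image and convert it to a batch of pixel values
image = load_image("https://live.staticflickr.com/7236/7114602897_9cf00b2820_b.jpg")
image = image_processor(image, return_tensors="pt").pixel_values.to(device)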

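# Candidate labels: car, bicycle, motorcycle, Segway (kept in Japanese for the Japanese text encoder)
texts = ["車", "自転車", "バイク", "セグウェイ"]
# [CLS] is prepended by hand, so automatic special tokens are disabled
text = tokenizer(
    text=["[CLS]" + label for label in texts],
    max_length=77,
    padding="max_length",
    truncation=True,
    add_special_tokens=False,
    return_tensors="pt"
).input_ids.to(device)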

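with torch.inference_mode():
    # L2-normalize both embeddings so that the dot product is a cosine similarity
    image_features = model.get_image_features(image)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features = model.get_text_features(input_ids=text)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Shape (1, len(texts)); these are cosine similarities, not probabilities
    probs = (image_features @ text_features.T).cpu().numpy()

# Prints "This is a <label>" in Japanese for the best-matching label
print(f"これは{texts[np.argmax(probs)]}です")

# Print the similarity score for every label
for i, text in enumerate(texts):
    print(f"{text}: {probs[0][i]}")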
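
The script divides each embedding by its norm and then takes a dot product, which is the same as computing a cosine similarity. Here is a minimal sketch of that step in plain Python. The vectors are toy values made up for illustration, and the helper names "l2_normalize" and "cosine_similarity" are hypothetical and do not come from the model or from any library.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Dot product of the two L2-normalized vectors."""
    a_unit = l2_normalize(a)
    b_unit = l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


# Toy vectors, made up for illustration only.
image_vec = [3.0, 4.0]
same_direction = [6.0, 8.0]   # same direction, different length
orthogonal = [-4.0, 3.0]      # at a right angle to image_vec

print(cosine_similarity(image_vec, same_direction))  # close to 1
print(cosine_similarity(image_vec, orthogonal))      # close to 0
```

Because the length of each vector is divided out, only the direction of the embeddings affects the score.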

PC environment

Windows 11
Python 3.11

The script also runs without a GPU.

Setting up the Python environment

The commands below are for CUDA 11.8.
Diffusers is installed only so that its "load_image" helper can be used.

pip install torch==2.1.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install diffusers[torch]
pip install transformers protobuf sentencepiece
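
After installation, it can be useful to confirm which versions were actually installed. The following is a small sketch using only the standard library. The package names are taken from the pip commands above, and the helper name "installed_versions" is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands above.
PACKAGES = ["torch", "diffusers", "transformers", "protobuf", "sentencepiece"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```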

Site I referred to

ayousanz.hatenadiary.jp
