Python script
That's all it takes. The pretrained model is downloaded automatically, so no setup is needed beforehand.

from transformers import Blip2Processor, Blip2ForConditionalGeneration
from diffusers.utils import load_image
import torch

# Load the processor and the model in half precision on the GPU
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

# Fetch the sample image from its URL
image = load_image("https://github.com/SHI-Labs/Versatile-Diffusion/blob/master/assets/demo/reg_example/boy_and_girl.jpg?raw=true")

# Preprocess the image, matching the model's device and dtype
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)

# Generate the caption token IDs and decode them to text
generated_ids = model.generate(**inputs)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)
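The script above hard-codes "cuda" and float16. On a machine without a CUDA GPU, a fallback along the following lines might be useful. This is only a sketch: the helper name `pick_device_and_dtype` is hypothetical and not part of the original script, and it assumes float32 is the safer choice on CPU.

```python
import importlib.util


def pick_device_and_dtype():
    """Return a (device, dtype_name) pair: GPU with float16 if CUDA is usable, else CPU with float32."""
    # If torch is not installed at all, fall back to the CPU defaults
    if importlib.util.find_spec("torch") is None:
        return "cpu", "float32"

    import torch

    if torch.cuda.is_available():
        return "cuda", "float16"
    return "cpu", "float32"


if __name__ == "__main__":
    device, dtype_name = pick_device_and_dtype()
    print(device, dtype_name)
```

The returned values could then replace the literals in the script, for example `torch_dtype=getattr(torch, dtype_name)` and `.to(device)`. Expect generation on CPU to be much slower.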
Image
Result
two children looking at a star in the sky
Environment
Windows 11
CUDA 11.7
Python 3.10
pip install torch==2.0.1+cu117 --index-url https://download.pytorch.org/whl/cu117
pip install diffusers transformers accelerate
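To check that the packages listed above are actually installed, a short script like this may help. It is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the default package list simply mirrors the pip commands above.

```python
import sys
from importlib import metadata


def installed_versions(packages=("torch", "diffusers", "transformers", "accelerate")):
    """Map each package name to its installed version string, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print the interpreter version, then one line per package
    print("Python", sys.version.split()[0])
    for name, version in installed_versions().items():
        print(name, version or "not installed")
```

A `None` entry means the corresponding `pip install` step has not been run in the active environment.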