Environment
Ubuntu 22.04 on WSL2
Python 3.10
CUDA 11.8
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install openmim==0.3.9
mim install mmcv==2.0.1
mim install mmdet[multimodal]==3.1.0
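After installing, it can help to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the distribution names in the list are assumed to match the package names used in the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed distribution names, taken from the install commands above.
    packages = ["torch", "torchvision", "openmim", "mmcv", "mmdet"]
    for name, ver in installed_versions(packages).items():
        print(f"{name}: {ver or 'not installed'}")
```

If any entry is reported as not installed, or the version differs from the one pinned above, rerunning the corresponding install command is probably the first thing to try.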
Running
Cloning the repository
git clone https://github.com/open-mmlab/mmdetection
cd mmdetection
Downloading the model
wget https://download.openmmlab.com/mmdetection/v3.0/xdecoder/xdecoder_focalt_last_novg.pt
Downloading the test image
wget https://raw.githubusercontent.com/SHI-Labs/Versatile-Diffusion/master/assets/demo/reg_example/boy_and_girl.jpg
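If `wget` is not available, the two downloads above could also be done from Python. This is a rough sketch using the standard library only. The helper name `fetch` is hypothetical, and it assumes the last path segment of the URL is a usable file name.

```python
from pathlib import Path
from urllib.request import urlretrieve

# The two URLs used in the wget commands above.
URLS = [
    "https://download.openmmlab.com/mmdetection/v3.0/xdecoder/xdecoder_focalt_last_novg.pt",
    "https://raw.githubusercontent.com/SHI-Labs/Versatile-Diffusion/master/assets/demo/reg_example/boy_and_girl.jpg",
]


def fetch(url: str, dest_dir: str = ".") -> Path:
    """Download url into dest_dir unless a file with that name already exists."""
    # Assumption: the last path segment of the URL is the file name.
    target = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if not target.exists():
        urlretrieve(url, str(target))
    return target


# Usage (not executed here, since the checkpoint is a large file):
# for url in URLS:
#     print(fetch(url))
```

Skipping files that already exist means the script can be rerun without downloading the checkpoint again.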
Writing the script
from argparse import ArgumentParser
from typing import List, Union

import numpy as np
from mmdet.apis.det_inferencer import DetInferencer, InputsType, PredType


class ImageCaptionInferencer(DetInferencer):
    # Override visualize so that the caption is printed instead of drawn.
    def visualize(self,
                  inputs: InputsType,
                  preds: PredType,
                  show: bool = False,
                  wait_time: int = 0,
                  draw_pred: bool = True,
                  pred_score_thr: float = 0.3,
                  **kwargs) -> Union[List[np.ndarray], None]:
        for pred in preds:
            print(pred.pred_caption)


def parse_args():
    parser = ArgumentParser()
    parser.add_argument(
        'inputs', type=str, help='Input image file or folder path.')
    parser.add_argument('model', type=str, help='Config file name')
    parser.add_argument('--weights', type=str, help='Checkpoint file')
    parser.add_argument(
        '--device', type=str, default='cuda:0',
        help='Device used for inference')
    call_args = vars(parser.parse_args())

    # Move the constructor arguments out of call_args; what remains
    # is passed to the inferencer call itself.
    init_kws = ['model', 'weights', 'device']
    init_args = {}
    for init_kw in init_kws:
        init_args[init_kw] = call_args.pop(init_kw)
    init_args['palette'] = None

    return init_args, call_args


def main():
    init_args, call_args = parse_args()
    inferencer = ImageCaptionInferencer(**init_args)
    inferencer(**call_args)


if __name__ == '__main__':
    main()
Save the script above as "demo.py".
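The way `parse_args` divides the command-line arguments into constructor arguments and call arguments can be tried on its own, without mmdet installed. The sketch below reproduces only that splitting step. The function name `split_args` and the `INIT_KEYS` constant are hypothetical names for illustration, and the `palette` entry is left out for brevity.

```python
from argparse import ArgumentParser
from typing import Dict, List, Optional, Tuple

# Keys that go to the inferencer constructor; the rest go to the call.
INIT_KEYS = ("model", "weights", "device")


def split_args(argv: Optional[List[str]] = None) -> Tuple[Dict, Dict]:
    """Parse argv and split it into (init_args, call_args)."""
    parser = ArgumentParser()
    parser.add_argument("inputs", type=str)
    parser.add_argument("model", type=str)
    parser.add_argument("--weights", type=str)
    parser.add_argument("--device", type=str, default="cuda:0")
    call_args = vars(parser.parse_args(argv))
    # pop() removes each constructor key from call_args as it is copied.
    init_args = {key: call_args.pop(key) for key in INIT_KEYS}
    return init_args, call_args


init_args, call_args = split_args([
    "boy_and_girl.jpg",
    "projects/XDecoder/configs/xdecoder-tiny_zeroshot_caption_coco2014.py",
    "--weights", "xdecoder_focalt_last_novg.pt",
])
print(call_args)            # {'inputs': 'boy_and_girl.jpg'}
print(init_args["device"])  # cuda:0
```

Only `inputs` remains in `call_args`, so in the script `inferencer(**call_args)` is equivalent to calling the inferencer with the image path alone.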
Running the script
python demo.py \
    boy_and_girl.jpg \
    projects/XDecoder/configs/xdecoder-tiny_zeroshot_caption_coco2014.py \
    --weights xdecoder_focalt_last_novg.pt
Result
children sitting on the ground and watching a starry sky
Other Image Captioning articles
touch-sp.hatenablog.com