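[AutoGluon] [Image Classification] Can a few lines of code match human-level accuracy?

Published: January 26, 2021
Last updated: September 13, 2022

AutoGluon recommends MultiModalPredictor for image classification.
I have rewritten this article to use MultiModalPredictor instead of ImagePredictor.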

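Introduction

I tested whether deep learning can reach 100% accuracy on an image classification task so simple that a human would never get it wrong.

Data used

I downloaded "Classification > Directions01 (128×128, RGB Color: 24bit)" from miniJSRT_database | 日本放射線技術学会画像部会 (the Imaging Section of the Japanese Society of Radiological Technology).

The task is to determine the orientation of a chest X-ray image. It can be solved as a 4-class classification problem: up, down, right, and left.

Unzipping the downloaded ZIP file yields two folders, train and test, each containing four subfolders: up, down, right, and left.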

DIRECTIONS01_RGB
├─test
│  ├─down
│  ├─left
│  ├─right
│  └─up
└─train
    ├─down
    ├─left
    ├─right
    └─up

The following two lines create two pandas DataFrames.

from autogluon.vision import ImagePredictor
# The middle return value is not used here
train_dataset, _, test_dataset = ImagePredictor.Dataset.from_folders('DIRECTIONS01_RGB')

These pandas DataFrames are used for training and testing.

>>> train_dataset
                                      image  label
0      D:\DIRECTIONS01_RGB\train\down\0.png      0
1      D:\DIRECTIONS01_RGB\train\down\1.png      0
2     D:\DIRECTIONS01_RGB\train\down\10.png      0
3    D:\DIRECTIONS01_RGB\train\down\100.png      0
4    D:\DIRECTIONS01_RGB\train\down\101.png      0
..                                      ...    ...
943     D:\DIRECTIONS01_RGB\train\up\95.png      3
944     D:\DIRECTIONS01_RGB\train\up\96.png      3
945     D:\DIRECTIONS01_RGB\train\up\97.png      3
946     D:\DIRECTIONS01_RGB\train\up\98.png      3
947     D:\DIRECTIONS01_RGB\train\up\99.png      3

[948 rows x 2 columns]

>>> test_dataset
                                    image  label
0     D:\DIRECTIONS01_RGB\test\down\1.png      0
1    D:\DIRECTIONS01_RGB\test\down\10.png      0
2     D:\DIRECTIONS01_RGB\test\down\2.png      0
3     D:\DIRECTIONS01_RGB\test\down\3.png      0
4     D:\DIRECTIONS01_RGB\test\down\4.png      0
..                                      ...    ...
35      D:\DIRECTIONS01_RGB\test\up\5.png      3
36      D:\DIRECTIONS01_RGB\test\up\6.png      3
37      D:\DIRECTIONS01_RGB\test\up\7.png      3
38      D:\DIRECTIONS01_RGB\test\up\8.png      3
39      D:\DIRECTIONS01_RGB\test\up\9.png      3
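
The table layout above (one row per image, with an integer label per class folder) can be reproduced without AutoGluon. The sketch below is a minimal stand-in, not AutoGluon's implementation. The helper name `list_labeled_images` is hypothetical, and it assumes that labels follow the alphabetical order of the class folder names, which matches the down=0, left=1, right=2, up=3 numbering shown above.

```python
from pathlib import Path


def list_labeled_images(split_dir, extensions=(".png",)):
    """Return (class_names, rows) for a folder that contains one subfolder per class.

    Hypothetical helper: each row is {'image': path, 'label': int}, and labels
    follow the alphabetical order of the subfolder names (an assumption that
    matches the output shown above).
    """
    split_dir = Path(split_dir)
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    rows = []
    for label, name in enumerate(class_names):
        # Paths are sorted as strings, so '10.png' comes before '2.png'.
        files = sorted(
            str(f) for f in (split_dir / name).iterdir()
            if f.is_file() and f.suffix.lower() in extensions
        )
        rows.extend({"image": f, "label": label} for f in files)
    return class_names, rows
```

Calling `list_labeled_images('DIRECTIONS01_RGB/train')` would return the class names and one row per image. Passing the rows to `pandas.DataFrame` gives a two-column table with the same shape as the one above.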

Training and evaluation

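import warnings
warnings.filterwarnings('ignore')  # suppress library warnings to keep the log readable

from autogluon.vision import ImagePredictor
from autogluon.multimodal import MultiModalPredictor

# Build the train/test DataFrames from the folder structure (the middle return value is unused)
train_dataset, _, test_dataset = ImagePredictor.Dataset.from_folders('Directions01_RGB')

# Train an image classifier; the "label" column holds the class index
predictor = MultiModalPredictor(label="label")
predictor.fit(train_data=train_dataset)

# Evaluate accuracy on the test data
score = predictor.evaluate(test_dataset, metrics=["accuracy"])
print(score)

# Save the trained predictor for later reuse
predictor.save('my_saved_dir')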

Running the code above produces the following output.

Global seed set to 123
Auto select gpus: [0]
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name              | Type                            | Params
----------------------------------------------------------------------
0 | model             | TimmAutoModelForImagePrediction | 86.7 M
1 | validation_metric | Accuracy                        | 0
2 | loss_func         | CrossEntropyLoss                | 0
----------------------------------------------------------------------
86.7 M    Trainable params
0         Non-trainable params
86.7 M    Total params
173.495   Total estimated model params size (MB)
Epoch 0:  50%|██████████████████████████▊                           | 71/143 [00:04<00:04, 15.71it/s, loss=1.51, v_num=Epoch 0, global step 2: 'val_accuracy' reached 0.27895 (best 0.27895), saving model to '/mnt/wsl/PHYSICALDRIVE0p1/autogluon_works/AutogluonModels/ag-20220913_141522/epoch=0-step=2.ckpt' as top 3
Epoch 0:  99%|███████████████████████████████████████████████████▋| 142/143 [00:10<00:00, 13.93it/s, loss=0.707, v_num=Epoch 0, global step 5: 'val_accuracy' reached 0.76316 (best 0.76316), saving model to '/mnt/wsl/PHYSICALDRIVE0p1/autogluon_works/AutogluonModels/ag-20220913_141522/epoch=0-step=5.ckpt' as top 3
Epoch 1:  50%|██████████████████████████▎                          | 71/143 [00:04<00:04, 15.51it/s, loss=0.332, v_num=Epoch 1, global step 8: 'val_accuracy' reached 0.89474 (best 0.89474), saving model to '/mnt/wsl/PHYSICALDRIVE0p1/autogluon_works/AutogluonModels/ag-20220913_141522/epoch=1-step=8.ckpt' as top 3
Epoch 1:  99%|███████████████████████████████████████████████████▋| 142/143 [00:13<00:00, 10.17it/s, loss=0.144, v_num=Epoch 1, global step 11: 'val_accuracy' reached 1.00000 (best 1.00000), saving model to '/mnt/wsl/PHYSICALDRIVE0p1/autogluon_works/AutogluonModels/ag-20220913_141522/epoch=1-step=11.ckpt' as top 3
Epoch 2:  50%|██████████████████████████▎                          | 71/143 [00:04<00:04, 15.78it/s, loss=0.125, v_num=Epoch 2, global step 14: 'val_accuracy' reached 1.00000 (best 1.00000), saving model to '/mnt/wsl/PHYSICALDRIVE0p1/autogluon_works/AutogluonModels/ag-20220913_141522/epoch=2-step=14.ckpt' as top 3
Epoch 2:  99%|██████████████████████████████████████████████████▋| 142/143 [00:11<00:00, 12.90it/s, loss=0.0188, v_num=Epoch 2, global step 17: 'val_accuracy' reached 1.00000 (best 1.00000), saving model to '/mnt/wsl/PHYSICALDRIVE0p1/autogluon_works/AutogluonModels/ag-20220913_141522/epoch=2-step=17.ckpt' as top 3
Epoch 3:  50%|█████████████████████████▊                          | 71/143 [00:06<00:06, 11.01it/s, loss=0.0958, v_num=Epoch 3, global step 20: 'val_accuracy' was not in top 3
Epoch 3:  99%|█████████████████████████████████████████████████▋| 142/143 [00:11<00:00, 11.96it/s, loss=0.00292, v_num=Epoch 3, global step 23: 'val_accuracy' was not in top 3
Epoch 4:  50%|█████████████████████████▊                          | 71/143 [00:04<00:04, 17.23it/s, loss=0.0086, v_num=Epoch 4, global step 26: 'val_accuracy' was not in top 3
Epoch 4:  99%|██████████████████████████████████████████████████▋| 142/143 [00:09<00:00, 15.00it/s, loss=0.0241, v_num=Epoch 4, global step 29: 'val_accuracy' was not in top 3
Epoch 5:  50%|██████████████████████████▊                           | 71/143 [00:04<00:04, 17.12it/s, loss=0.02, v_num=Epoch 5, global step 32: 'val_accuracy' was not in top 3
Epoch 5:  99%|█████████████████████████████████████████████████▋| 142/143 [00:09<00:00, 14.77it/s, loss=0.00628, v_num=Epoch 5, global step 35: 'val_accuracy' was not in top 3
Epoch 6:  50%|█████████████████████████▎                         | 71/143 [00:04<00:04, 17.20it/s, loss=0.00142, v_num=Epoch 6, global step 38: 'val_accuracy' was not in top 3
Epoch 6:  99%|████████████████████████████████████████████████▋| 142/143 [00:09<00:00, 14.75it/s, loss=0.000366, v_num=Epoch 6, global step 41: 'val_accuracy' was not in top 3
Epoch 6:  99%|████████████████████████████████████████████████▋| 142/143 [00:10<00:00, 13.22it/s, loss=0.000366, v_num=]
Start to fuse 3 checkpoints via the greedy soup algorithm.
Predicting DataLoader 0: 100%|████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 12.04it/s]
Predicting DataLoader 0: 100%|████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 15.46it/s]
Predicting DataLoader 0: 100%|████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 12.22it/s]
Predicting DataLoader 0: 100%|████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 16.62it/s]
{'accuracy': 1.0}

Training finishes in a few minutes, and the accuracy on the test data is 100%.
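
The reported accuracy is the fraction of test images whose predicted label equals the true label. Here is a minimal sketch of that calculation, independent of AutoGluon. The function name `accuracy` is illustrative, and the label lists are a toy reconstruction of the case where all 40 test images (10 per class) are predicted correctly.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty sequence")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy example: 40 test images, 10 per class, every prediction correct.
y_true = [label for label in range(4) for _ in range(10)]
y_pred = list(y_true)
print(accuracy(y_true, y_pred))  # 1.0
```

A single wrong prediction among the 40 images would lower this value to 0.975, so a score of 1.0 means that no test image was misclassified.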

A closer look

For every test image, I checked the probability that the model assigns to the correct class.

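import warnings
warnings.filterwarnings('ignore')  # suppress library warnings to keep the log readable

from autogluon.vision import ImagePredictor
from autogluon.multimodal import MultiModalPredictor

# Only the test DataFrame is needed here
_, _, test_dataset = ImagePredictor.Dataset.from_folders('Directions01_RGB')

# Load the predictor saved in the previous step
predictor = MultiModalPredictor.load('my_saved_dir')

# One probability column per class (0 to 3), plus the true label for filtering
proba = predictor.predict_proba(test_dataset)
proba['label'] = test_dataset['label']

# Class index -> folder name
folder_names = ['down', 'left', 'right', 'up']

# For each class, print the probability assigned to the correct class
for i in range(4):
    print(folder_names[i], 'images:')
    print(proba[proba['label']==i][i])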

Load pretrained checkpoint: my_saved_dir/model.ckpt
Predicting DataLoader 0: 100%|████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.06it/s]
down images:
0    0.999938
1    0.999947
2    0.999744
3    0.999950
4    0.999883
5    0.999847
6    0.999931
7    0.999984
8    0.999946
9    0.999917
Name: 0, dtype: float32
left images:
10    0.999397
11    0.999616
12    0.996307
13    0.984800
14    0.997568
15    0.996720
16    0.999718
17    0.999014
18    0.998500
19    0.999051
Name: 1, dtype: float32
right images:
20    0.999887
21    0.999670
22    0.999889
23    0.999903
24    0.999819
25    0.999073
26    0.999945
27    0.999656
28    0.999897
29    0.999908
Name: 2, dtype: float32
up images:
30    0.999922
31    0.999998
32    0.999994
33    0.999977
34    0.999996
35    0.999992
36    0.999995
37    0.999996
38    0.999994
39    0.999955
Name: 3, dtype: float32

Except for one image (98.5%), the model assigns a probability of 99% or higher to the correct class for every test image.
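
The same check can be done programmatically instead of by eye. The sketch below uses only the ten true-class probabilities printed above for the left images. The helper name `summarize_confidence` and the 0.99 threshold are illustrative choices, not part of AutoGluon.

```python
def summarize_confidence(probs, threshold=0.99):
    """Return the minimum probability and the positions that fall below threshold."""
    below = [i for i, p in enumerate(probs) if p < threshold]
    return min(probs), below


# True-class probabilities for the 'left' test images, copied from the output above.
left_probs = [0.999397, 0.999616, 0.996307, 0.984800, 0.997568,
              0.996720, 0.999718, 0.999014, 0.998500, 0.999051]

lowest, below = summarize_confidence(left_probs)
print(lowest, below)
```

Here `below` holds positions within the list, so position 3 corresponds to row 13 of the output above, the one image at 98.5%.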