Solving MNIST with a Simple LSTM (MXNet), Revised

I wrote a similar article earlier, but it turns out that fetching the MNIST data is simple if you use MXNet.
touch-sp.hatenablog.com
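
For reference, mx.test_utils.get_mnist() downloads the dataset on first use and returns a dict of NumPy arrays already scaled to [0, 1], so no manual download or normalization is needed. A quick shape check (shapes with mxnet 1.3.0):

import mxnet as mx

mnist = mx.test_utils.get_mnist()
print(mnist['train_data'].shape)   # (60000, 1, 28, 28)
print(mnist['train_label'].shape)  # (60000,)
print(mnist['test_data'].shape)    # (10000, 1, 28, 28)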

Environment

Windows 10 Pro
No GPU
Python 3.6.6 (using venv)

astroid==2.0.4
certifi==2018.8.24
chardet==3.0.4
colorama==0.3.9
graphviz==0.8.4
idna==2.6
isort==4.3.4
lazy-object-proxy==1.3.1
mccabe==0.6.1
mxnet==1.3.0
numpy==1.14.6
Pillow==5.2.0
pylint==2.1.1
requests==2.18.4
six==1.11.0
typed-ast==1.1.0
urllib3==1.22
wrapt==1.10.11

(Note) pylint is installed because I use Visual Studio Code.

Model (Model.py)

from mxnet.gluon import Block, nn, rnn

class Model(Block):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        with self.name_scope():
            self.lstm = rnn.LSTM(128)   # 128 hidden units, default layout 'TNC'
            self.dense = nn.Dense(10)   # one output unit per digit class

    def forward(self, x):
        # x: (N, 28, 28) -- each image is treated as a 28-step sequence of 28 features
        out = self.lstm(x.swapaxes(0, 1))   # the LSTM layer expects (T, N, C)
        out = out.swapaxes(0, 1)            # back to (N, T, C)
        out = self.dense(out)               # Dense flattens to (N, T*C), then projects to 10
        return out

(Note) In Gluon, the input/output layout of the LSTM layer defaults to 'TNC', hence the two swapaxes calls.
T: sequence length
N: batch size
C: feature dimensions
After swapping back, the output is (N, 28, 128); nn.Dense flattens everything except the batch axis by default, so the final layer projects a 28*128-dimensional vector per image down to 10 classes.
mxnet.incubator.apache.org
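
A minimal shape check of the layout handling (the batch size of 100 here is arbitrary):

import mxnet as mx
from mxnet.gluon import rnn

lstm = rnn.LSTM(128)                # default layout is 'TNC'
lstm.initialize()

x = mx.nd.zeros((100, 28, 28))      # (N, T, C): 100 images, 28 rows of 28 pixels each
out = lstm(x.swapaxes(0, 1))        # the layer expects (T, N, C)
print(out.shape)                    # (28, 100, 128)
print(out.swapaxes(0, 1).shape)     # (100, 28, 128)

It should also be possible to build the layer as rnn.LSTM(128, layout='NTC') and drop both swapaxes calls; the model above keeps the default layout to match the documentation.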

Run

import mxnet as mx
from mxnet import gluon, autograd
import numpy as np

mnist = mx.test_utils.get_mnist()
x_train = mnist['train_data']
t_train = mnist['train_label']
x_test = mnist['test_data']
t_test = mnist['test_label']

x_train = x_train.reshape(-1,28,28)
x_test = x_test.reshape(-1,28,28)

x_train = mx.nd.array(x_train)
t_train = mx.nd.array(t_train)
x_test = mx.nd.array(x_test)
t_test = mx.nd.array(t_test)

'''
# display an image (optional sanity check)
# at this point x_train is an NDArray, so convert to a uint8 NumPy array first
from PIL import Image
img = Image.fromarray((x_train[0].asnumpy() * 255).astype('uint8'))
img.show()
'''

import Model
model = Model.Model()
model.initialize(mx.init.Xavier())
loss_func = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(model.collect_params(), 'adam')

def evaluate_accuracy(data, label, net):
    # run the whole set through the network in a single forward pass
    acc = mx.metric.Accuracy()
    output = net(data)
    predictions = mx.nd.argmax(output, axis=1)  # highest-scoring class per sample
    acc.update(preds=predictions, labels=label)
    return acc.get()[1]
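
Note that evaluate_accuracy pushes the entire set through the network at once, which is memory-hungry for the 60,000-image training set. If that becomes a problem, a batched variant along these lines should work (evaluate_accuracy_batched is a hypothetical helper, not part of the code above):

def evaluate_accuracy_batched(data, label, net, batch_size=1000):
    acc = mx.metric.Accuracy()
    for i in range(0, data.shape[0], batch_size):
        output = net(data[i:i + batch_size])
        predictions = mx.nd.argmax(output, axis=1)
        acc.update(preds=predictions, labels=label[i:i + batch_size])
    return acc.get()[1]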

print('start training...')
batch_size = 100
epochs = 10
loss_n = []  # for logging

for epoch in range(1, epochs + 1):
    # shuffle the training set via a randomly permuted index array
    indices = np.random.permutation(x_train.shape[0])
    cur_start = 0
    while cur_start < x_train.shape[0]:
        cur_end = min(cur_start + batch_size, x_train.shape[0])
        data = x_train[indices[cur_start:cur_end]]
        label = t_train[indices[cur_start:cur_end]]
        # forward pass through the network
        with autograd.record():
            output = model(data)
            # compute the loss
            loss = loss_func(output, label)
            # keep the loss value for logging
            loss_n.append(np.mean(loss.asnumpy()))
        # backpropagate from the loss
        loss.backward()
        # advance the optimizer by the number of samples in the batch
        trainer.step(data.shape[0])
        cur_start = cur_end
    # print the log for this epoch
    ll = np.mean(loss_n)
    test_acc = evaluate_accuracy(x_test, t_test, model)
    train_acc = evaluate_accuracy(x_train, t_train, model)
    print('%d epoch loss = %f train_acc = %f test_acc = %f' % (epoch, ll, train_acc, test_acc))
    loss_n = []

model.save_parameters('lstm.params')
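
To reuse the trained model later, the saved parameters can be loaded back into a fresh instance. A minimal sketch (it assumes Model.py and lstm.params are in the working directory):

import mxnet as mx
import Model

model = Model.Model()
model.load_parameters('lstm.params')

# classify the first test image
mnist = mx.test_utils.get_mnist()
x_test = mx.nd.array(mnist['test_data'].reshape(-1, 28, 28))
pred = model(x_test[0:1])                     # shape (1, 10)
print(mx.nd.argmax(pred, axis=1).asscalar())  # predicted digit
print(mnist['test_label'][0])                 # ground truth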