Programmer's English Learning Diary

Notes on programming and English learning

Deep learning tutorial

Recently, I began to learn 'Deep Learning'. (That's why I started to study English!)

But I think the tutorial on the TensorFlow official page is not beginner-friendly.

It uses MNIST, a very popular example for machine learning. However, while doing that tutorial I was not able to understand what kind of data I needed to create for input and output.

It uses data files packaged with the framework, which prevented me from understanding the data structure. I couldn't build my own network and data after that tutorial. (I suppose this happens to many beginners.)

So I decided to write an article about how to create a simple neural network. I'm only a beginner, but writing this article will be very good learning for me.

It's also a great challenge for me to write this in English!!!


In this article, I'll show how to create a simple neural network using Keras.

Notice: this article doesn't explain
  • basic Python programming
  • the basics of neural networks

If you have some programming experience, this article will not be difficult (even if you have never learned Python).

Goal

Creating a modulo-estimating network
  • Input: an integer
  • Output: the input modulo 10 (from 0 to 9)

Compared to MNIST, the advantages of this example are (I think):

  • You need to create input and output data by yourself.
  • Easy to check whether the estimated answer is correct.
  • Simple problem. It will probably take less time than MNIST.

Let's begin!


0) Network Design

First of all, we need to consider the network structure, especially the input and output.

Usually, the inputs and outputs of a neural network are kept in the range 0 to 1.

Input Layer

This means we need to convert the input number. In this article, I used its binary representation. To keep the network simple, I limited the input range to 0 to 255, so we need 8 input nodes.
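
For example, here is a minimal sketch of that encoding (least significant bit first), which is the same order we will use when generating the data later:

n = 21
bits = [(n >> i) & 1 for i in range(8)]  # least significant bit first
print(bits)  # [1, 0, 1, 0, 1, 0, 0, 0]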

Output layer

This is a simple classification problem, so the design of the output layer is also simple: 10 nodes, one for each answer from 0 to 9. Each node outputs the probability of that answer.

Hidden layer

This example is easy to estimate, so I'll create two hidden layers of 32 nodes each. (I suppose that is enough for this example.)


Now we have decided the structure. Let's start coding!


1) Preparing

In this article, I use 'Google Colab'. It's easy for beginners to use, and you can use many major Python frameworks without setting up your own computer.

Go to the Colab page and sign in with a Google account.

Then create a new notebook with Python 3.


2) Import

Type this in Colab:

import numpy as np
from keras.models import Sequential, model_from_json
from keras.layers.core import Activation, Dense
import random

Then press 'SHIFT + ENTER' to run the cell. If you typed everything correctly, the output should be Using TensorFlow backend.


3) Create input / output data

Next, we need to create a 'numpy array' for the input data.
But creating a numpy array directly is a little bit difficult (for me).
So I created simple int arrays first and converted them to numpy arrays.

numpy is a very popular, fast vector calculation library.

In this article, I avoided the Keras utility functions as much as possible. I thought it was better for learning.

train_count = 500 # number of train data
test_count = 50 # number of test data
base_x = []
base_y = []

# create
for _ in range(train_count + test_count):
  n = random.randint(0, 255)
  answer = n % 10

  #convert to binary number
  x = []
  y = []
  for i in range(8):
    x.append(n % 2)  # take the lowest bit first
    n = n // 2
  for i in range(10): # you can use "keras.utils.to_categorical" instead.
    if i == answer:
      y.append(1)
    else:
      y.append(0)
  base_x.append(x)
  base_y.append(y) 

# to numpy
base_x = np.array(base_x)
base_y = np.array(base_y)

# split data
train_x = base_x[:train_count]
train_y = base_y[:train_count]
test_x = base_x[train_count:]
test_y = base_y[train_count:]

Now, check that the data is correct with a debug print.

# Data is randomly generated. The output will be different in each execution!
print(train_x[0])
print(train_y[0])

# output
[1, 0, 1, 0, 1, 0, 0, 0] # means the input number is 21.
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0] # means the answer is 1 (21 % 10). OK!
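
As mentioned in the code comment, the one-hot loop can be replaced with the Keras helper to_categorical. A minimal sketch (the exact import path may vary slightly between Keras versions):

from keras.utils import to_categorical

answers = np.array([1, 7])                        # example labels
one_hot = to_categorical(answers, num_classes=10)
print(one_hot[0])  # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]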

3) Create network

Keras is very simple to use.

1. Create a Sequential model

model = Sequential()

In Keras, the network is called a Model, so the variable name is model. (I don't know why...)

2. Add layers

We'll create layers for input, hidden, and output.
Dense is a simple fully connected layer in Keras.

Basic format

Dense(OUTPUT_COUNT, activation=ACTIVATION, input_dim=INPUT_COUNT)

model.add(Dense(8, activation='sigmoid',
                input_dim=8))
model.add(Dense(32, activation='sigmoid'))
model.add(Dense(32, activation='sigmoid'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer="rmsprop",
              metrics=["accuracy"])

In Keras, the framework connects every node between these layers automatically. It's very convenient: just add Dense layers!

But in the first layer, we need to define input_dim.
Keras creates the connections between layers automatically,
but the first layer has no previous layer to get the input size from, so we need to specify the number of inputs ourselves.


I use sigmoid, the most basic activation function. If you want to use another function, please check the official documentation.

The last layer's activation is softmax. It is often used in the last layer for classification problems.

It converts each output value to the range 0 to 1, and the sum of the outputs is equal to 1, so this activation function is convenient for classification.
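
As a rough illustration of what softmax does (computed by hand with numpy, not part of the tutorial code):

z = np.array([2.0, 1.0, 0.1])            # raw outputs of the last layer
softmax = np.exp(z) / np.sum(np.exp(z))
print(softmax)        # roughly [0.66, 0.24, 0.10]
print(softmax.sum())  # 1.0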


In classification, the loss is usually categorical_crossentropy.
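
Roughly speaking, categorical_crossentropy only looks at the predicted probability of the correct class. A small hand-made sketch with numpy:

y_true = np.array([0, 1, 0])             # one-hot label
y_pred = np.array([0.2, 0.7, 0.1])       # softmax output
loss = -np.sum(y_true * np.log(y_pred))  # = -log(0.7), about 0.357
print(loss)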

optimizer determines how the network weights are updated. I don't know much about it yet. Many optimizers exist; please check the official documentation and try them!
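
For example, here is a sketch of swapping in the Adam optimizer instead of "rmsprop" (the learning rate below is just Adam's default value, not something I tuned):

from keras.optimizers import Adam

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.001),
              metrics=['accuracy'])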

metrics is how the network is evaluated. I used accuracy, the most basic metric.


5) Train!

Training the model is very easy! All you need to do is call the fit function!

hist = model.fit(train_x,
                 train_y,
                 epochs=10,
                 verbose=0)

epochs is how many times the training loop runs over the data. Of course, the larger the epoch count, the more time you need.

Even ignoring time, a large epoch count doesn't always produce a good result.
If you repeat training too many times, the model becomes a 'training data specialist': it shows good results on the training data but bad results on any other data.

We call this phenomenon over-fitting. It is a very big problem in machine learning.
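
One common way to notice over-fitting is to hold back part of the training data as a validation set. A minimal sketch using fit's validation_split parameter:

hist = model.fit(train_x,
                 train_y,
                 epochs=10,
                 validation_split=0.2,  # keep 20% of the training data for validation
                 verbose=0)
# hist.history then also contains "val_loss" and "val_acc";
# if "acc" keeps rising while "val_acc" stops improving, the model is over-fitting.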


verbose is a parameter that controls how progress is shown.

verbose=0 means "no output"

verbose=1 means "detailed output with a progress bar"

Epoch 1/2
500/500 [==============================] - 1s 2ms/step - loss: 2.3835 - acc: 0.0640
Epoch 2/2
250/500 [================--------------]

When you try a very large dataset and model, verbose=1 is a good way to test your model.

If you think verbose=1 shows too much, verbose=2 is better for you.
verbose=2 means "output one result line per epoch".

Epoch 1/2
1s - loss: 2.3531 - acc: 0.1260
Epoch 2/2
0s - loss: 2.3193 - acc: 0.1280

I often use verbose=1 when testing the model with a smaller epoch count,
but verbose=0 when training with a larger epoch count.


hist holds the history of the results, containing the loss and accuracy for each epoch.
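
For example, you can inspect it directly (note that older Keras versions use the key "acc" while newer ones use "accuracy"):

print(hist.history.keys())      # e.g. dict_keys(['loss', 'acc'])
print(hist.history["acc"][-1])  # accuracy after the last epoch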

6) Show results!

Let's show how the model improved using matplotlib.
matplotlib can plot graphs with very simple code.

In this tutorial, I'll show the accuracy graph.

import matplotlib.pyplot as plt
 
plt.plot(hist.history["acc"], label="acc")
plt.legend(fontsize=10)
plt.grid()
plt.xlabel('epoch')
plt.ylabel('acc')
plt.show()

When you run this code, you will see a graph.
This graph is very useful for deciding whether to continue training or not.
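
If you trained with validation_split as sketched earlier, plotting both curves makes over-fitting easy to spot (this assumes "val_acc" exists in hist.history):

plt.plot(hist.history["acc"], label="train acc")
plt.plot(hist.history["val_acc"], label="validation acc")
plt.legend(fontsize=10)
plt.grid()
plt.xlabel('epoch')
plt.ylabel('acc')
plt.show()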


7) Predict

I ran 500 epochs, and the accuracy reached 0.92.
Now let's move to the prediction phase.

Predicting with the trained model is also easy: just call predict.

results = model.predict(test_x)

Now, check whether the results are correct or not.

import math

for i, r in enumerate(results):
  # reconstruct the original integer from its 8 input bits
  input_value = 0
  for j in range(8):
    if test_x[i][j]:
      input_value += math.pow(2, j)
  answer = np.argmax(test_y[i])  # correct class
  predicted = np.argmax(r)       # class with the highest predicted probability
  print("input:{} predicted:{} answer:{}".format(int(input_value), predicted, answer))

This time, the results were:

input:140 predicted:0 answer:0
input:114 predicted:4 answer:4
input:136 predicted:6 answer:6
input:149 predicted:9 answer:9
input:174 predicted:4 answer:4
input:75 predicted:5 answer:5
input:47 predicted:7 answer:7
input:137 predicted:7 answer:7
input:90 predicted:0 answer:0
input:6 predicted:6 answer:6

Luckily, all predicted results were correct.
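
Instead of checking each prediction by hand, you can also let Keras compute the overall loss and accuracy on the test data, for example:

loss, acc = model.evaluate(test_x, test_y, verbose=0)
print("test loss: {:.3f}, test accuracy: {:.3f}".format(loss, acc))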

When you can't get good predictions, consider the following:

  • increase train data
  • increase epoch count
  • change network structure

But improving the model is very difficult work, requiring much more experience with and knowledge of deep learning.
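
By the way, we imported model_from_json at the beginning but never used it. Here is a minimal sketch of how it can save and restore a trained model (the file name is just an example):

# save the structure and the weights separately
json_string = model.to_json()
model.save_weights("modulo_model_weights.h5")

# ...later, rebuild the same model
restored = model_from_json(json_string)
restored.load_weights("modulo_model_weights.h5")
restored.compile(loss='categorical_crossentropy',
                 optimizer='rmsprop',
                 metrics=['accuracy'])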


This is the end of this tutorial.

I would be glad if this made you interested in Deep Learning and you start coding.



In Closing

I wrote this up in English as a side exercise while studying deep learning.
I'm a beginner who only started a day or two ago, but it's really fun.

After all, giving up on that Coursera machine learning course is what got me started on English (lol).
That said, I have a feeling the AI boom may have passed by while I was busy studying English.

At this rate I may never get back to that course, but English itself will keep being useful, so I'll study it patiently.


Still, it turns out you can write a surprising amount of English even with a TOEIC score of 700. I surprised myself while writing.
I have no idea how correct the English actually is (lol).
Running it through Google Translate gave decent results, but I have no way to check whether it's right...

Even so, my vocabulary is limited, so I end up repeating the same monotonous expressions. I could feel it while writing.
If I saw a similar Japanese article, I probably wouldn't read it (lol). This will be an eternal challenge.

On the other hand, if you ignore that, it seems I can communicate reasonably well.
It gets the message across, at least.

The description of TOEIC's official rank B (730 points), "able to work with the assistance of a native speaker", may be a fairly accurate indicator.

It's also good English practice, so when I have time I'd like to keep writing about various things in English, not just machine learning. Probably leaning toward my specialty, native apps (lol).