
2023-01-02

How to use Keras to train a model to 89% accuracy on CIFAR10

CIFAR10 is a classic image-recognition dataset containing 10 classes of images. It consists of 60,000 color images of size 32 x 32, of which 50,000 are used for training and 10,000 for testing. [CIFAR10]

Among the classic image-recognition datasets (MNIST / CIFAR10 / CIFAR100 / STL-10 / SVHN / ImageNet), the current state-of-the-art models on CIFAR10 reach a maximum accuracy of 96.53% (see the detailed rankings and paper links).

Now let's use Keras to train a CNN model from scratch; the goal is for the model to reach roughly 89% accuracy on CIFAR10.

1. Data import and preprocessing

  • keras.datasets makes it easy to import the CIFAR10 data.
  • Normalization: scale pixel values from [0, 255] to [0, 1]. In fact, different classic models use various normalization schemes, such as [0, 1], [-1, 1], mean subtraction, and so on.
  • Use keras.utils.to_categorical to one-hot encode the ten class labels for the later softmax classification.
from keras.datasets import cifar10
from keras.utils import to_categorical

nb_classes = 10

# Load CIFAR10: 50,000 training and 10,000 test images of shape (32, 32, 3)
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
# Labels come as shape (N, 1); flatten them to shape (N,)
y_train = y_train.reshape(y_train.shape[0])
y_test = y_test.reshape(y_test.shape[0])
# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# One-hot encode the labels for softmax classification
y_train = to_categorical(y_train, nb_classes)
y_test = to_categorical(y_test, nb_classes)
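As a quick sanity check (a minimal sketch; the exact printout depends on your Keras/NumPy versions), the shapes and value range after preprocessing should look like this:

# 50,000 training and 10,000 test images, each 32 x 32 x 3, with one-hot labels over 10 classes
print(X_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 10)
print(X_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 10)
print(X_train.min(), X_train.max())  # 0.0 1.0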

2. Create a model

Here we build a model with a structure similar to VGG16.

  • Use fixed small 3 x 3 convolution kernels
  • Two convolutional layers followed by one pooling layer
  • Follow the first three conv-pool blocks of VGG16: the number of convolution kernels grows in powers of 2 (64, 128, 256)
  • The model does not use VGG16's huge three-layer fully connected structure (the fully connected layers of VGG16 account for more than 90% of the model's total parameters)
  • The convolutional output feeds directly into a 10-class softmax classifier (I also tried adding a 128-node fully connected layer before the output layer, but after 100 epochs of training it only reached 79% accuracy, which is worse than having none at all)
  • All weights are initialized with He normal
from keras.layers import Input, Convolution2D, MaxPooling2D, Flatten, Dropout, Dense
from keras.models import Model

x = Input(shape=(32, 32, 3))
y = x

# Block 1: two 3x3 convolutions with 64 filters, then 2x2 max pooling
y = Convolution2D(filters=64, kernel_size=3, strides=1, padding='same', activation='relu', kernel_initializer='he_normal')(y)
y = Convolution2D(filters=64, kernel_size=3, strides=1, padding='same', activation='relu', kernel_initializer='he_normal')(y)
y = MaxPooling2D(pool_size=2, strides=2, padding='same')(y)

# Block 2: 128 filters
y = Convolution2D(filters=128, kernel_size=3, strides=1, padding='same', activation='relu', kernel_initializer='he_normal')(y)
y = Convolution2D(filters=128, kernel_size=3, strides=1, padding='same', activation='relu', kernel_initializer='he_normal')(y)
y = MaxPooling2D(pool_size=2, strides=2, padding='same')(y)

# Block 3: 256 filters
y = Convolution2D(filters=256, kernel_size=3, strides=1, padding='same', activation='relu', kernel_initializer='he_normal')(y)
y = Convolution2D(filters=256, kernel_size=3, strides=1, padding='same', activation='relu', kernel_initializer='he_normal')(y)
y = MaxPooling2D(pool_size=2, strides=2, padding='same')(y)

# Classifier: flatten, dropout, then a 10-way softmax (no large fully connected layers)
y = Flatten()(y)
y = Dropout(0.5)(y)
y = Dense(units=nb_classes, activation='softmax', kernel_initializer='he_normal')(y)

model1 = Model(inputs=x, outputs=y, name='model1')

model1.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])

Use summary() to check the complete structure of the model.

Note at the same time that the total number of model parameters is only about 1.18 million (this is a simplified model).
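The structure printed below is produced by:

model1.summary()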

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_4 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 32, 32, 64)        1792      
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 32, 32, 64)        36928     
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 16, 16, 128)       73856     
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 16, 16, 128)       147584    
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 8, 8, 128)         0         
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 8, 8, 256)         295168    
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 8, 8, 256)         590080    
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 4, 4, 256)         0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 4096)              0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 4096)              0         
_________________________________________________________________
dense_6 (Dense)              (None, 10)                40970     
=================================================================
Total params: 1,186,378
Trainable params: 1,186,378
Non-trainable params: 0
_________________________________________________________________
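As a sanity check on these numbers, each layer's parameter count can be reproduced by hand: a 3 x 3 convolution with C_in input channels and C_out filters has 3 * 3 * C_in * C_out weights plus C_out biases, and the dense layer maps the flattened 4 x 4 x 256 = 4096 features to 10 classes. A small sketch:

# Reproduce the parameter counts from the summary above
def conv_params(c_in, c_out, k=3):
    # k*k*c_in weights per filter, plus one bias per filter
    return k * k * c_in * c_out + c_out

counts = [
    conv_params(3, 64),     # 1,792
    conv_params(64, 64),    # 36,928
    conv_params(64, 128),   # 73,856
    conv_params(128, 128),  # 147,584
    conv_params(128, 256),  # 295,168
    conv_params(256, 256),  # 590,080
    4 * 4 * 256 * 10 + 10,  # dense layer on the flattened 4x4x256 output: 40,970
]
print(sum(counts))  # 1,186,378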

3. Define the training method

A nice feature of Keras is that you can use various callback functions to make the training process easier and faster.

  • EarlyStopping: stop training in time based on the trend of a monitored performance metric, to avoid overfitting.
  • ModelCheckpoint: automatically save a model snapshot each epoch; it can be set to keep only the snapshots whose performance improved.
  • TensorBoard: automatically configure the powerful visualization tool TensorBoard, saving the per-epoch training data and plotting it, which makes it easy to compare model performance later.
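As a minimal sketch of how these three callbacks are wired up (the complete train() function in the next section uses the same pattern; metric names such as val_acc follow Keras 2.x):

from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard

# Stop if validation accuracy has not improved for 20 epochs
es = EarlyStopping(monitor='val_acc', patience=20)
# Keep only the snapshots whose validation accuracy improved
mc = ModelCheckpoint('CIFAR10-EP{epoch:02d}-ACC{val_acc:.4f}.h5',
                     monitor='val_acc', save_best_only=True)
# Write per-epoch logs for visualization in TensorBoard
tb = TensorBoard(log_dir='./logs')

# The callbacks are then passed to fit() / fit_generator() via callbacks=[es, mc, tb]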

4. Several directions for improving model performance

To improve the model's performance, as far as I currently understand, one can usually work in the following directions:

  • Data Augmentation
  • Weight Initialization
  • Transfer Learning + Fine-tune
  • Ensemble / Model Fusion

Let's start with the first one: data augmentation. Keras conveniently ships a generator with built-in image augmentation. This generator provides many image-transformation parameters, such as rotation, shifting, whitening, and so on. You can directly define an augmented image generator object and then use its flow function to produce a generator usable for training. The complete training function I define is as follows.

import os
import numpy as np
from time import time
from datetime import datetime
from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from keras.preprocessing.image import ImageDataGenerator

def train(model, batch, epoch, data_augmentation=True):
    start = time()
    # One timestamped log directory per run, for checkpoints and TensorBoard
    log_dir = datetime.now().strftime('model_%Y%m%d_%H%M')
    os.mkdir(log_dir)

    # Stop when validation accuracy has not improved for 20 epochs
    es = EarlyStopping(monitor='val_acc', patience=20)
    # Save a snapshot only when validation accuracy improves
    mc = ModelCheckpoint(os.path.join(log_dir, 'CIFAR10-EP{epoch:02d}-ACC{val_acc:.4f}.h5'),
                         monitor='val_acc', save_best_only=True)
    tb = TensorBoard(log_dir=log_dir, histogram_freq=0)

    if data_augmentation:
        # Random shifts and horizontal flips for data augmentation
        aug = ImageDataGenerator(width_shift_range=0.125, height_shift_range=0.125,
                                 horizontal_flip=True)
        aug.fit(X_train)
        gen = aug.flow(X_train, y_train, batch_size=batch)
        h = model.fit_generator(generator=gen,
                                steps_per_epoch=50000 // batch,
                                epochs=epoch,
                                validation_data=(X_test, y_test),
                                callbacks=[es, mc, tb])
    else:
        h = model.fit(x=X_train,
                      y=y_train,
                      batch_size=batch,
                      epochs=epoch,
                      validation_data=(X_test, y_test),
                      callbacks=[es, mc, tb])

    print('\n@ Total Time Spent: %.2f seconds' % (time() - start))
    acc, val_acc = h.history['acc'], h.history['val_acc']
    m_acc, m_val_acc = np.argmax(acc), np.argmax(val_acc)
    print("@ Best Training Accuracy: %.2f %% achieved at EP #%d." % (acc[m_acc] * 100, m_acc + 1))
    print("@ Best Testing Accuracy: %.2f %% achieved at EP #%d." % (val_acc[m_val_acc] * 100, m_val_acc + 1))
    return h

5. Train the model

We first use Batch Size = 64 to train the model.

accuracy_curve plots the accuracy and loss curves from the history returned by training; see the code in utils.py. If you use TensorBoard, you can see the same charts there.
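utils.py is not reproduced in this post; as an assumption of what accuracy_curve roughly does, here is a minimal matplotlib sketch that plots the accuracy and loss curves from the returned history (metric names follow Keras 2.x):

import matplotlib.pyplot as plt

def accuracy_curve(h):
    # h.history holds the per-epoch metrics recorded by model.fit()
    acc, val_acc = h.history['acc'], h.history['val_acc']
    loss, val_loss = h.history['loss'], h.history['val_loss']
    epochs = range(1, len(acc) + 1)

    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(epochs, acc, label='train acc')
    plt.plot(epochs, val_acc, label='val acc')
    plt.xlabel('epoch'); plt.ylabel('accuracy'); plt.legend()
    plt.subplot(1, 2, 2)
    plt.plot(epochs, loss, label='train loss')
    plt.plot(epochs, val_loss, label='val loss')
    plt.xlabel('epoch'); plt.ylabel('loss'); plt.legend()
    plt.show()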

epoch = 200
batch = 64

h = train(model1, batch, epoch)
accuracy_curve(h)

-----------------------
........
Epoch 105/200
16s - loss: 0.0656 - acc: 0.9782 - val_loss: 0.5996 - val_acc: 0.8947
Epoch 106/200
16s - loss: 0.0661 - acc: 0.9782 - val_loss: 0.6633 - val_acc: 0.8938

@ Total Time Spent: 1875.35 seconds
@ Best Training Accuracy: 97.89 % achieved at EP #98.
@ Best Testing Accuracy: 90.00 % achieved at EP #85.


6. Analysis of the training results

The training results show that the model reached 90% test accuracy at epoch 85.

However, we can observe that from around epoch 40 the test loss curve no longer decreases and instead begins to rise. This indicates that the model has entered an overfitting state and training should not continue. This happened because the metric monitored by my EarlyStopping callback is not val_loss but val_acc, so training could not be ended in time once the model's loss stopped improving.

So the model's normal convergence point should lie somewhere between epochs 20 and 40, at a test accuracy of about 88%-89%.
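If we wanted training to stop as soon as the validation loss stops improving instead, a one-line change to the EarlyStopping callback would do it (a sketch, not the setting used in the run above; the patience value is only an example):

from keras.callbacks import EarlyStopping

# Monitor validation loss instead of validation accuracy, so training stops
# shortly after the loss curve starts rising (around epoch 40 in the run above)
es = EarlyStopping(monitor='val_loss', patience=10, mode='min')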


7. Use TensorBoard to compare the training process under different batch sizes

I trained the model above with four different batch sizes (64 / 128 / 256 / 512) and plotted the training accuracy and loss curves of the four runs with TensorBoard, which turned out to be quite interesting.

  • The horizontal axis is the training epoch
  • The transparent (faded) lines are the raw data
  • The solid lines are smoothed data, which makes the trend of each model on the same training metric easier to see

[Figures: training curves for the four batch sizes, with zoomed-in detail views]
From this simple comparison, we can draw the following conclusions about the effect of batch size on model training:

  • The larger the batch size, the shorter the training time per epoch, but beyond a certain point it stops decreasing
  • The larger the batch size, the slower the model converges, until eventually it cannot converge at all
  • The larger the batch size, the later overfitting shows up in training
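For reference, a sketch of how this comparison could be scripted with the train() function above; build_model() is a hypothetical helper (not in the original post) that re-creates the Section 2 network so that each batch size starts from fresh weights rather than continuing from an already-trained model:

from keras.layers import Input, Convolution2D, MaxPooling2D, Flatten, Dropout, Dense
from keras.models import Model

def build_model():
    # Hypothetical helper: rebuild the Section-2 network from scratch
    x = Input(shape=(32, 32, 3))
    y = x
    for filters in (64, 128, 256):
        y = Convolution2D(filters=filters, kernel_size=3, strides=1, padding='same',
                          activation='relu', kernel_initializer='he_normal')(y)
        y = Convolution2D(filters=filters, kernel_size=3, strides=1, padding='same',
                          activation='relu', kernel_initializer='he_normal')(y)
        y = MaxPooling2D(pool_size=2, strides=2, padding='same')(y)
    y = Flatten()(y)
    y = Dropout(0.5)(y)
    y = Dense(units=10, activation='softmax', kernel_initializer='he_normal')(y)
    m = Model(inputs=x, outputs=y)
    m.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    return m

histories = {}
for batch in (64, 128, 256, 512):
    # Each call to train() writes its own timestamped log directory,
    # so the four runs can be compared side by side in TensorBoard
    histories[batch] = train(build_model(), batch, epoch=200)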

code link

Reprinted from: https://zhuanlan.zhihu.com/p/29214791

