Skip to content

Something wrong with the code in chapter 10 #65

@Jason-XII

Description

@Jason-XII

I've been reading the book and strictly following the code examples. But I think there's something wrong with the code in chapter 10, when training a model using CNN to recognize the MNIST images. In the last part of the code when updating the weights:

layer_2_delta = (labels[batch_start:batch_end]-layer_2) / (batch_size*layer_2.shape[0])
layer_1_delta = layer_2_delta.dot(weights_1_2.T)*tanh2deriv(layer_1)
layer_1_delta*=dropout_mask
weights_1_2 += alpha*layer_1.T.dot(layer_2_delta)
l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
k_update = flattened_input.T.dot(l1d_reshape)
kernels -= alpha*k_update

I'm gently surprised because according to what I have previously learned in the book, the layer_x_deltas should be calculating the negetive derivatives of the loss functions, so with the last line, I think it should be

kernels += alpha*k_update

After modifying this, I try it on my own computer. The output:

I:0 Train-Acc: 0.132
I:1 Train-Acc: 0.174
I:2 Train-Acc: 0.191
I:3 Train-Acc: 0.215
I:4 Train-Acc: 0.241
I:5 Train-Acc: 0.249
I:6 Train-Acc: 0.296
I:7 Train-Acc: 0.31
I:8 Train-Acc: 0.37
I:9 Train-Acc: 0.358
I:10 Train-Acc: 0.408
I:11 Train-Acc: 0.438
I:12 Train-Acc: 0.465
I:13 Train-Acc: 0.479
I:14 Train-Acc: 0.528
I:15 Train-Acc: 0.548
I:16 Train-Acc: 0.533
I:17 Train-Acc: 0.569
I:18 Train-Acc: 0.574
I:19 Train-Acc: 0.605
I:20 Train-Acc: 0.605
...

But with the original code, I get:

I:0 Train-Acc: 0.055
I:1 Train-Acc: 0.037
I:2 Train-Acc: 0.037
I:3 Train-Acc: 0.04
I:4 Train-Acc: 0.046
I:5 Train-Acc: 0.068
I:6 Train-Acc: 0.083
I:7 Train-Acc: 0.096
I:8 Train-Acc: 0.127
I:9 Train-Acc: 0.148
I:10 Train-Acc: 0.181
I:11 Train-Acc: 0.209
I:12 Train-Acc: 0.238
I:13 Train-Acc: 0.286
I:14 Train-Acc: 0.274
I:15 Train-Acc: 0.257
I:16 Train-Acc: 0.243
I:17 Train-Acc: 0.112
I:18 Train-Acc: 0.035
I:19 Train-Acc: 0.026
I:20 Train-Acc: 0.022

After modifying, the accuracy of the training set increases much rapidly than with the original "-=". However, it puzzles me that after 300 times of iteration, both models get an accuracy about 86%. So what's the difference? Does the code have a typo or I just simply have misunderstood it?
I posted a question about this on stackoverflow. I have not typed the code wrongly. So what's wrong?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions