Something wrong with the code in chapter 10

I've been reading the book and following the code examples exactly. But I think there's something wrong with the code in chapter 10, where a CNN is trained to recognize the MNIST images. In the last part of the code, where the weights are updated:
```python
layer_2_delta = (labels[batch_start:batch_end]-layer_2) / (batch_size*layer_2.shape[0])
layer_1_delta = layer_2_delta.dot(weights_1_2.T)*tanh2deriv(layer_1)
layer_1_delta*=dropout_mask
weights_1_2 += alpha*layer_1.T.dot(layer_2_delta)
l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
k_update = flattened_input.T.dot(l1d_reshape)
kernels -= alpha*k_update
```
I'm a bit surprised, because according to what I learned earlier in the book, the layer_x_delta values should be the **negative derivatives** of the loss function, so I think the last line should be
```python
kernels += alpha*k_update
```
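The sign convention can be checked in isolation with a small numerical sketch. This is not the book's CNN: it assumes a single toy linear layer with a squared-error loss, and the names `x`, `y`, `w`, `loss`, and `update` are hypothetical stand-ins. It only illustrates that when the delta is computed as (target - prediction), the product `input.T.dot(delta)` is the negative gradient, so adding it to the weights is the step that lowers the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical, not the book's network): one linear layer.
x = rng.normal(size=(4, 3))   # batch of 4 inputs, 3 features
y = rng.normal(size=(4, 2))   # targets
w = rng.normal(size=(3, 2))   # weights

def loss(weights):
    """Squared-error loss of the toy linear layer."""
    pred = x.dot(weights)
    return 0.5 * np.sum((y - pred) ** 2)

# Delta in the same style as the chapter: (target - prediction).
delta = y - x.dot(w)
update = x.T.dot(delta)       # same form as flattened_input.T.dot(l1d_reshape)

# Central finite-difference estimate of d(loss)/d(w), entry by entry.
eps = 1e-6
num_grad = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        step = np.zeros_like(w)
        step[i, j] = eps
        num_grad[i, j] = (loss(w + step) - loss(w - step)) / (2 * eps)

# True if `update` equals the NEGATIVE of the numerical gradient.
sign_matches_negative_gradient = np.allclose(update, -num_grad, atol=1e-5)

# Compare one small step in each direction.
alpha = 0.01
loss_plus = loss(w + alpha * update)    # the "+=" variant
loss_minus = loss(w - alpha * update)   # the "-=" variant
print(sign_matches_negative_gradient, loss_plus < loss(w) < loss_minus)
```

In this toy setting the `+=` step decreases the loss and the `-=` step increases it. Whether that carries over unchanged to the kernel update in the chapter depends on the rest of that code, which this sketch does not reproduce.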
After changing that last line to `+=`, I ran it on my own computer. The output:
```text
I:0 Train-Acc: 0.132
I:1 Train-Acc: 0.174
I:2 Train-Acc: 0.191
I:3 Train-Acc: 0.215
I:4 Train-Acc: 0.241
I:5 Train-Acc: 0.249
I:6 Train-Acc: 0.296
I:7 Train-Acc: 0.31
I:8 Train-Acc: 0.37
I:9 Train-Acc: 0.358
I:10 Train-Acc: 0.408
I:11 Train-Acc: 0.438
I:12 Train-Acc: 0.465
I:13 Train-Acc: 0.479
I:14 Train-Acc: 0.528
I:15 Train-Acc: 0.548
I:16 Train-Acc: 0.533
I:17 Train-Acc: 0.569
I:18 Train-Acc: 0.574
I:19 Train-Acc: 0.605
I:20 Train-Acc: 0.605
...
```
But with the original code, I get:
```text
I:0 Train-Acc: 0.055
I:1 Train-Acc: 0.037
I:2 Train-Acc: 0.037
I:3 Train-Acc: 0.04
I:4 Train-Acc: 0.046
I:5 Train-Acc: 0.068
I:6 Train-Acc: 0.083
I:7 Train-Acc: 0.096
I:8 Train-Acc: 0.127
I:9 Train-Acc: 0.148
I:10 Train-Acc: 0.181
I:11 Train-Acc: 0.209
I:12 Train-Acc: 0.238
I:13 Train-Acc: 0.286
I:14 Train-Acc: 0.274
I:15 Train-Acc: 0.257
I:16 Train-Acc: 0.243
I:17 Train-Acc: 0.112
I:18 Train-Acc: 0.035
I:19 Train-Acc: 0.026
I:20 Train-Acc: 0.022
```
With the modified `+=`, the training accuracy increases much more rapidly than with the original `-=`. However, it puzzles me that after 300 iterations, both versions reach an accuracy of about 86%. So what's the difference? Does the code have a typo, or have I simply misunderstood it?
I posted a question about this on [stackoverflow](https://stackoverflow.com/questions/77611685/problem-building-cnn-only-using-python-numpy-when-gradient-descent-and-batching). I have checked that I did not mistype the code. So what's wrong?
