## What's New
### Feb 18, 2021
* Add pretrained weights and model variants for NFNet-F* models from [DeepMind Haiku impl](https://github.com/deepmind/deepmind-research/tree/master/nfnets).
  * Models are prefixed with `dm_`. They require SAME padding conv, skipinit enabled, and activation gains applied in the act fn (see the loading sketch after this list).
  * These models are big; expect to run out of GPU memory. With the GELU activation + other options they are roughly 1/2 the inference speed of my SiLU PyTorch optimized `s` variants.
  * Original model results are based on pre-processing that is not the same as all other models, so you'll see different results in the results csv (once updated).
  * Matching the original pre-processing as closely as possible, I get these results:
    * `dm_nfnet_f6` - 86.352
    * `dm_nfnet_f5` - 86.100
    * `dm_nfnet_f4` - 85.834
    * `dm_nfnet_f3` - 85.676
    * `dm_nfnet_f2` - 85.178
    * `dm_nfnet_f1` - 84.696
    * `dm_nfnet_f0` - 83.464
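
For anyone wanting to try the new weights, a minimal loading sketch via the usual `timm.create_model` factory (`dm_nfnet_f0` is the smallest of the ported variants):

```python
import torch
import timm
from timm.data import resolve_data_config, create_transform

# Load the smallest of the DeepMind-ported variants by its dm_ prefix.
# The f4-f6 models are far larger and need substantial GPU memory.
model = timm.create_model('dm_nfnet_f0', pretrained=True).eval()

# The dm_ weights expect their own pre-processing, so derive the eval
# transform from the model's pretrained config instead of assuming the
# common ImageNet defaults.
config = resolve_data_config({}, model=model)
transform = create_transform(**config)  # apply to PIL images at eval time

with torch.no_grad():
    logits = model(torch.randn(1, *config['input_size']))
print(logits.shape)  # torch.Size([1, 1000])
```
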
### Feb 16, 2021
* Add Adaptive Gradient Clipping (AGC) as per https://arxiv.org/abs/2102.06171. Integrated with the existing PyTorch gradient clipping via a mode arg that defaults to the previous 'norm' mode. For backwards arg compat, the clip-grad arg must still be specified to enable clipping when using train.py. A rough sketch of the clipping rule is below.
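
For reference, the unit-wise clipping rule from the paper looks roughly like this. This is a sketch of the idea only, not the exact code integrated here; the helper names are illustrative:

```python
import torch

def unitwise_norm(x):
    # Per-element "norm" for 1-d params (biases); per output unit (dim 0)
    # L2 norm for weight matrices / conv kernels.
    if x.ndim <= 1:
        return x.abs()
    return x.norm(p=2, dim=tuple(range(1, x.ndim)), keepdim=True)

@torch.no_grad()
def adaptive_clip_grad_(parameters, clip_factor=0.01, eps=1e-3):
    # AGC: rescale any gradient whose unit-wise norm exceeds
    # clip_factor * the matching parameter norm (floored at eps so
    # near-zero weights don't force their gradients to zero).
    for p in parameters:
        if p.grad is None:
            continue
        g = p.grad
        max_norm = unitwise_norm(p).clamp(min=eps) * clip_factor
        g_norm = unitwise_norm(g)
        scaled = g * (max_norm / g_norm.clamp(min=1e-6))
        g.copy_(torch.where(g_norm > max_norm, scaled, g))
```

With train.py, that means passing a clip-grad value along with the agc mode (e.g. something like `--clip-grad 0.01 --clip-mode agc`; check the train.py args for the exact flag names).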