Update README.md, documentation, add TOC, sponsors, and links to timmdocs and paperswithcode

rwightman · rwightman · commit 5c7d2982346b · 2021-03-10T10:24:23.000-08:00
diff --git a/README.md b/README.md
@@ -1,4 +1,25 @@
-# PyTorch Image Models, etc
+# PyTorch Image Models
+- [Sponsors](#sponsors)
+- [What's New](#whats-new)
+- [Introduction](#introduction)
+- [Models](#models)
+- [Features](#features)
+- [Results](#results)
+- [Getting Started (Documentation)](#getting-started-documentation)
+- [Train, Validation, Inference Scripts](#train-validation-inference-scripts)
+- [Awesome PyTorch Resources](#awesome-pytorch-resources)
+- [Licenses](#licenses)
+- [Citing](#citing)
+
+## Sponsors
+
+A big thank you to my [GitHub Sponsors](https://github.com/sponsors/rwightman) for their support!
+
+In addition to the sponsors at the link above, I've received hardware and/or cloud resources from
+* Nvidia (https://www.nvidia.com/en-us/)
+* TFRC (https://www.tensorflow.org/tfrc)
+
+I'm fortunate to be able to dedicate significant time and money of my own supporting this and other open source projects. However, as the projects increase in scope, outside support is needed to continue with the current trajectory of hardware, infrastructure, and electricity costs.
 
 ## What's New
 
@@ -120,45 +141,6 @@
 * Support for native Torch AMP and channels_last memory format added to train/validate scripts (`--channels-last`, `--native-amp` vs `--apex-amp`)
 * Models tested with channels_last on latest NGC 20.08 container. AdaptiveAvgPool in attn layers changed to mean((2,3)) to work around bug with NHWC kernel.
 
-### Aug 12, 2020
-* New/updated weights from training experiments
-  * EfficientNet-B3 - 82.1 top-1 (vs 81.6 for official with AA and 81.9 for AdvProp)
-  * RegNetY-3.2GF - 82.0 top-1 (78.9 from official ver)
-  * CSPResNet50 - 79.6 top-1 (76.6 from official ver)
-* Add CutMix integrated w/ Mixup. See [pull request](https://github.com/rwightman/pytorch-image-models/pull/218) for some usage examples
-* Some fixes for using pretrained weights with `in_chans` != 3 on several models.
-
-### Aug 5, 2020
-Universal feature extraction, new models, new weights, new test sets.
-* All models support the `features_only=True` argument for `create_model` call to return a network that extracts feature maps from the deepest layer at each stride.
-* New models
-  * CSPResNet, CSPResNeXt, CSPDarkNet, DarkNet
-  * ReXNet
-  * (Modified Aligned) Xception41/65/71 (a proper port of TF models)
-* New trained weights
-  * SEResNet50 - 80.3 top-1
-  * CSPDarkNet53 - 80.1 top-1
-  * CSPResNeXt50 - 80.0 top-1
-  * DPN68b - 79.2 top-1
-  * EfficientNet-Lite0 (non-TF ver) - 75.5 (submitted by [@hal-314](https://github.com/hal-314))
-* Add 'real' labels for ImageNet and ImageNet-Renditions test set, see [`results/README.md`](results/README.md)
-* Test set ranking/top-n diff script by [@KushajveerSingh](https://github.com/KushajveerSingh)
-* Train script and loader/transform tweaks to punch through more aug arguments
-* README and documentation overhaul. See initial (WIP) documentation at https://rwightman.github.io/pytorch-image-models/
-* adamp and sgdp optimizers added by [@hellbell](https://github.com/hellbell)
-
-### June 11, 2020
-Bunch of changes:
-* DenseNet models updated with memory efficient addition from torchvision (fixed a bug), blur pooling and deep stem additions
-* VoVNet V1 and V2 models added, 39 V2 variant (ese_vovnet_39b) trained to 79.3 top-1
-* Activation factory added along with new activations:
-   * select act at model creation time for more flexibility in using activations compatible with scripting or tracing (ONNX export)
-   * hard_mish (experimental) added with memory-efficient grad, along with ME hard_swish
-   * context mgr for setting exportable/scriptable/no_jit states
-* Norm + Activation combo layers added with initial trial support in DenseNet and VoVNet along with impl of EvoNorm and InplaceAbn wrapper that fit the interface
-* Torchscript works for all but two of the model types as long as using Pytorch 1.5+, tests added for this
-* Some import cleanup and classifier reset changes, all models will have classifier reset to nn.Identity on reset_classifer(0) call
-* Prep for 0.1.28 pip release
 
 ## Introduction
 
@@ -271,9 +253,13 @@ Several (less common) features that I often utilize in my projects are included.
 
 Model validation results can be found in the [documentation](https://rwightman.github.io/pytorch-image-models/results/) and in the [results tables](results/README.md)
 
-## Getting Started
+## Getting Started (Documentation)
+
+My current [documentation](https://rwightman.github.io/pytorch-image-models/) for `timm` covers the basics.
+
+[timmdocs](https://fastai.github.io/timmdocs/) is quickly becoming a much more comprehensive set of documentation for `timm`. A big thanks to [Aman Arora](https://github.com/amaarora) for his efforts creating timmdocs.
 
-See the [documentation](https://rwightman.github.io/pytorch-image-models/)
+[paperswithcode](https://paperswithcode.com/lib/timm) is a good resource for browsing the models within `timm`.
 
 ## Train, Validation, Inference Scripts
 
diff --git a/docs/archived_changes.md b/docs/archived_changes.md
@@ -1,5 +1,45 @@
 # Archived Changes
 
+### Aug 12, 2020
+* New/updated weights from training experiments
+  * EfficientNet-B3 - 82.1 top-1 (vs 81.6 for official with AA and 81.9 for AdvProp)
+  * RegNetY-3.2GF - 82.0 top-1 (78.9 from official ver)
+  * CSPResNet50 - 79.6 top-1 (76.6 from official ver)
+* Add CutMix integrated w/ Mixup. See [pull request](https://github.com/rwightman/pytorch-image-models/pull/218) for some usage examples
+* Some fixes for using pretrained weights with `in_chans` != 3 on several models.
+
+### Aug 5, 2020
+Universal feature extraction, new models, new weights, new test sets.
+* All models support the `features_only=True` argument for `create_model` call to return a network that extracts feature maps from the deepest layer at each stride.
+* New models
+  * CSPResNet, CSPResNeXt, CSPDarkNet, DarkNet
+  * ReXNet
+  * (Modified Aligned) Xception41/65/71 (a proper port of TF models)
+* New trained weights
+  * SEResNet50 - 80.3 top-1
+  * CSPDarkNet53 - 80.1 top-1
+  * CSPResNeXt50 - 80.0 top-1
+  * DPN68b - 79.2 top-1
+  * EfficientNet-Lite0 (non-TF ver) - 75.5 (submitted by [@hal-314](https://github.com/hal-314))
+* Add 'real' labels for ImageNet and ImageNet-Renditions test set, see [`results/README.md`](results/README.md)
+* Test set ranking/top-n diff script by [@KushajveerSingh](https://github.com/KushajveerSingh)
+* Train script and loader/transform tweaks to punch through more aug arguments
+* README and documentation overhaul. See initial (WIP) documentation at https://rwightman.github.io/pytorch-image-models/
+* adamp and sgdp optimizers added by [@hellbell](https://github.com/hellbell)
+
+### June 11, 2020
+Bunch of changes:
+* DenseNet models updated with memory efficient addition from torchvision (fixed a bug), blur pooling and deep stem additions
+* VoVNet V1 and V2 models added, 39 V2 variant (ese_vovnet_39b) trained to 79.3 top-1
+* Activation factory added along with new activations:
+   * select act at model creation time for more flexibility in using activations compatible with scripting or tracing (ONNX export)
+   * hard_mish (experimental) added with memory-efficient grad, along with ME hard_swish
+   * context mgr for setting exportable/scriptable/no_jit states
+* Norm + Activation combo layers added with initial trial support in DenseNet and VoVNet along with impl of EvoNorm and InplaceAbn wrapper that fit the interface
+* TorchScript works for all but two of the model types as long as PyTorch 1.5+ is used; tests added for this
+* Some import cleanup and classifier reset changes; all models will have the classifier reset to `nn.Identity` on a `reset_classifier(0)` call
+* Prep for 0.1.28 pip release
+
 ### May 12, 2020
 * Add ResNeSt models (code adapted from https://github.com/zhanghang1989/ResNeSt, paper https://arxiv.org/abs/2004.08955))
 
diff --git a/docs/changes.md b/docs/changes.md
@@ -1,5 +1,33 @@
 # Recent Changes
 
+### March 7, 2021
+* First 0.4.x PyPI release w/ NFNets (& related), ByoB (GPU-Efficient, RepVGG, etc).
+* Change feature extraction for pre-activation nets (NFNets, ResNetV2) to return features before activation.
+
+### Feb 18, 2021
+* Add pretrained weights and model variants for NFNet-F* models from [DeepMind Haiku impl](https://github.com/deepmind/deepmind-research/tree/master/nfnets).
+  * Models are prefixed with `dm_`. They require SAME padding conv, skipinit enabled, and activation gains applied in act fn.
+  * These models are big, expect to run out of GPU memory. With the GELU activation + other options, they are roughly 1/2 the inference speed of my SiLU PyTorch optimized `s` variants.
+  * Original model results are based on pre-processing that is not the same as all other models so you'll see different results in the results csv (once updated).
+  * Matching the original pre-processing as closely as possible I get these results:
+    * `dm_nfnet_f6` - 86.352
+    * `dm_nfnet_f5` - 86.100
+    * `dm_nfnet_f4` - 85.834
+    * `dm_nfnet_f3` - 85.676
+    * `dm_nfnet_f2` - 85.178
+    * `dm_nfnet_f1` - 84.696
+    * `dm_nfnet_f0` - 83.464
+
+### Feb 16, 2021
+* Add Adaptive Gradient Clipping (AGC) as per https://arxiv.org/abs/2102.06171. Integrated w/ PyTorch gradient clipping via mode arg that defaults to prev 'norm' mode. For backward arg compat, clip-grad arg must be specified to enable when using train.py.
+  * AGC w/ default clipping factor `--clip-grad .01 --clip-mode agc`
+  * PyTorch global norm of 1.0 (old behaviour, always norm), `--clip-grad 1.0`
+  * PyTorch value clipping of 10, `--clip-grad 10. --clip-mode value`
+  * AGC performance is definitely sensitive to the clipping factor. More experimentation needed to determine good values for smaller batch sizes and optimizers besides those in paper. So far I've found .001-.005 is necessary for stable RMSProp training w/ NFNet/NF-ResNet.
+
+### Feb 12, 2021
+* Update Normalization-Free nets to include new NFNet-F (https://arxiv.org/abs/2102.06171) model defs
+
 ### Feb 10, 2021
 * More model archs, incl a flexible ByobNet backbone ('Bring-your-own-blocks')
   * GPU-Efficient-Networks (https://github.com/idstcv/GPU-Efficient-Networks), impl in `byobnet.py`
diff --git a/docs/index.md b/docs/index.md
@@ -1,5 +1,11 @@
 # Getting Started
 
+## Welcome
+
+Welcome to the `timm` documentation, a lean set of docs that covers the basics of `timm`.
+
+For a more comprehensive set of docs (currently under development), please visit [timmdocs](https://fastai.github.io/timmdocs/) by [Aman Arora](https://github.com/amaarora).
+
 ## Install
 
 The library can be installed with pip:
@@ -8,18 +14,23 @@ The library can be installed with pip:
 pip install timm
 ```
 
+I update the PyPI (pip) packages when I'm confident there are no significant model regressions from previous releases. If you want to pip install the bleeding edge from GitHub, use:
+```
+pip install git+https://github.com/rwightman/pytorch-image-models.git
+```
+
 !!! info "Conda Environment"
-    All development and testing has been done in Conda Python 3 environments on Linux x86-64 systems, specifically Python 3.6.x, 3.7.x., 3.8.x.
+    All development and testing has been done in Conda Python 3 environments on Linux x86-64 systems, specifically Python 3.6.x, 3.7.x, 3.8.x, and 3.9.
     
     Little to no care has been taken to be Python 2.x friendly and will not support it. If you run into any challenges running on Windows, or other OS, I'm definitely open to looking into those issues so long as it's in a reproducible (read Conda) environment.
     
-    PyTorch versions 1.4, 1.5.x, 1.6, and 1.7 have been tested with this code.
+    PyTorch versions 1.4, 1.5.x, 1.6, 1.7.x, and 1.8 have been tested with this code.
     
     I've tried to keep the dependencies minimal, the setup is as per the PyTorch default install instructions for Conda:
     ```
     conda create -n torch-env
     conda activate torch-env
-    conda install -c pytorch pytorch torchvision cudatoolkit=11
+    conda install pytorch torchvision cudatoolkit=11.1 -c pytorch -c conda-forge
     conda install pyyaml
     ```
 
diff --git a/docs/models.md b/docs/models.md
@@ -8,7 +8,9 @@ Most included models have pretrained weights. The weights are either:
 2. ported by myself from their original impl in a different framework (e.g. Tensorflow models)
 3. trained from scratch using the included training script
 
-The validation results for the pretrained weights can be found [here](results.md)
+The validation results for the pretrained weights are [here](results.md)
+
+A more exciting view (with pretty pictures) of the models within `timm` can be found at [paperswithcode](https://paperswithcode.com/lib/timm).
 
 ## Big Transfer ResNetV2 (BiT) [[resnetv2.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/resnetv2.py)]
 * Paper: `Big Transfer (BiT): General Visual Representation Learning` - https://arxiv.org/abs/1912.11370
diff --git a/docs/results.md b/docs/results.md
@@ -1,9 +1,10 @@
 # Results
 
-CSV files containing an ImageNet-1K validation and out-of-distribution (OOD) test set validation results for all included models with pretrained weights and default configurations is located [here](https://github.com/rwightman/pytorch-image-models/tree/master/results).
+CSV files containing ImageNet-1K and out-of-distribution (OOD) test set validation results for all models with pretrained weights are located in the repository [results folder](https://github.com/rwightman/pytorch-image-models/tree/master/results).
 
 ## Self-trained Weights
-I've leveraged the training scripts in this repository to train a few of the models with to good levels of performance.
+
+The table below includes ImageNet-1k validation results of model weights that I've trained myself. It is not updated as frequently as the csv results outputs linked above.
 
 |Model | Acc@1 (Err) | Acc@5 (Err) | Param # (M) | Interpolation | Image Size |
 |---|---|---|---|---|---|