Commit 531215e
Bring changelog up to current, back to dev version, 1.0.9.dev0
1 parent d29b720

2 files changed: +223 −2 lines changed

hfdocs/source/changes.mdx

Lines changed: 222 additions & 1 deletion

# Changelog

### Aug 8, 2024
* Add RDNet ('DenseNets Reloaded', https://arxiv.org/abs/2403.19588), thanks [Donghyun Kim](https://github.com/dhkim0225)

### July 28, 2024
* Add `mobilenet_edgetpu_v2_m` weights w/ `ra4` mnv4-small based recipe. 80.1% top-1 @ 224 and 80.7% @ 256.
* Release 1.0.8

### July 26, 2024
* More MobileNet-v4 weights, ImageNet-12k pretrain w/ fine-tunes, and anti-aliased ConvLarge models

| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------|
| [mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k)|84.99 |15.01 |97.294|2.706 |32.59 |544 |
| [mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k)|84.772|15.228 |97.344|2.656 |32.59 |480 |
| [mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k)|84.64 |15.36 |97.114|2.886 |32.59 |448 |
| [mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k)|84.314|15.686 |97.102|2.898 |32.59 |384 |
| [mobilenetv4_conv_aa_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e600_r384_in1k) |83.824|16.176 |96.734|3.266 |32.59 |480 |
| [mobilenetv4_conv_aa_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e600_r384_in1k) |83.244|16.756 |96.392|3.608 |32.59 |384 |
| [mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k)|82.99 |17.01 |96.67 |3.33 |11.07 |320 |
| [mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k)|82.364|17.636 |96.256|3.744 |11.07 |256 |

* Impressive MobileNet-V1 and EfficientNet-B0 baseline challenges (https://huggingface.co/blog/rwightman/mobilenet-baselines)

| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------|
| [efficientnet_b0.ra4_e3600_r224_in1k](http://hf.co/timm/efficientnet_b0.ra4_e3600_r224_in1k) |79.364|20.636 |94.754|5.246 |5.29 |256 |
| [efficientnet_b0.ra4_e3600_r224_in1k](http://hf.co/timm/efficientnet_b0.ra4_e3600_r224_in1k) |78.584|21.416 |94.338|5.662 |5.29 |224 |
| [mobilenetv1_100h.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_100h.ra4_e3600_r224_in1k) |76.596|23.404 |93.272|6.728 |5.28 |256 |
| [mobilenetv1_100.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_100.ra4_e3600_r224_in1k) |76.094|23.906 |93.004|6.996 |4.23 |256 |
| [mobilenetv1_100h.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_100h.ra4_e3600_r224_in1k) |75.662|24.338 |92.504|7.496 |5.28 |224 |
| [mobilenetv1_100.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_100.ra4_e3600_r224_in1k) |75.382|24.618 |92.312|7.688 |4.23 |224 |

* Prototype of `set_input_size()` added to vit and swin v1/v2 models to allow changing image size, patch size, and window size after model creation (see the sketch at the end of this section).
* Improved support in Swin for different size handling; in addition to `set_input_size`, `always_partition` and `strict_img_size` args have been added to `__init__` to allow more flexible input size constraints.
* Fix out-of-order indices info for intermediate 'Getter' feature wrapper, check out-of-range indices for same.
* Add several `tiny` < 0.5M param models for testing that are actually trained on ImageNet-1k

|model |top1 |top1_err|top5 |top5_err|param_count|img_size|crop_pct|
|----------------------------|------|--------|------|--------|-----------|--------|--------|
|test_efficientnet.r160_in1k |47.156|52.844 |71.726|28.274 |0.36 |192 |1.0 |
|test_byobnet.r160_in1k |46.698|53.302 |71.674|28.326 |0.46 |192 |1.0 |
|test_efficientnet.r160_in1k |46.426|53.574 |70.928|29.072 |0.36 |160 |0.875 |
|test_byobnet.r160_in1k |45.378|54.622 |70.572|29.428 |0.46 |160 |0.875 |
|test_vit.r160_in1k |42.0 |58.0 |68.664|31.336 |0.37 |192 |1.0 |
|test_vit.r160_in1k |40.822|59.178 |67.212|32.788 |0.37 |160 |0.875 |

* Fix vit reg token init, thanks [Promisery](https://github.com/Promisery)
* Other misc fixes
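
A minimal sketch of the `set_input_size()` prototype; the exact keyword arguments here are an assumption and vary by model family (patch size applies to ViT, window size to Swin), so treat this as illustrative rather than definitive:

```python
import timm

# ViT: adjust image (and optionally patch) size after creation; pos embed is resized to match
vit = timm.create_model('vit_base_patch16_224', pretrained=True)
vit.set_input_size(img_size=(384, 384))

# Swin v2: input resolution and window size can be changed together
swin = timm.create_model('swinv2_tiny_window8_256', pretrained=True)
swin.set_input_size(img_size=(320, 320), window_size=10)
```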

### June 24, 2024
* 3 more MobileNetV4 hybrid weights with a different MQA weight init scheme

| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------|
| [mobilenetv4_hybrid_large.ix_e600_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_large.ix_e600_r384_in1k) |84.356|15.644 |96.892 |3.108 |37.76 |448 |
| [mobilenetv4_hybrid_large.ix_e600_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_large.ix_e600_r384_in1k) |83.990|16.010 |96.702 |3.298 |37.76 |384 |
| [mobilenetv4_hybrid_medium.ix_e550_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.ix_e550_r384_in1k) |83.394|16.606 |96.760|3.240 |11.07 |448 |
| [mobilenetv4_hybrid_medium.ix_e550_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.ix_e550_r384_in1k) |82.968|17.032 |96.474|3.526 |11.07 |384 |
| [mobilenetv4_hybrid_medium.ix_e550_r256_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.ix_e550_r256_in1k) |82.492|17.508 |96.278|3.722 |11.07 |320 |
| [mobilenetv4_hybrid_medium.ix_e550_r256_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.ix_e550_r256_in1k) |81.446|18.554 |95.704|4.296 |11.07 |256 |

* Florence-2 weight loading support in the DaViT model

### June 12, 2024
* MobileNetV4 models and initial set of `timm` trained weights added:

| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------|
| [mobilenetv4_hybrid_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_large.e600_r384_in1k) |84.266|15.734 |96.936 |3.064 |37.76 |448 |
| [mobilenetv4_hybrid_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_large.e600_r384_in1k) |83.800|16.200 |96.770 |3.230 |37.76 |384 |
| [mobilenetv4_conv_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_conv_large.e600_r384_in1k) |83.392|16.608 |96.622 |3.378 |32.59 |448 |
| [mobilenetv4_conv_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_conv_large.e600_r384_in1k) |82.952|17.048 |96.266 |3.734 |32.59 |384 |
| [mobilenetv4_conv_large.e500_r256_in1k](http://hf.co/timm/mobilenetv4_conv_large.e500_r256_in1k) |82.674|17.326 |96.31 |3.69 |32.59 |320 |
| [mobilenetv4_conv_large.e500_r256_in1k](http://hf.co/timm/mobilenetv4_conv_large.e500_r256_in1k) |81.862|18.138 |95.69 |4.31 |32.59 |256 |
| [mobilenetv4_hybrid_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.e500_r224_in1k) |81.276|18.724 |95.742|4.258 |11.07 |256 |
| [mobilenetv4_conv_medium.e500_r256_in1k](http://hf.co/timm/mobilenetv4_conv_medium.e500_r256_in1k) |80.858|19.142 |95.768|4.232 |9.72 |320 |
| [mobilenetv4_hybrid_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.e500_r224_in1k) |80.442|19.558 |95.38 |4.62 |11.07 |224 |
| [mobilenetv4_conv_blur_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_conv_blur_medium.e500_r224_in1k) |80.142|19.858 |95.298|4.702 |9.72 |256 |
| [mobilenetv4_conv_medium.e500_r256_in1k](http://hf.co/timm/mobilenetv4_conv_medium.e500_r256_in1k) |79.928|20.072 |95.184|4.816 |9.72 |256 |
| [mobilenetv4_conv_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_conv_medium.e500_r224_in1k) |79.808|20.192 |95.186|4.814 |9.72 |256 |
| [mobilenetv4_conv_blur_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_conv_blur_medium.e500_r224_in1k) |79.438|20.562 |94.932|5.068 |9.72 |224 |
| [mobilenetv4_conv_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_conv_medium.e500_r224_in1k) |79.094|20.906 |94.77 |5.23 |9.72 |224 |
| [mobilenetv4_conv_small.e2400_r224_in1k](http://hf.co/timm/mobilenetv4_conv_small.e2400_r224_in1k) |74.616|25.384 |92.072|7.928 |3.77 |256 |
| [mobilenetv4_conv_small.e1200_r224_in1k](http://hf.co/timm/mobilenetv4_conv_small.e1200_r224_in1k) |74.292|25.708 |92.116|7.884 |3.77 |256 |
| [mobilenetv4_conv_small.e2400_r224_in1k](http://hf.co/timm/mobilenetv4_conv_small.e2400_r224_in1k) |73.756|26.244 |91.422|8.578 |3.77 |224 |
| [mobilenetv4_conv_small.e1200_r224_in1k](http://hf.co/timm/mobilenetv4_conv_small.e1200_r224_in1k) |73.454|26.546 |91.34 |8.66 |3.77 |224 |

* Apple MobileCLIP (https://arxiv.org/pdf/2311.17049, FastViT and ViT-B) image tower model support & weights added (part of OpenCLIP support).
* ViTamin (https://arxiv.org/abs/2404.02132) CLIP image tower model & weights added (part of OpenCLIP support).
* OpenAI CLIP Modified ResNet image tower modelling & weight support (via ByobNet). Refactor AttentionPool2d.

### May 14, 2024
* Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
* Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
* Add `normalize=` flag for transforms, return un-normalized torch.Tensor with original dtype (for `chug`); see the sketch below.
* Version 1.0.3 release
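
A small sketch of the `normalize=` flag, assuming it is exposed through `timm.data.create_transform` (the argument name and entrypoint are inferred from the note above):

```python
from timm.data import create_transform

# normalize=False (assumed) skips the mean/std normalization step, returning an
# un-normalized torch.Tensor in the original dtype so callers like `chug` can normalize later
tfm = create_transform(input_size=224, normalize=False)
```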

### May 11, 2024
* `Searching for Better ViT Baselines (For the GPU Poor)` weights and vit variants released. Exploring model shapes between Tiny and Base.

| model | top1 | top5 | param_count | img_size |
| -------------------------------------------------- | ------ | ------ | ----------- | -------- |
| [vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k) | 86.202 | 97.874 | 64.11 | 256 |
| [vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k) | 85.418 | 97.48 | 60.4 | 256 |
| [vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k) | 84.322 | 96.812 | 63.95 | 256 |
| [vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k](https://huggingface.co/timm/vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k) | 83.906 | 96.684 | 60.23 | 256 |
| [vit_base_patch16_rope_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_base_patch16_rope_reg1_gap_256.sbb_in1k) | 83.866 | 96.67 | 86.43 | 256 |
| [vit_medium_patch16_rope_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_medium_patch16_rope_reg1_gap_256.sbb_in1k) | 83.81 | 96.824 | 38.74 | 256 |
| [vit_betwixt_patch16_reg4_gap_256.sbb_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_256.sbb_in1k) | 83.706 | 96.616 | 60.4 | 256 |
| [vit_betwixt_patch16_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg1_gap_256.sbb_in1k) | 83.628 | 96.544 | 60.4 | 256 |
| [vit_medium_patch16_reg4_gap_256.sbb_in1k](https://huggingface.co/timm/vit_medium_patch16_reg4_gap_256.sbb_in1k) | 83.47 | 96.622 | 38.88 | 256 |
| [vit_medium_patch16_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_medium_patch16_reg1_gap_256.sbb_in1k) | 83.462 | 96.548 | 38.88 | 256 |
| [vit_little_patch16_reg4_gap_256.sbb_in1k](https://huggingface.co/timm/vit_little_patch16_reg4_gap_256.sbb_in1k) | 82.514 | 96.262 | 22.52 | 256 |
| [vit_wee_patch16_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_wee_patch16_reg1_gap_256.sbb_in1k) | 80.256 | 95.360 | 13.42 | 256 |
| [vit_pwee_patch16_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_pwee_patch16_reg1_gap_256.sbb_in1k) | 80.072 | 95.136 | 15.25 | 256 |
| [vit_mediumd_patch16_reg4_gap_256.sbb_in12k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_256.sbb_in12k) | N/A | N/A | 64.11 | 256 |
| [vit_betwixt_patch16_reg4_gap_256.sbb_in12k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_256.sbb_in12k) | N/A | N/A | 60.4 | 256 |

* AttentionExtract helper added to extract attention maps from `timm` models (a sketch follows this section's notes). See example in https://github.com/huggingface/pytorch-image-models/discussions/1232#discussioncomment-9320949
* `forward_intermediates()` API refined and added to more models, including some ConvNets that have other extraction methods.
* 1017 of 1047 model architectures support `features_only=True` feature extraction. Remaining 34 architectures can be supported based on priority requests.
* Remove torch.jit.script annotated functions, including old JIT activations. They conflict with dynamo, and dynamo does a much better job when used.
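
A hedged sketch of the AttentionExtract helper; the import location, constructor arguments, and return type are assumptions here (see the linked discussion for the authoritative example):

```python
import torch
import timm
from timm.utils import AttentionExtract  # assumed import path

model = timm.create_model('vit_base_patch16_224', pretrained=True).eval()
extractor = AttentionExtract(model, method='fx')  # 'fx' graph-based extraction assumed

x = torch.randn(1, 3, 224, 224)
attn_maps = extractor(x)  # assumed: dict mapping node names to attention tensors
for name, attn in attn_maps.items():
    print(name, tuple(attn.shape))
```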

### April 11, 2024
* Prepping for a long overdue 1.0 release, things have been stable for a while now.
* A significant feature that's been missing for a while: `features_only=True` support for ViT models with flat hidden states or non-std module layouts (so far covering `'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*'`)
* The above feature support is achieved through a new `forward_intermediates()` API that can be used with a feature wrapping module or directly.
```python
import torch
import timm

model = timm.create_model('vit_base_patch16_224')
x = torch.randn(2, 3, 224, 224)

final_feat, intermediates = model.forward_intermediates(x)
output = model.forward_head(final_feat)  # pooling + classifier head

print(final_feat.shape)  # torch.Size([2, 197, 768])

for f in intermediates:
    print(f.shape)  # torch.Size([2, 768, 14, 14]), one per block (12 total)

print(output.shape)  # torch.Size([2, 1000])
```

```python
model = timm.create_model('eva02_base_patch16_clip_224', pretrained=True, img_size=512, features_only=True, out_indices=(-3, -2,))
output = model(torch.randn(2, 3, 512, 512))

for o in output:
    print(o.shape)  # torch.Size([2, 768, 32, 32]) for each of the two selected indices
```
* TinyCLIP vision tower weights added, thanks [Thien Tran](https://github.com/gau-nernst)

### Feb 19, 2024
* Next-ViT models added. Adapted from https://github.com/bytedance/Next-ViT
* HGNet and PP-HGNetV2 models added. Adapted from https://github.com/PaddlePaddle/PaddleClas by [SeeFun](https://github.com/seefun)
* Removed setup.py, moved to pyproject.toml based build supported by PDM
* Add updated model EMA impl using `_for_each` for less overhead
* Support device args in train script for non-GPU devices
* Other misc fixes and small additions
* Min supported Python version increased to 3.8
* Release 0.9.16

### Jan 8, 2024
Datasets & transform refactoring
* HuggingFace streaming (iterable) dataset support (`--dataset hfids:org/dataset`); see the sketch below.
* Webdataset wrapper tweaks for improved split info fetching, can auto fetch splits from supported HF hub webdataset
* Tested HF `datasets` and webdataset wrapper streaming from HF hub with recent `timm` ImageNet uploads to https://huggingface.co/timm
* Make input & target column/field keys consistent across datasets and pass via args
* Full monochrome support when using e.g. `--input-size 1 224 224` or `--in-chans 1`, sets PIL image conversion appropriately in dataset
* Improved several alternate crop & resize transforms (ResizeKeepRatio, RandomCropOrPad, etc.) for use in PixParse document AI project
* Add SimCLR style color jitter prob along with grayscale and gaussian blur options to augmentations and args
* Allow training without a validation set (`--val-split ''`) in train script
* Add `--bce-sum` (sum over class dim) and `--bce-pos-weight` (positive weighting) args for training as they're common BCE loss tweaks I was often hard coding
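
As a sketch of the streaming support, assuming the `hfids:` prefix is handled by the standard `timm.data.create_dataset` entrypoint that the train script uses (`org/dataset` is a placeholder, not a real dataset):

```python
from timm.data import create_dataset

# streaming (iterable) Hugging Face dataset; roughly equivalent to
# passing `--dataset hfids:org/dataset` to the train script
ds = create_dataset('hfids:org/dataset', root=None, split='train')
```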

### Nov 23, 2023
* Added EfficientViT-Large models, thanks [SeeFun](https://github.com/seefun)
* Fix Python 3.7 compat, will be dropping support for it soon
* Other misc fixes
* Release 0.9.12

### Nov 20, 2023
* Added significant flexibility for Hugging Face Hub based timm models via `model_args` config entry. `model_args` will be passed as kwargs through to models on creation (see the sketch below this section).
  * See example at https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k/blob/main/config.json
  * Usage: https://github.com/huggingface/pytorch-image-models/discussions/2035
* Updated ImageNet eval and test set csv files with latest models
* `vision_transformer.py` typing and doc cleanup by [Laureηt](https://github.com/Laurent2916)
* 0.9.11 release
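
A short sketch of loading a Hub model whose `config.json` carries `model_args` (the checkpoint below is the linked example; its exact `model_args` values live in that config, not here):

```python
import timm

# any `model_args` in the hub config are forwarded as kwargs to the model
# constructor on creation, so no extra code is needed on the user side
model = timm.create_model(
    'hf-hub:gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k',
    pretrained=True,
)
```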

### Nov 3, 2023
* [DFN (Data Filtering Networks)](https://huggingface.co/papers/2309.17425) and [MetaCLIP](https://huggingface.co/papers/2309.16671) ViT weights added
* DINOv2 'register' ViT model weights added (https://huggingface.co/papers/2309.16588, https://huggingface.co/papers/2304.07193)
* Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
* Improved typing added to ResNet, MobileNet-v3 thanks to [Aryan](https://github.com/a-r-r-o-w)
* ImageNet-12k fine-tuned (from LAION-2B CLIP) `convnext_xxlarge` weights added
* 0.9.9 release

### Oct 20, 2023
* [SigLIP](https://huggingface.co/papers/2303.15343) image tower weights supported in `vision_transformer.py`.
  * Great potential for fine-tuning and downstream feature use; see the sketch below this section.
* Experimental 'register' support in vit models as per [Vision Transformers Need Registers](https://huggingface.co/papers/2309.16588)
* Updated RepViT with new weight release. Thanks [wangao](https://github.com/jameslahm)
* Add patch resizing support (on pretrained weight load) to Swin models
* 0.9.8 release pending
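
For instance, a SigLIP image tower can be created as a standard ViT backbone (a sketch; the model tag is assumed to be one of the released SigLIP weights):

```python
import timm

# num_classes=0 strips the classifier, returning pooled features for downstream use
model = timm.create_model('vit_base_patch16_siglip_224', pretrained=True, num_classes=0)
```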

### Sep 1, 2023
* TinyViT added by [SeeFun](https://github.com/seefun)
* Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10
* 0.9.7 release

### Aug 28, 2023
* Add dynamic img size support to models in `vision_transformer.py`, `vision_transformer_hybrid.py`, `deit.py`, and `eva.py` w/o breaking backward compat.

timm/version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = '1.0.8'
+__version__ = '1.0.9.dev0'
