# Changelog

### Aug 8, 2024
* Add RDNet ('DenseNets Reloaded', https://arxiv.org/abs/2403.19588), thanks [Donghyun Kim](https://github.com/dhkim0225)

### July 28, 2024
* Add `mobilenet_edgetpu_v2_m` weights w/ `ra4` mnv4-small based recipe. 80.1% top-1 @ 224 and 80.7 @ 256.
* Release 1.0.8

### July 26, 2024
* More MobileNet-V4 weights, ImageNet-12k pretrain w/ fine-tunes, and anti-aliased ConvLarge models

| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------|
| [mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k)|84.99 |15.01 |97.294|2.706 |32.59 |544 |
| [mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k)|84.772|15.228 |97.344|2.656 |32.59 |480 |
| [mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k)|84.64 |15.36 |97.114|2.886 |32.59 |448 |
| [mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k)|84.314|15.686 |97.102|2.898 |32.59 |384 |
| [mobilenetv4_conv_aa_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e600_r384_in1k) |83.824|16.176 |96.734|3.266 |32.59 |480 |
| [mobilenetv4_conv_aa_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_conv_aa_large.e600_r384_in1k) |83.244|16.756 |96.392|3.608 |32.59 |384 |
| [mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k)|82.99 |17.01 |96.67 |3.33 |11.07 |320 |
| [mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k)|82.364|17.636 |96.256|3.744 |11.07 |256 |

* Impressive MobileNet-V1 and EfficientNet-B0 baseline challenges (https://huggingface.co/blog/rwightman/mobilenet-baselines)

| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------|
| [efficientnet_b0.ra4_e3600_r224_in1k](http://hf.co/timm/efficientnet_b0.ra4_e3600_r224_in1k) |79.364|20.636 |94.754|5.246 |5.29 |256 |
| [efficientnet_b0.ra4_e3600_r224_in1k](http://hf.co/timm/efficientnet_b0.ra4_e3600_r224_in1k) |78.584|21.416 |94.338|5.662 |5.29 |224 |
| [mobilenetv1_100h.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_100h.ra4_e3600_r224_in1k) |76.596|23.404 |93.272|6.728 |5.28 |256 |
| [mobilenetv1_100.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_100.ra4_e3600_r224_in1k) |76.094|23.906 |93.004|6.996 |4.23 |256 |
| [mobilenetv1_100h.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_100h.ra4_e3600_r224_in1k) |75.662|24.338 |92.504|7.496 |5.28 |224 |
| [mobilenetv1_100.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_100.ra4_e3600_r224_in1k) |75.382|24.618 |92.312|7.688 |4.23 |224 |

* Prototype of `set_input_size()` added to vit and swin v1/v2 models to allow changing image size, patch size, and window size after model creation (see the sketch at the end of this section).
* Improved support for varying input sizes in swin; in addition to `set_input_size`, `always_partition` and `strict_img_size` args have been added to `__init__` to allow more flexible input size constraints.
* Fix out-of-order indices info for intermediate 'Getter' feature wrapper, check out-of-range indices for the same.
* Add several `tiny` < 0.5M param models for testing that are actually trained on ImageNet-1k:

|model |top1 |top1_err|top5 |top5_err|param_count|img_size|crop_pct|
|----------------------------|------|--------|------|--------|-----------|--------|--------|
|test_efficientnet.r160_in1k |47.156|52.844 |71.726|28.274 |0.36 |192 |1.0 |
|test_byobnet.r160_in1k |46.698|53.302 |71.674|28.326 |0.46 |192 |1.0 |
|test_efficientnet.r160_in1k |46.426|53.574 |70.928|29.072 |0.36 |160 |0.875 |
|test_byobnet.r160_in1k |45.378|54.622 |70.572|29.428 |0.46 |160 |0.875 |
|test_vit.r160_in1k |42.0 |58.0 |68.664|31.336 |0.37 |192 |1.0 |
|test_vit.r160_in1k |40.822|59.178 |67.212|32.788 |0.37 |160 |0.875 |

* Fix vit reg token init, thanks [Promisery](https://github.com/Promisery)
* Other misc fixes
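
A rough sketch of the `set_input_size()` prototype mentioned above. The keyword names (`img_size`, `patch_size`, `window_size`) are assumptions based on the description; the prototype API may change, so check the vit/swin model source for the current signature.

```python
import timm
import torch

# ViT: create at the pretrained 224x224 size, then switch resolution after creation;
# argument names here are assumptions based on the changelog entry above
vit = timm.create_model('vit_base_patch16_224', pretrained=True)
vit.set_input_size(img_size=(384, 384))
print(vit(torch.randn(1, 3, 384, 384)).shape)  # torch.Size([1, 1000])

# Swin: window size can also be adjusted along with the image size
swin = timm.create_model('swin_base_patch4_window7_224', pretrained=True)
swin.set_input_size(img_size=(256, 256), window_size=8)
print(swin(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 1000])
```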

### June 24, 2024
* 3 more MobileNetV4 hybrid weights with a different MQA weight init scheme

| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------|
| [mobilenetv4_hybrid_large.ix_e600_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_large.ix_e600_r384_in1k) |84.356|15.644 |96.892|3.108 |37.76 |448 |
| [mobilenetv4_hybrid_large.ix_e600_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_large.ix_e600_r384_in1k) |83.990|16.010 |96.702|3.298 |37.76 |384 |
| [mobilenetv4_hybrid_medium.ix_e550_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.ix_e550_r384_in1k) |83.394|16.606 |96.760|3.240 |11.07 |448 |
| [mobilenetv4_hybrid_medium.ix_e550_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.ix_e550_r384_in1k) |82.968|17.032 |96.474|3.526 |11.07 |384 |
| [mobilenetv4_hybrid_medium.ix_e550_r256_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.ix_e550_r256_in1k) |82.492|17.508 |96.278|3.722 |11.07 |320 |
| [mobilenetv4_hybrid_medium.ix_e550_r256_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.ix_e550_r256_in1k) |81.446|18.554 |95.704|4.296 |11.07 |256 |
* Add Florence-2 weight loading support to the DaViT model

### June 12, 2024
* MobileNetV4 models and initial set of `timm` trained weights added (a loading example follows the table):

| model |top1 |top1_err|top5 |top5_err|param_count|img_size|
|--------------------------------------------------------------------------------------------------|------|--------|------|--------|-----------|--------|
| [mobilenetv4_hybrid_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_large.e600_r384_in1k) |84.266|15.734 |96.936|3.064 |37.76 |448 |
| [mobilenetv4_hybrid_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_hybrid_large.e600_r384_in1k) |83.800|16.200 |96.770|3.230 |37.76 |384 |
| [mobilenetv4_conv_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_conv_large.e600_r384_in1k) |83.392|16.608 |96.622|3.378 |32.59 |448 |
| [mobilenetv4_conv_large.e600_r384_in1k](http://hf.co/timm/mobilenetv4_conv_large.e600_r384_in1k) |82.952|17.048 |96.266|3.734 |32.59 |384 |
| [mobilenetv4_conv_large.e500_r256_in1k](http://hf.co/timm/mobilenetv4_conv_large.e500_r256_in1k) |82.674|17.326 |96.31 |3.69 |32.59 |320 |
| [mobilenetv4_conv_large.e500_r256_in1k](http://hf.co/timm/mobilenetv4_conv_large.e500_r256_in1k) |81.862|18.138 |95.69 |4.31 |32.59 |256 |
| [mobilenetv4_hybrid_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.e500_r224_in1k) |81.276|18.724 |95.742|4.258 |11.07 |256 |
| [mobilenetv4_conv_medium.e500_r256_in1k](http://hf.co/timm/mobilenetv4_conv_medium.e500_r256_in1k) |80.858|19.142 |95.768|4.232 |9.72 |320 |
| [mobilenetv4_hybrid_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_hybrid_medium.e500_r224_in1k) |80.442|19.558 |95.38 |4.62 |11.07 |224 |
| [mobilenetv4_conv_blur_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_conv_blur_medium.e500_r224_in1k) |80.142|19.858 |95.298|4.702 |9.72 |256 |
| [mobilenetv4_conv_medium.e500_r256_in1k](http://hf.co/timm/mobilenetv4_conv_medium.e500_r256_in1k) |79.928|20.072 |95.184|4.816 |9.72 |256 |
| [mobilenetv4_conv_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_conv_medium.e500_r224_in1k) |79.808|20.192 |95.186|4.814 |9.72 |256 |
| [mobilenetv4_conv_blur_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_conv_blur_medium.e500_r224_in1k) |79.438|20.562 |94.932|5.068 |9.72 |224 |
| [mobilenetv4_conv_medium.e500_r224_in1k](http://hf.co/timm/mobilenetv4_conv_medium.e500_r224_in1k) |79.094|20.906 |94.77 |5.23 |9.72 |224 |
| [mobilenetv4_conv_small.e2400_r224_in1k](http://hf.co/timm/mobilenetv4_conv_small.e2400_r224_in1k) |74.616|25.384 |92.072|7.928 |3.77 |256 |
| [mobilenetv4_conv_small.e1200_r224_in1k](http://hf.co/timm/mobilenetv4_conv_small.e1200_r224_in1k) |74.292|25.708 |92.116|7.884 |3.77 |256 |
| [mobilenetv4_conv_small.e2400_r224_in1k](http://hf.co/timm/mobilenetv4_conv_small.e2400_r224_in1k) |73.756|26.244 |91.422|8.578 |3.77 |224 |
| [mobilenetv4_conv_small.e1200_r224_in1k](http://hf.co/timm/mobilenetv4_conv_small.e1200_r224_in1k) |73.454|26.546 |91.34 |8.66 |3.77 |224 |
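
Loading any of the weights above follows the usual `timm` pattern; the small ConvNet variant from the table is used here purely as an illustration, and the `test_input_size` field may not be present for every weight.

```python
import timm
import torch

# any model.weight name from the table works the same way
model = timm.create_model('mobilenetv4_conv_small.e2400_r224_in1k', pretrained=True).eval()

# the pretrained config records the train/eval resolutions used for the results above
cfg = model.pretrained_cfg
print(cfg['input_size'], cfg.get('test_input_size'))

x = torch.randn(1, *cfg['input_size'])
print(model(x).shape)  # torch.Size([1, 1000])
```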

* Apple MobileCLIP (https://arxiv.org/pdf/2311.17049, FastViT and ViT-B) image tower model support & weights added (part of OpenCLIP support).
* ViTamin (https://arxiv.org/abs/2404.02132) CLIP image tower model & weights added (part of OpenCLIP support).
* OpenAI CLIP Modified ResNet image tower modelling & weight support (via ByobNet). Refactor AttentionPool2d.

### May 14, 2024
* Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
* Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
* Add `normalize=` flag for transforms, to return a non-normalized torch.Tensor with original dtype (for `chug`); see the sketch below.
* Version 1.0.3 release
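
A rough sketch of the `normalize=` flag, assuming it is exposed through `timm.data.create_transform`; the exact kwarg name and the behaviour when disabled are assumptions based on the note above.

```python
import numpy as np
from PIL import Image
from timm.data import create_transform

img = Image.fromarray(np.zeros((256, 256, 3), dtype=np.uint8))

# normalize=False is assumed to skip mean/std normalization so the transform
# returns a plain torch.Tensor in the original value range/dtype
transform = create_transform(input_size=224, normalize=False)
out = transform(img)
print(out.dtype, out.shape)
```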

### May 11, 2024
* `Searching for Better ViT Baselines (For the GPU Poor)` weights and vit variants released. Exploring model shapes between Tiny and Base.

| model | top1 | top5 | param_count | img_size |
| -------------------------------------------------- | ------ | ------ | ----------- | -------- |
| [vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k) | 86.202 | 97.874 | 64.11 | 256 |
| [vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k) | 85.418 | 97.48 | 60.4 | 256 |
| [vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k) | 84.322 | 96.812 | 63.95 | 256 |
| [vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k](https://huggingface.co/timm/vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k) | 83.906 | 96.684 | 60.23 | 256 |
| [vit_base_patch16_rope_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_base_patch16_rope_reg1_gap_256.sbb_in1k) | 83.866 | 96.67 | 86.43 | 256 |
| [vit_medium_patch16_rope_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_medium_patch16_rope_reg1_gap_256.sbb_in1k) | 83.81 | 96.824 | 38.74 | 256 |
| [vit_betwixt_patch16_reg4_gap_256.sbb_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_256.sbb_in1k) | 83.706 | 96.616 | 60.4 | 256 |
| [vit_betwixt_patch16_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg1_gap_256.sbb_in1k) | 83.628 | 96.544 | 60.4 | 256 |
| [vit_medium_patch16_reg4_gap_256.sbb_in1k](https://huggingface.co/timm/vit_medium_patch16_reg4_gap_256.sbb_in1k) | 83.47 | 96.622 | 38.88 | 256 |
| [vit_medium_patch16_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_medium_patch16_reg1_gap_256.sbb_in1k) | 83.462 | 96.548 | 38.88 | 256 |
| [vit_little_patch16_reg4_gap_256.sbb_in1k](https://huggingface.co/timm/vit_little_patch16_reg4_gap_256.sbb_in1k) | 82.514 | 96.262 | 22.52 | 256 |
| [vit_wee_patch16_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_wee_patch16_reg1_gap_256.sbb_in1k) | 80.256 | 95.360 | 13.42 | 256 |
| [vit_pwee_patch16_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_pwee_patch16_reg1_gap_256.sbb_in1k) | 80.072 | 95.136 | 15.25 | 256 |
| [vit_mediumd_patch16_reg4_gap_256.sbb_in12k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_256.sbb_in12k) | N/A | N/A | 64.11 | 256 |
| [vit_betwixt_patch16_reg4_gap_256.sbb_in12k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_256.sbb_in12k) | N/A | N/A | 60.4 | 256 |

* AttentionExtract helper added to extract attention maps from `timm` models (see the sketch after this list). See example in https://github.com/huggingface/pytorch-image-models/discussions/1232#discussioncomment-9320949
* `forward_intermediates()` API refined and added to more models, including some ConvNets that have other extraction methods.
* 1017 of 1047 model architectures support `features_only=True` feature extraction. The remaining architectures can be supported based on priority requests.
* Removed `torch.jit.script` annotated functions, including old JIT activations. They conflict with dynamo, and dynamo does a much better job without them.
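
A rough sketch of using the `AttentionExtract` helper; the constructor argument shown (`method`) and the exact return format are assumptions, so refer to the linked discussion for canonical usage.

```python
import timm
import torch
from timm.utils import AttentionExtract

model = timm.create_model('vit_base_patch16_224', pretrained=True).eval()

# wrap the model; the method='fx' argument is an assumption based on the discussion above
extractor = AttentionExtract(model, method='fx')

with torch.no_grad():
    maps = extractor(torch.randn(1, 3, 224, 224))

# assumed to return a dict of node name -> attention tensor
for name, attn in maps.items():
    print(name, attn.shape)
```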

### April 11, 2024
* Prepping for a long overdue 1.0 release; things have been stable for a while now.
* Significant feature that's been missing for a while, `features_only=True` support for ViT models with flat hidden states or non-std module layouts (so far covering `'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*'`)
* Above feature support achieved through a new `forward_intermediates()` API that can be used with a feature wrapping module or directly.
```python
import timm
import torch

model = timm.create_model('vit_base_patch16_224')

# shapes below assume a batch of two 224x224 RGB images
input = torch.randn(2, 3, 224, 224)

final_feat, intermediates = model.forward_intermediates(input)
output = model.forward_head(final_feat)  # pooling + classifier head

print(final_feat.shape)
# torch.Size([2, 197, 768])

for f in intermediates:
    print(f.shape)
# torch.Size([2, 768, 14, 14]), printed 12 times, once per block

print(output.shape)
# torch.Size([2, 1000])
```

```python
import timm
import torch

model = timm.create_model(
    'eva02_base_patch16_clip_224',
    pretrained=True,
    img_size=512,
    features_only=True,
    out_indices=(-3, -2,),
)
output = model(torch.randn(2, 3, 512, 512))

for o in output:
    print(o.shape)
# torch.Size([2, 768, 32, 32])
# torch.Size([2, 768, 32, 32])
```
* TinyCLIP vision tower weights added, thanks [Thien Tran](https://github.com/gau-nernst)

### Feb 19, 2024
* Next-ViT models added. Adapted from https://github.com/bytedance/Next-ViT
* HGNet and PP-HGNetV2 models added. Adapted from https://github.com/PaddlePaddle/PaddleClas by [SeeFun](https://github.com/seefun)
* Removed setup.py, moved to pyproject.toml based build supported by PDM
* Add updated model EMA impl using _for_each for less overhead
* Support device args in train script for non-GPU devices
* Other misc fixes and small additions
* Min supported Python version increased to 3.8
* Release 0.9.16

### Jan 8, 2024
Datasets & transform refactoring
* HuggingFace streaming (iterable) dataset support (`--dataset hfids:org/dataset`)
* Webdataset wrapper tweaks for improved split info fetching, can auto fetch splits from supported HF hub webdataset
* Tested HF `datasets` and webdataset wrapper streaming from HF hub with recent `timm` ImageNet uploads to https://huggingface.co/timm
* Make input & target column/field keys consistent across datasets and pass via args
* Full monochrome support when using e.g. `--input-size 1 224 224` or `--in-chans 1`, sets PIL image conversion appropriately in dataset
* Improved several alternate crop & resize transforms (ResizeKeepRatio, RandomCropOrPad, etc) for use in PixParse document AI project
* Add SimCLR style color jitter prob along with grayscale and gaussian blur options to augmentations and args
* Allow train without validation set (`--val-split ''`) in train script
* Add `--bce-sum` (sum over class dim) and `--bce-pos-weight` (positive weighting) args for training as they're common BCE loss tweaks I was often hard coding

### Nov 23, 2023
* Added EfficientViT-Large models, thanks [SeeFun](https://github.com/seefun)
* Fix Python 3.7 compat, will be dropping support for it soon
* Other misc fixes
* Release 0.9.12

### Nov 20, 2023
* Added significant flexibility for Hugging Face Hub-based timm models via the `model_args` config entry. `model_args` will be passed as kwargs through to models on creation (a loading example follows this section).
  * See example at https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k/blob/main/config.json
  * Usage: https://github.com/huggingface/pytorch-image-models/discussions/2035
* Updated imagenet eval and test set csv files with latest models
* `vision_transformer.py` typing and doc cleanup by [Laureηt](https://github.com/Laurent2916)
* 0.9.11 release
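
Nothing extra is needed on the loading side; `model_args` declared in the hub `config.json` are forwarded as kwargs when the model is built. A minimal sketch using the audio ViT linked above:

```python
import timm

# the config.json for this hub model declares `model_args` (e.g. a non-square img_size),
# which timm passes through to the model constructor on creation
model = timm.create_model(
    'hf-hub:gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k',
    pretrained=True,
)
print(type(model).__name__)
```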

### Nov 3, 2023
* [DFN (Data Filtering Networks)](https://huggingface.co/papers/2309.17425) and [MetaCLIP](https://huggingface.co/papers/2309.16671) ViT weights added
* DINOv2 'register' ViT model weights added (https://huggingface.co/papers/2309.16588, https://huggingface.co/papers/2304.07193)
* Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
* Improved typing added to ResNet, MobileNet-v3 thanks to [Aryan](https://github.com/a-r-r-o-w)
* ImageNet-12k fine-tuned (from LAION-2B CLIP) `convnext_xxlarge`
* 0.9.9 release

### Oct 20, 2023
* [SigLIP](https://huggingface.co/papers/2303.15343) image tower weights supported in `vision_transformer.py`.
  * Great potential for fine-tuning and downstream feature use (see the sketch at the end of this section).
* Experimental 'register' support in vit models as per [Vision Transformers Need Registers](https://huggingface.co/papers/2309.16588)
* Updated RepViT with new weight release. Thanks [wangao](https://github.com/jameslahm)
* Add patch resizing support (on pretrained weight load) to Swin models
* 0.9.8 release pending
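
A small sketch of pulling one of the SigLIP image towers in as a feature backbone; the specific weight name below is an assumption, any of the `vit_*_siglip_*` variants should follow the same pattern.

```python
import timm
import torch

# num_classes=0 drops the classifier head so the tower returns pooled image features;
# the exact model/weight name here is an assumption
model = timm.create_model('vit_base_patch16_siglip_224', pretrained=True, num_classes=0).eval()

with torch.no_grad():
    feats = model(torch.randn(1, 3, 224, 224))
print(feats.shape)  # pooled image embedding
```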

### Sep 1, 2023
* TinyViT added by [SeeFun](https://github.com/seefun)
* Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10
* 0.9.7 release

### Aug 28, 2023
* Add dynamic img size support to models in `vision_transformer.py`, `vision_transformer_hybrid.py`, `deit.py`, and `eva.py` w/o breaking backward compat.