Vision Transformer Model Enhancements #82

After merging #80, further optimizations may be possible:

- Evaluate whether the `attn_mask` (which masks out padded inputs and the cls token) is necessary for training
  - If yes, try FlexAttention: https://pytorch.org/blog/flexattention/
  - If no, remove the mask during training to benefit from flash attention
- Evaluate whether the feed-forward dimension can be changed back to 512 (as in torch.v1)
- Try `torch.compile` for deployment (likely to be problematic because of variable input shapes) and for preprocessing
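
A rough sketch of how the first item could be checked in isolation: compare the attention output with and without a key-padding mask and look at how much the valid (non-padded) positions change. This is not the project's model code. The helper names (`padding_mask`, `masked_vs_unmasked_diff`), the tensor shapes, and the use of `torch.nn.functional.scaled_dot_product_attention` are assumptions for illustration, and only the padding part of the mask is covered here, not the cls token part.

```python
import torch
import torch.nn.functional as F


def padding_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    """Hypothetical helper: boolean key-padding mask for SDPA.

    True means "this key may be attended to" (SDPA's boolean-mask convention).
    Shape (B, 1, 1, S) so it broadcasts over heads and query positions.
    """
    positions = torch.arange(max_len)
    keep = positions[None, :] < lengths[:, None]  # (B, S)
    return keep[:, None, None, :]


def masked_vs_unmasked_diff(q, k, v, lengths):
    """Hypothetical helper: per-sample max |masked - unmasked| at valid queries.

    q, k, v: (B, H, L, D) tensors; lengths: (B,) number of real tokens per sample.
    """
    mask = padding_mask(lengths, k.shape[-2])
    masked = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    unmasked = F.scaled_dot_product_attention(q, k, v)  # no mask passed
    # Largest absolute difference over heads and feature dims -> (B, L)
    diff = (masked - unmasked).abs().amax(dim=(1, 3))
    # Only query positions that hold real tokens are of interest
    return torch.stack(
        [diff[b, : int(lengths[b])].max() for b in range(q.shape[0])]
    )


if __name__ == "__main__":
    torch.manual_seed(0)
    B, H, L, D = 2, 2, 5, 8  # assumed toy sizes
    q, k, v = (torch.randn(B, H, L, D) for _ in range(3))
    lengths = torch.tensor([5, 3])  # second sample has 2 padded positions
    print(masked_vs_unmasked_diff(q, k, v, lengths))
```

A sample without padding should show a difference of roughly zero, while a padded sample shows how strongly padded keys leak into the valid outputs when the mask is dropped. Whether that leakage matters for training quality still has to be measured on the real model and data.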
