After merging #80, more optimizations can possibly be applied:
- Evaluate whether the attn_mask (which masks out padded inputs and the cls token) is necessary for training.
  - If yes, try FlexAttention (https://pytorch.org/blog/flexattention/); see the first sketch below.
  - If no, remove the mask during training and benefit from flash attention; see the second sketch below.
- Evaluate whether we can change the feed-forward dimension back to 512 (as in torch.v1).
- Try to implement `torch.compile` for deployment (probably not working due to variable input shapes) and for preprocessing; see the last sketch below.
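If the mask turns out to be necessary, a FlexAttention `mask_mod` could express the padding-plus-cls masking while keeping a fused kernel. A minimal sketch, assuming PyTorch >= 2.5, per-sample valid lengths in a hypothetical `seq_lens` tensor, and the cls token sitting at index 0 (none of which are confirmed by this issue):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 4, 8, 128, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))
# Hypothetical per-sample valid lengths; replace with the real batch metadata.
seq_lens = torch.tensor([128, 100, 64, 32], device="cuda")

def mask_mod(b, h, q_idx, kv_idx):
    # Attend only to non-padded key positions and skip the cls token
    # (assumed at index 0), mirroring the attn_mask discussed above.
    return (kv_idx < seq_lens[b]) & (kv_idx > 0)

# The block mask is precomputed once per batch; H=None broadcasts over heads.
block_mask = create_block_mask(mask_mod, B=B, H=None, Q_LEN=S, KV_LEN=S, device="cuda")
out = flex_attention(q, k, v, block_mask=block_mask)
```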
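If the mask can be dropped, plain `scaled_dot_product_attention` with no `attn_mask` lets PyTorch dispatch to the flash-attention kernel; the backend context manager below only makes that choice explicit, and the shapes are illustrative:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

q = torch.randn(4, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# With no attn_mask argument, the flash-attention backend is eligible;
# pinning it here raises an error instead of silently falling back.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```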
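For the `torch.compile` point, `dynamic=True` asks the compiler for shape-polymorphic kernels, which may sidestep the variable-input-shape concern. A rough sketch with a stand-in model (the real network and shapes are assumptions):

```python
import torch

# Stand-in for the project's model; hyperparameters here are illustrative.
model = torch.nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True).eval()
compiled = torch.compile(model, dynamic=True)

with torch.no_grad():
    # Varying sequence lengths should reuse the same compiled artifact
    # rather than triggering a recompile per shape.
    for seq_len in (64, 100, 128):
        x = torch.randn(2, seq_len, 256)
        _ = compiled(x)
```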