Add support for gradient clipping. #94
Conversation
/build_and_test
🚀 Build workflow triggered! View run
✅ Build workflow passed! View run
romerojosh left a comment:
Thanks for adding this feature!
Left a small comment about SAC grad clipping. There are also some merge conflicts that need to be resolved after merging in #93.
if (alpha_lr_scheduler) {
  alpha_lr_scheduler->step();
  // I think we do not need grad clipping here
  // the model is just a scalar and clipping the grad for that might hurt
I noted this in the previous version of this MR. What does stable-baselines do in this case (if they support grad clipping)? We should match that as a reference if we can.
Yes, stable-baselines does not support gradient accumulation or clipping, except for PPO.
However, I think gradient accumulation makes sense for torchfort in general because of the online nature of training: it lets us use larger effective batch sizes without a replay buffer.
Concerning grad clipping, it's optional, and I do not see why it should hurt when the support is there. We can add it here as well.
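For reference, a minimal libtorch sketch of what optional clipping of the scalar entropy-coefficient gradient could look like; identifiers such as log_alpha and the learning rate are illustrative assumptions, not torchfort's actual API:

```cpp
// Hypothetical sketch: clipping the gradient of a scalar entropy
// coefficient (log_alpha) in a SAC-style update. Names are
// illustrative, not torchfort's actual identifiers.
#include <torch/torch.h>

#include <vector>

int main() {
  // Scalar learnable parameter, standing in for SAC's entropy temperature.
  torch::Tensor log_alpha = torch::zeros({1}, torch::requires_grad());
  torch::optim::Adam alpha_optimizer({log_alpha},
                                     torch::optim::AdamOptions(3e-4));

  // Dummy loss standing in for the alpha loss.
  auto alpha_loss = -(log_alpha * 2.0).mean();

  alpha_optimizer.zero_grad();
  alpha_loss.backward();

  // Optional clipping: clip_grad_norm_ only rescales when the gradient
  // norm exceeds max_norm, so it is a no-op otherwise.
  const double max_norm = 1.0;
  torch::nn::utils::clip_grad_norm_(std::vector<torch::Tensor>{log_alpha},
                                    max_norm);

  alpha_optimizer.step();
  return 0;
}
```

Since clip_grad_norm_ only rescales when the norm exceeds max_norm, enabling it for a scalar parameter should be harmless in typical runs.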
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
Force-pushed from f510a76 to cc498c8
/build_and_test
🚀 Build workflow triggered! View run
✅ Build workflow passed! View run
This MR adds gradient clipping to all training procedures.
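As a rough illustration of the pattern described in this PR (clip just before the optimizer step, optionally after accumulating gradients over several backward calls, as discussed above), here is a hedged libtorch sketch; the model, loss, and step counts are placeholders, not torchfort code:

```cpp
// Minimal sketch of clip-before-step combined with gradient
// accumulation. All names here are illustrative assumptions.
#include <torch/torch.h>

int main() {
  torch::nn::Linear model(4, 1);
  torch::optim::SGD optimizer(model->parameters(), /*lr=*/1e-2);

  const int accumulation_steps = 4;
  const double max_norm = 1.0;

  optimizer.zero_grad();
  for (int step = 0; step < accumulation_steps; ++step) {
    auto input = torch::randn({8, 4});
    auto target = torch::randn({8, 1});
    // Scale the loss so the accumulated gradients match one large batch.
    auto loss =
        torch::mse_loss(model->forward(input), target) / accumulation_steps;
    loss.backward();
  }

  // Clip once over the accumulated gradients, just before the update.
  torch::nn::utils::clip_grad_norm_(model->parameters(), max_norm);
  optimizer.step();
  optimizer.zero_grad();
  return 0;
}
```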