
@muellerzr commented Aug 15, 2024

Description
This PR makes it easier for users to use FSDP with MS-AMP from their existing optimizers. This is especially beneficial for library authors: currently we have to jump through a number of hoops to get the FSDP version of these optimizers working when a user passes in a plain optim.Adam.

Instead, FSDPAdamW now delegates to an OptimWrapper, which calls through to an underlying optimizer as a passthrough. This makes it easier to add any logic that should run before or after the underlying optimizer's step, and the wrapper takes an already-constructed Optimizer rather than requiring inheritance.
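
For illustration, here is a minimal sketch of what such a passthrough wrapper could look like. The class name, hook methods, and pre/post-step comments below are assumptions made for the sketch, not MS-AMP's actual implementation:

import torch

class OptimWrapper:
    """Wraps an already-constructed optimizer and forwards calls to it."""

    def __init__(self, optimizer: torch.optim.Optimizer):
        self.optimizer = optimizer

    @property
    def param_groups(self):
        return self.optimizer.param_groups

    def zero_grad(self, set_to_none: bool = True):
        self.optimizer.zero_grad(set_to_none=set_to_none)

    def step(self, closure=None):
        # Hook points for any FSDP/FP8-specific handling around the real step.
        self._pre_step()
        loss = self.optimizer.step(closure)
        self._post_step()
        return loss

    def _pre_step(self):
        pass  # e.g. prepare or unscale gradients before the wrapped step

    def _post_step(self):
        pass  # e.g. re-cast updated master weights after the wrapped step

    def state_dict(self):
        return self.optimizer.state_dict()

    def load_state_dict(self, state_dict):
        self.optimizer.load_state_dict(state_dict)

An FSDP-specific wrapper can then fill in the pre/post-step hooks while still accepting whatever optimizer the user already built, which is the spirit of the change described above.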

Let me know what you think about this. I'm currently integrating FSDP and DeepSpeed with MS-AMP into Accelerate and found this to be a critical pain point, since our users pass in normal PyTorch optimizers and don't create special versions themselves.

@tocean @wkcn let me know what you two think :)

New working FSDP usage:

# (imports and model/optimizer construction omitted)
model, optimizer = ...
model, optimizer = msamp.initialize(model, optimizer, use_fsdp=True, weight_qtype=Dtypes.kfloat8_e4m3)
model = FP8FullyShardedDataParallel(model, use_orig_params=True, auto_wrap_policy=my_auto_wrap_policy)
optimizer = FSDPAdamW(optimizer)  # wraps the user's existing optimizer rather than replacing it

@muellerzr (Author)

@microsoft-github-policy-service agree company="Hugging Face"

@muellerzr marked this pull request as draft August 15, 2024 15:41
@muellerzr (Author) commented Aug 15, 2024

Sorry for the extraneous pushes while I was figuring something out. Good to go now :)

@muellerzr marked this pull request as ready for review August 15, 2024 16:23
@muellerzr (Author)

You can see our new accelerate benchmarking scripts here: https://github.com/huggingface/accelerate/tree/muellerzr-msamp-ds-fsdp/benchmarks/fp8/ms_amp

@muellerzr (Author)

@tocean @wkcn any particular issues with this? :)

(Ideally it'd be great to include this in the next accelerate release on the 1st :) )

@wkcn (Contributor) commented Aug 22, 2024

@muellerzr Thanks for your contribution!

The PR looks good to me.
Sorry, I am not at Microsoft and do not have authorization to review and merge this pull request.

@muellerzr (Author) commented Aug 22, 2024

Ack, okay. I suppose we'll have to wait for @tocean / @abuccts / @guoshzhao to take a look. Thanks for the flag 🤗
