Contributor

@wkcn wkcn commented Dec 7, 2023

Description
Support for auto scaling factor tuning #41
Related Example: Azure/MS-AMP-Examples#21

Performance (model: GPT-345M, https://github.com/Azure/MS-AMP-Examples/blob/main/gpt3/pretrain_345m_megatron.sh):

  • msamp w/o auto scaling
    validation loss at iteration 5000 | lm loss value: 3.531525E+00 | lm loss PPL: 3.417605E+01 |
    samples per second: 519.524 | TFLOPs: 155.99 |

  • msamp w/ auto scaling (enabled with the argument --wgrad-auto-scaling):
    validation loss at iteration 5000 | lm loss value: 3.529646E+00 | lm loss PPL: 3.411188E+01 |
    samples per second: 516.702 | TFLOPs: 155.14 |
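
To make the comparison easier to read, the short script below recomputes the deltas between the two runs from the figures reported above. It only restates those numbers; the dictionary keys and the helper `relative_change` are names chosen for this illustration. With these figures, auto scaling lowers the validation loss by about 0.002 at a throughput cost of roughly 0.5%.

```python
import math

# Figures copied from the two runs above (validation at iteration 5000).
baseline = {"loss": 3.531525, "ppl": 34.17605, "samples_per_s": 519.524, "tflops": 155.99}
auto_scaling = {"loss": 3.529646, "ppl": 34.11188, "samples_per_s": 516.702, "tflops": 155.14}


def relative_change(new, old):
    """Signed relative change of `new` with respect to `old`."""
    return (new - old) / old


loss_delta = auto_scaling["loss"] - baseline["loss"]
throughput_change = relative_change(auto_scaling["samples_per_s"], baseline["samples_per_s"])
tflops_change = relative_change(auto_scaling["tflops"], baseline["tflops"])

# The reported perplexity is exp(lm loss), so it can be cross-checked directly.
ppl_from_loss = {
    "baseline": math.exp(baseline["loss"]),
    "auto_scaling": math.exp(auto_scaling["loss"]),
}

print(f"lm loss delta:      {loss_delta:+.6f}")
print(f"samples/s change:   {throughput_change:+.3%}")
print(f"TFLOPs change:      {tflops_change:+.3%}")
print(f"PPL from loss:      {ppl_from_loss}")
```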

Major Revisions

  • Add a new variable pre_scale to ScalingMeta
  • Support pre_scale in Arithmetic.add_to_fp8
  • Add auto scaling factor tuning to the Megatron FP8DistributedOptimizer
  • Add unit tests
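
The description does not spell out the tuning rule, so the following is only a generic sketch of how a pre-scale factor could be tuned from observed overflow, in the spirit of dynamic loss scaling. It is not MS-AMP's implementation: `PreScaleTuner`, `overflow_ratio`, `accumulate_with_pre_scale`, and the `threshold` and `window` defaults are all hypothetical names and values introduced for illustration. The only fixed fact used is that the largest finite magnitude in the FP8 E4M3 format is 448.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def overflow_ratio(values, scale):
    """Fraction of values whose scaled magnitude exceeds the FP8 E4M3 range."""
    if not values:
        return 0.0
    clipped = sum(1 for v in values if abs(v * scale) > FP8_E4M3_MAX)
    return clipped / len(values)


def accumulate_with_pre_scale(accumulated, incoming, scale, pre_scale):
    """Hypothetical helper: add pre-scaled values and report the overflow ratio."""
    summed = [a + b * pre_scale for a, b in zip(accumulated, incoming)]
    return summed, overflow_ratio(summed, scale)


class PreScaleTuner:
    """Hypothetical rule: halve pre_scale on overflow, double it after a quiet window."""

    def __init__(self, pre_scale=1.0, threshold=0.001, window=100):
        self.pre_scale = pre_scale
        self.threshold = threshold  # tolerated overflow ratio (assumed value)
        self.window = window        # overflow-free steps before growing (assumed value)
        self.quiet_steps = 0

    def update(self, ratio):
        if ratio > self.threshold:
            # Too many values overflowed: shrink the factor and restart the window.
            self.pre_scale *= 0.5
            self.quiet_steps = 0
        else:
            self.quiet_steps += 1
            if self.quiet_steps >= self.window:
                # A full quiet window passed: grow back, capped at 1.0.
                self.pre_scale = min(1.0, self.pre_scale * 2.0)
                self.quiet_steps = 0
        return self.pre_scale


# Usage: one accumulation step overflows, so the tuner backs off.
tuner = PreScaleTuner(window=2)
summed, ratio = accumulate_with_pre_scale(
    [400.0, 1.0], [400.0, 1.0], scale=1.0, pre_scale=tuner.pre_scale
)
tuner.update(ratio)
print(summed, ratio, tuner.pre_scale)
```

Halving on overflow and growing only after a quiet window keeps the factor from oscillating on every step; a real implementation would also have to agree on the factor across ranks, which this sketch leaves out.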

@wkcn wkcn marked this pull request as draft December 7, 2023 03:34
@wkcn wkcn marked this pull request as ready for review December 11, 2023 03:10
@wkcn wkcn requested review from guoshzhao and tocean December 12, 2023 06:47
@wkcn wkcn enabled auto-merge (squash) December 12, 2023 06:51
@wkcn wkcn changed the title from "Auto scaling factor tuning for FP8 collective communication" to "[Feature] Auto scaling factor tuning for FP8 collective communication" Dec 14, 2023
@wkcn wkcn mentioned this pull request Dec 14, 2023
9 tasks