[Feature] Tensor + Sequence parallel test coverage for evo2 inference. #1243

@jstjohn

Description

Problem & Motivation

Issues were recently discovered in Megatron inference involving tensor parallelism with sequence_parallel=True when materialize_only_last_token_logits=True is requested. Setting sequence_parallel=True is typically the recommended way to run with --tensor-parallel-size=N for N>1.

First, infer.py has no --sequence-parallel argument. We should add one as an option for testing, since at least at train time sequence parallelism improves the parallel efficiency of tensor parallelism.
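A minimal sketch of what such a flag could look like, assuming infer.py builds its CLI with argparse. Only `--tensor-parallel-size` and `--sequence-parallel` come from this issue; the function names and the validation rule are hypothetical and would need to match the real script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Toy parser with only the two flags discussed here, not the real infer.py CLI."""
    parser = argparse.ArgumentParser(description="Sketch of parallelism flags for inference")
    parser.add_argument(
        "--tensor-parallel-size",
        type=int,
        default=1,
        help="Number of GPUs to shard each layer across.",
    )
    # Proposed flag: off by default so existing invocations keep their behavior.
    parser.add_argument(
        "--sequence-parallel",
        action="store_true",
        help="Enable sequence parallelism alongside tensor parallelism.",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption of this sketch: sequence parallelism shards activations over the
    # tensor-parallel group, so reject it when there is no such group to shard over.
    if args.sequence_parallel and args.tensor_parallel_size < 2:
        parser.error("--sequence-parallel requires --tensor-parallel-size > 1")
    return args
```

Rejecting the flag at tensor-parallel size 1 is a design choice of this sketch; silently ignoring it with a warning would be an equally reasonable alternative.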

At inference time, however, this parameter may cause problems when materialize_only_last_token_logits=True, which appears to have recently become the default in Megatron (it was previously False).

Given how this may impact accuracy, and may require us to make a change to https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/llm/gpt/model/megatron/hyena/hyena_model.py#L382-L389 similar to https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/models/gpt/gpt_model.py#L581-L596, we should have multi-gpu test coverage for --tensor-parallel-size=2 as well as --sequence-parallel once we add that to infer.py.
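To illustrate one way this combination could go wrong, here is a toy simulation in plain Python. It is not Megatron or NeMo code, and the exact failure mechanism is an assumption. The only fact it relies on is that sequence parallelism shards activations along the sequence dimension across tensor-parallel ranks. All function names are hypothetical.

```python
def shard_sequence(hidden_states, tp_size):
    """Split a list of per-token states into tp_size contiguous shards."""
    seq_len = len(hidden_states)
    assert seq_len % tp_size == 0, "toy example assumes an evenly divisible sequence"
    shard = seq_len // tp_size
    return [hidden_states[r * shard:(r + 1) * shard] for r in range(tp_size)]


def gather_sequence(shards):
    """Stand-in for an all-gather along the sequence dimension."""
    return [state for shard in shards for state in shard]


def last_token_naive(shards):
    """Each rank slices its local shard, so only the final rank holds the true last token."""
    return [shard[-1:] for shard in shards]


def last_token_gathered(shards):
    """Gather the full sequence first, then slice, so every rank agrees."""
    full = gather_sequence(shards)
    return [full[-1:] for _ in shards]


tokens = [f"h{i}" for i in range(8)]
shards = shard_sequence(tokens, tp_size=2)
print(last_token_naive(shards))     # [['h3'], ['h7']]
print(last_token_gathered(shards))  # [['h7'], ['h7']]
```

In the naive case rank 0 would compute logits for token 3 rather than token 7, which is the kind of silent accuracy loss the proposed tests are meant to catch.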

BioNeMo Framework Version

b4c4488

Category

Model/Training

Proposed Solution

Add test coverage for multi-GPU generation. It should cover tp=2, cp=2, and pp=2 so that we document which kinds of parallelism are supported. Reuse one of the inference accuracy tests in test_evo2.py, for example test_batch_generate, and check that accuracy does not degrade.

Expected Benefits

Knowledge of when upstream changes break inference at multi-gpu scales.

Code Example
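A rough sketch of how the proposed coverage matrix could be expressed, using only the standard library. The class, function names, and tolerance value are hypothetical. The real tests would launch the existing accuracy test from test_evo2.py under each configuration and could feed this list to a parametrized test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """One parallelism layout to exercise at inference time."""
    tp: int = 1
    cp: int = 1
    pp: int = 1
    sequence_parallel: bool = False

    @property
    def world_size(self) -> int:
        # Minimum number of GPUs needed to run this layout.
        return self.tp * self.cp * self.pp


def inference_test_matrix():
    """Layouts named in this issue, plus a single-GPU baseline for comparison."""
    return [
        ParallelConfig(),
        ParallelConfig(tp=2),
        ParallelConfig(tp=2, sequence_parallel=True),
        ParallelConfig(cp=2),
        ParallelConfig(pp=2),
    ]


def runnable_configs(configs, available_gpus):
    """Keep only the layouts that fit on the current machine."""
    return [c for c in configs if c.world_size <= available_gpus]


def accuracy_not_degraded(baseline, candidate, abs_tol=0.01):
    """True if every candidate score is within abs_tol of, or above, its baseline.

    abs_tol is a placeholder; a real threshold should come from the existing test.
    """
    if len(baseline) != len(candidate):
        return False
    return all(c >= b - abs_tol for b, c in zip(baseline, candidate))
```

For example, `runnable_configs(inference_test_matrix(), available_gpus=1)` keeps only the baseline, so the multi-GPU cases would be skipped rather than failing on a single-GPU machine.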
