Description
Problem & Motivation
Issues were recently discovered in Megatron inference related to tensor parallelism with sequence_parallel=True, which is typically the recommended configuration when running with --tensor-parallel-size=N for N>1, in combination with materialize_only_last_token_logits=True.
First, infer.py has no --sequence-parallel argument, so we should add one as an option for testing; at train time, at least, sequence parallelism improves parallel efficiency for tensor parallelism.
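A minimal sketch of what adding the flag to infer.py could look like. The parser name and the other flags shown here are assumptions for illustration, not the actual infer.py interface:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch: wire a --sequence-parallel flag into an inference
    # entry point such as infer.py alongside the existing TP size option.
    parser = argparse.ArgumentParser(description="Evo2 inference (sketch)")
    parser.add_argument("--tensor-parallel-size", type=int, default=1)
    parser.add_argument(
        "--sequence-parallel",
        action="store_true",
        help="Enable sequence parallelism (only meaningful with "
             "--tensor-parallel-size > 1).",
    )
    return parser

# Example invocation mirroring the recommended TP>1 configuration.
args = build_parser().parse_args(
    ["--tensor-parallel-size", "2", "--sequence-parallel"]
)
print(args.sequence_parallel, args.tensor_parallel_size)
```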
At inference time, however, this setting can cause problems when materialize_only_last_token_logits=True, which appears to have recently become the default in Megatron (previously False).
Given the potential accuracy impact, and since this may require a change to https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/llm/gpt/model/megatron/hyena/hyena_model.py#L382-L389 analogous to https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/models/gpt/gpt_model.py#L581-L596, we should have multi-GPU test coverage for --tensor-parallel-size=2, and for --sequence-parallel once it is added to infer.py.
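To illustrate why materialize_only_last_token_logits interacts badly with sequence parallelism: under SP the sequence dimension is sharded across tensor-parallel ranks, so only one rank's shard actually contains the final token, and taking the last position of the local shard on every rank is wrong. The helper below is a hypothetical, single-process sketch of the indexing involved, not the actual Megatron or Hyena code:

```python
def locate_last_token(seq_len: int, sp_world_size: int) -> tuple[int, int]:
    """Return (rank, local_index) of the final token when the sequence
    dimension is split evenly across sequence-parallel ranks.

    Naively taking hidden_states[-1] on every rank under sequence
    parallelism selects the last token of each *shard*; only the last
    rank's shard holds the true final token, so its logits must be
    computed there (or the shards gathered first).
    """
    assert seq_len % sp_world_size == 0, "SP assumes an evenly divisible sequence"
    shard_len = seq_len // sp_world_size
    return sp_world_size - 1, shard_len - 1

# Sequence of 8 tokens split over 2 SP ranks: token 7 is local index 3 on rank 1.
print(locate_last_token(8, 2))
```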
BioNeMo Framework Version
Category
Model/Training
Proposed Solution
Add test coverage for multi-GPU generation. It should cover tp=2, cp=2, and pp=2 so that we have documented knowledge of which kinds of parallelism we support. Reuse one of the inference accuracy tests in test_evo2.py (for example, test_batch_generate) and verify that accuracy does not degrade.
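One way to check for accuracy degradation in such a test is to compare per-token log-probs from the multi-GPU run against a single-GPU baseline within a small tolerance. This helper is a hypothetical sketch of that comparison, independent of the actual test harness:

```python
import math

def logprobs_close(baseline: list[float], candidate: list[float],
                   atol: float = 1e-3) -> bool:
    """Hypothetical accuracy check for a multi-GPU generation test:
    per-token log-probs from a tp/cp/pp run should match the single-GPU
    baseline elementwise within an absolute tolerance."""
    if len(baseline) != len(candidate):
        return False
    return all(math.isclose(b, c, abs_tol=atol)
               for b, c in zip(baseline, candidate))

print(logprobs_close([-0.5, -1.2], [-0.5001, -1.2002]))  # small drift: passes
print(logprobs_close([-0.5, -1.2], [-0.5, -2.0]))        # large drift: fails
```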
Expected Benefits
Knowledge of when upstream changes break inference at multi-gpu scales.