feat: Self-Rewarding Algorithm with TRT Support #321
@@ -0,0 +1,235 @@

.. include:: /content/nemo.rsts

Model Generation with Data Parallelism and TRT
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

The NeMo framework supports efficient model generation via the NeMo Aligner codebase.

All algorithms in NeMo Aligner are compatible with any GPT-based model from Megatron Core (i.e., those with ``mcore_gpt=True`` in the configuration). For this tutorial, we will demonstrate the generation pipeline using a `2B GPT model with 4096 sequence length <https://huggingface.co/nvidia/GPT-2B-001>`__. This tutorial is also applicable to other GPT models, such as Llama models, regardless of their size.
Comment on lines +6 to +8 (Collaborator): suggest a revision to the introductory text, including the purpose and other copyedits. Suggested: "This tutorial demonstrates efficient model generation using NeMo Framework and the NeMo-Aligner codebase. It shows how to set up a 2B GPT model with a sequence length of 4096, available on Hugging Face. The tutorial covers obtaining and preparing a pretrained model, configuring parameters, and running the generation process. It highlights using aligned models for better outputs and provides steps for terminal and Slurm execution, ensuring efficient data parallelism and handling TransformerEngine issues. All NeMo-Aligner algorithms work with any GPT-based model from Megatron Core."
Obtaining a pretrained model
############################

Collaborator (on the heading): fix capitalization and change the procedural heading to an imperative verb. Suggested: "Obtain a Pretrained Model".

To start, we must first get an aligned model to generate responses from. There are 2 models we recommend to get started. The rest of the tutorial will work with either model, but for demonstration purposes we will use the smaller 2B model.

Collaborator: suggested revision and grammar fix. Suggested: "To get started, we need an aligned model for generating responses. We recommend two models: 2B GPT and LLaMa2 7B. While the tutorial works with either, we will use the smaller 2B model for demonstration purposes."
.. tab-set::

   .. tab-item:: 2B GPT
      :sync: key1

      #. Get the 2B checkpoint via ``wget https://huggingface.co/nvidia/GPT-2B-001/resolve/main/GPT-2B-001_bf16_tp1.nemo``

Collaborator: add a period.

      #. Extract the NeMo File to a folder with ``mkdir model_checkpoint && tar -xvf GPT-2B-001_bf16_tp1.nemo -C model_checkpoint``

Collaborator: add a period, fix capitalization.

      #. And then run the script to convert from old NeMo checkpoint to Megatron-Core checkpoint. The script is located `here <https://github.com/NVIDIA/NeMo/blob/86b198ff93438d454f9c7f3550bcfb7d4e59feab/scripts/nlp_language_modeling/convert_nemo_gpt_to_mcore.py>`__.

Collaborator: revise (remove "And then") and add the definite article "the". Suggested: "#. Run the script to convert from the old NeMo checkpoint to the Megatron-Core checkpoint. The script is located ..."

         .. code-block:: bash

            python convert_nemo_gpt_to_mcore.py \
                --in-folder ./model_checkpoint \
                --out-file ./mcore_gpt.nemo
   .. tab-item:: LLaMa2 7B
      :sync: key2

      #. Download the `Llama 2 7B LLM model and tokenizer <https://huggingface.co/meta-llama/Llama-2-7b>`__ into the models folder.
      #. Convert the LLaMa2 LLM into ``.nemo`` format

Collaborator: add a colon.

         .. code-block:: bash

            python /opt/NeMo/scripts/checkpoint_converters/convert_llama_hf_to_nemo.py \
                --input_name_or_path /path/to/llama --output_path /output_path/mcore_gpt.nemo

After these steps you should have a file ``mcore_gpt.nemo`` to use in NeMo-Aligner.

Collaborator: add a comma. Suggested: "After these steps, you should have a file, ..."
.. note::
   Mcore models use TransformerEngine as a backend, and it tries to find efficient kernels. But depending on the GPU you have it may not find them. If you ever face errors that relate to kernel finding set these variables on top of your script.

Collaborator: revise. Suggested: "Mcore models utilize TransformerEngine as a backend to find efficient kernels. However, depending on your GPU, it may not always succeed. If you encounter errors related to kernel finding, set these variables at the top of your script."

   .. code-block:: bash

      export NVTE_MASKED_SOFTMAX_FUSION=0
      export NVTE_FLASH_ATTN=0
      export NVTE_FUSED_ATTN=0

   Additionally, TransformerEngine is non-deterministic by default, meaning subsequent runs of generation using identical parameters will produce different results, which is not ideal for generation.
   Helpfully, TransformerEngine exposes a flag to set if you want to guarantee deterministic generation runs:

Comment on lines +49 to +50 (Collaborator): revise. Suggested: "Additionally, TransformerEngine is non-deterministic by default. This means that running the same generation with identical parameters multiple times will yield different results, which is not ideal for generation consistency. Fortunately, TransformerEngine provides a flag that you can set to ensure deterministic generation runs:"

   .. code-block:: bash

      export NVTE_ALLOW_NONDETERMINISTIC_ALGO=0
      export NVTE_MASKED_SOFTMAX_FUSION=0
Aligned vs Foundational (base) model for Generation
###################################################

Collaborator (on the heading): fix capitalization and punctuation. Suggested: "Aligned Model vs. Foundational (Base) Model for Generation".

Generation can be run on either base/foundational models, that is, models which have only been trained on autoregressive language prediction tasks and not on instruction following tasks,
or, you can also run generation on models which have been aligned on instruction-based or preference-based datasets as well, similar to DPO/PPO. Either model will work, but you will get much higher quality
responses (generations) from an aligned model, and we highly recommend using an aligned model for generation if you want high quality responses.

Comment on lines +59 to +61 (Collaborator): suggested revision to break up long sentences and fix punctuation. Suggested: "Generation can be executed on either base or foundational models. These are models that have been trained solely on autoregressive language prediction tasks and not on instruction-following tasks. Alternatively, you can run generation on models that have been aligned with instruction-based or preference-based datasets, similar to DPO/PPO. Both types of models are capable of performing generation. However, you will achieve significantly higher quality responses (generations) from an aligned model. Therefore, we highly recommend using an aligned model for generation if you want high-quality responses."
Data Format for Generation
##########################

The input files for generation in Aligner use the exact same format of .jsonl files as used by SFT in Nemo and Aligner. Please see the data formatting section of SFT to understand the data format necessary for Self-Rewarding :ref:`SFT guide <sft>`.
Please note that Aligner generation does not support the use of mmap or binary files, only .jsonl files in the SFT format.

Comment on lines +66 to +67 (Collaborator): revise and use the full name/capitalization for NeMo-Aligner and NeMo. Suggested: "The input files for generation in NeMo-Aligner use the same format of .jsonl files as those used by SFT in NeMo and NeMo-Aligner. Please see the data formatting section of SFT to understand the necessary data format for self-rewarding training: :ref:`SFT guide <sft>`. Note that NeMo-Aligner generation does not support the use of mmap or binary files, only .jsonl files in SFT format."
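For illustration, a single line of such a .jsonl file might look like the sketch below. The ``input``/``output`` keys are an assumption based on the default SFT data format; consult the :ref:`SFT guide <sft>` for the exact schema your configuration expects.

.. code-block:: python

   import json

   # Hypothetical example of one record in generation_sft_format.jsonl.
   # The "input"/"output" keys are assumed from the default SFT format;
   # adjust them to whatever your SFT data configuration uses.
   record = {"input": "Write a short poem about the ocean.", "output": ""}
   print(json.dumps(record))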
Running Generation in Aligner
#############################

Collaborator (on the heading): change the procedural heading to an imperative verb and add a hyphen. Suggested: "Run Generation in NeMo-Aligner".

Once your data is processed into the correct format you are ready to begin generation. You must start with a pretrained or aligned model. For this section we will use the aligned model from the previous section for generation.

Collaborator: revise. Suggested: "After processing your data into the correct format, you can begin generation. Start with a pretrained or aligned model. For this section, we'll use the aligned model from the previous section to generate the content."

For the purposes of the following sections, we'll assume your generation jsonl file is located in ``/path/to/generation_sft_format.jsonl``.

Collaborator: fix punctuation. Suggested: "For the purposes of the following sections, we'll assume your generation .jsonl file is located in ..."

The key parameters for generation are located under ``model.generation`` and include the following:

``model.generation.num_responses_to_gen`` - controls how many responses you want the model to generate per prompt

Collaborator: add a period.

The following block shows the standard Nemo sampling params for generating responses, which are the same as we use across all Nemo and Aligner codebases:

Collaborator: revise and use the full name/capitalization for NeMo-Aligner and NeMo. Suggested: "The following block shows the standard NeMo sampling params for generating responses, which are the same as we use across all NeMo and NeMo-Aligner codebases:"
.. code-block:: yaml

   sampling_params:
     use_greedy: False
     temperature: 1.0
     top_k: 0
     top_p: 1.0
     repetition_penalty: 1.0
     add_BOS: False
     all_probs: False
     compute_logprob: False
     end_strings: ["<|endoftext|>", "<extra_id_1>"]

   # length argument for autoregressive sampling
   # max length means max amount of tokens to generate
   length_params:
     max_length: ${int_div:${model.encoder_seq_length}, 2}
     min_length: 1
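For example, with the 2B model used in this tutorial, ``model.encoder_seq_length`` is 4096, so the ``int_div`` expression resolves ``max_length`` to 4096 / 2 = 2048 tokens.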
Finally, we have the TRT parameters, which allows for faster TRTLLM-based response generation:

Collaborator: fix subject-verb agreement. Suggested: "Finally, we have the TRT parameters, which allow for faster TRTLLM-based response generation:"
.. code-block:: yaml

   trt_llm:
     enable: True # use this to turn TRT on/off
     # reshard: False # reshard is not supported in generation

     # TRTLLM preallocates activation memory according to the number of input tokens
     max_input_len: ${subtract:${model.encoder_seq_length}, ${model.generation.length_params.max_length}}

     model_type: gptnext # can be gptj, gptnext, llama, gemma, falcon

     # Generation does not have a training stage, so there is no need to unload the engine.
     unload_engine_train: False
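Continuing the example above: with ``model.encoder_seq_length`` = 4096 and ``max_length`` = 2048, ``max_input_len`` resolves to 4096 - 2048 = 2048, i.e., the TRT engine preallocates for prompts of up to 2048 tokens.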
Keep in mind that Aligner generation utilises data parallelism to speed up generation. This means that your input data file will be divided by GBS, and data which is
not cleanly divisible by GBS will be dropped starting from the end of the file. For example, if your data file has 11639 samples with a GBS of 32, this means that
11639 mod 32 = 23 samples will be dropped and not generated. To avoid this, you can either reduce your data parallelism to 1, or you can pad your data file up to the nearest
multiple of your GBS (you can pad with basic prompts like "how are you"). Additionally, if you truncate your input data using the ``model.data.train_ds.max_seq_length`` parameter,
then your data will be reduced even further. Truncation applies before the DP truncation.

Comment on lines +115 to +119 (Collaborator): suggested revision: ".. note:: ..."
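As an illustration of the padding workaround mentioned above, the following sketch (not part of NeMo-Aligner) pads a .jsonl prompt file up to the nearest multiple of your GBS. The ``input``/``output`` keys are assumed from the SFT format; adjust them to match your data.

.. code-block:: python

   import json

   def pad_jsonl_to_gbs(in_path: str, out_path: str, gbs: int = 32) -> None:
       """Pad a .jsonl prompt file to a multiple of the global batch size (GBS)
       so that no samples are dropped by data-parallel generation."""
       with open(in_path) as f:
           lines = [line.rstrip("\n") for line in f if line.strip()]

       num_pad = (gbs - len(lines) % gbs) % gbs  # 0 if already divisible

       # Pad with a basic prompt, as suggested above; keys assumed from the SFT .jsonl format.
       pad_record = json.dumps({"input": "how are you", "output": ""})

       with open(out_path, "w") as f:
           for line in lines:
               f.write(line + "\n")
           for _ in range(num_pad):
               f.write(pad_record + "\n")

   # Example: 11639 samples with GBS 32 leave a remainder of 23, so 9 padding
   # rows bring the file to 11648 samples (an exact multiple of 32).
   # pad_jsonl_to_gbs("/path/to/generation_sft_format.jsonl",
   #                  "/path/to/generation_sft_format_padded.jsonl")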
With your data prepared, you can now run generation. We demonstrate two techniques below, one using cmdline inputs directly, and another demonstrating the use of SLURM.

Collaborator: revise. Suggested: "With your data prepared, you can now run the generation. We demonstrate two techniques below: one using command-line inputs directly, and another demonstrating the use of SLURM."
.. tab-set::

   .. tab-item:: Terminal
      :sync: key3

      To run generation on the terminal directly:

      .. code-block:: bash

         export GPFS="/path/to/nemo-aligner-repo"
         export TRAIN_DATA_PATH="/path/to/generation_sft_format.jsonl"

         python -u ${GPFS}/examples/nlp/gpt/run_generation.py \
            trainer.num_nodes=1 \
            trainer.devices=8 \
            model.micro_batch_size=1 \
            model.global_batch_size=32 \
            pretrained_checkpoint.restore_from_path=/path/to/megatron_gpt_sft.nemo \
            "model.data.train_ds.file_path=${TRAIN_DATA_PATH}" \
            exp_manager.create_wandb_logger=false \
            exp_manager.wandb_logger_kwargs.project=null \
            exp_manager.wandb_logger_kwargs.name=null \
            exp_manager.explicit_log_dir=/results \
            ++model.sequence_parallel=false \
            ++model.apply_rope_fusion=false \
            trainer.generation.max_epochs=1 \
            model.generation.num_responses_to_gen=1 \
            trainer.generation.trt_llm.enable=true
   .. tab-item:: Slurm
      :sync: key4

      To run generation with Slurm, use the script below. The script uses 4 nodes, but you can change the node count to something different:

      .. code-block:: bash

         #!/bin/bash
         #SBATCH -A <<ACCOUNT NAME>>
         #SBATCH -p <<PARTITION NAME>>
         #SBATCH -N 4
         #SBATCH -t 4:00:00
         #SBATCH -J <<JOB NAME>>
         #SBATCH --ntasks-per-node=8
         #SBATCH --gpus-per-node 8
         #SBATCH --exclusive
         #SBATCH --overcommit

         GPFS="/path/to/nemo-aligner-repo"
         PRETRAINED_CHECKPOINT_NEMO_FILE="/path/to/megatron_gpt_sft.nemo"

         TRAIN_DATA_PATH="/path/to/generation_sft_format.jsonl"

         PROJECT="<<WANDB PROJECT>>"

         CONTAINER=<<<CONTAINER>>> # use the latest NeMo Training container, Aligner will work there
         MOUNTS="--container-mounts=${GPFS}:${GPFS},${TRAIN_DATA_PATH}:${TRAIN_DATA_PATH},${PRETRAINED_CHECKPOINT_NEMO_FILE}:${PRETRAINED_CHECKPOINT_NEMO_FILE}"

         RESULTS_DIR="/path/to/result_dir"

         OUTFILE="${RESULTS_DIR}/rm-%j_%t.out"
         ERRFILE="${RESULTS_DIR}/rm-%j_%t.err"
         mkdir -p ${RESULTS_DIR}

         read -r -d '' cmd <<EOF
         echo "*******STARTING********" \
         && echo "---------------" \
         && echo "Starting generation" \
         && cd ${GPFS} \
         && export PYTHONPATH="${GPFS}:${PYTHONPATH}" \
         && export NVTE_ALLOW_NONDETERMINISTIC_ALGO=0 \
         && export NVTE_MASKED_SOFTMAX_FUSION=0 \
         && export HYDRA_FULL_ERROR=1 \
         && python -u ${GPFS}/examples/nlp/gpt/run_generation.py \
            trainer.num_nodes=${SLURM_JOB_NUM_NODES} \
            trainer.devices=8 \
            pretrained_checkpoint.restore_from_path='${PRETRAINED_CHECKPOINT_NEMO_FILE}' \
            "model.data.train_ds.file_path=${TRAIN_DATA_PATH}" \
            model.micro_batch_size=1 \
            model.global_batch_size=32 \
            ++model.sequence_parallel=false \
            ++model.apply_rope_fusion=false \
            exp_manager.explicit_log_dir=${RESULTS_DIR} \
            exp_manager.create_wandb_logger=False \
            exp_manager.wandb_logger_kwargs.name=null \
            exp_manager.wandb_logger_kwargs.project=null \
            trainer.generation.max_epochs=1 \
            model.generation.num_responses_to_gen=1 \
            trainer.generation.trt_llm.enable=true
         EOF

         srun -o $OUTFILE -e $ERRFILE --container-image=$CONTAINER $MOUNTS bash -c "${cmd}"
         set +x
The output file containing the responses will be located in ``${RESULTS_DIR}/generations/generations.jsonl``. All responses will be stored to this file as they
are generated, and even if your generation process abruptly terminates, it will resume where it left off once restarted. Once generation is complete all of your
responses will be located in this file.

Comment on lines +217 to +220 (Collaborator): suggested revision: "The output file containing the responses will be located in ..."
The structure of this file is a .jsonl file, where each line represents a JSON object of the following form:

.. code-block:: json

   {
      "step": "the step number in the epoch",
      "consumed_samples": "the number of samples consumed so far of the input dataset",
      "prompt": "the prompt passed to the model",
      "responses": "a list of length model.generation.num_responses_to_gen which contains all of the responses to the input prompt"
   }

The step and consumed_samples fields are not needed by the end user, but they're there so that the process can correctly resume if it unexpectedly goes down in the middle
of a generation run.

Comment on lines +230 to +231 (Collaborator): suggested revision: "You do not need the step and ..."
Please note that the responses will contain all raw tokens which the model generated, this includes all special headers, turn starts/ends, and BOS/EOS tokens. To get a "clean" output
the end user must filter this out themselves via some sort of post-processing step (which is not currently provided).

Comment on lines +233 to +234 (Collaborator): suggested revision: "Please note that the responses will include all raw tokens generated by the model. This includes special headers, turn starts/ends, and BOS/EOS tokens. To obtain a 'clean' output, you will need to filter these out through a post-processing step, which is not currently provided."
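Since no post-processing step is provided, the sketch below shows one possible way to strip special markers from the generated responses. The marker list is only an example (the first two entries come from the ``end_strings`` shown earlier, and ``<extra_id_0>`` is an assumed extra); replace it with the special tokens of your own tokenizer and prompt template, and adjust the file paths to your results directory.

.. code-block:: python

   import json

   # Example markers only; substitute the special tokens of your own
   # tokenizer and prompt template.
   SPECIAL_MARKERS = ["<|endoftext|>", "<extra_id_1>", "<extra_id_0>"]

   def clean_response(text: str) -> str:
       # Remove every known special marker, then trim surrounding whitespace.
       for marker in SPECIAL_MARKERS:
           text = text.replace(marker, "")
       return text.strip()

   with open("generations.jsonl") as f_in, open("generations_clean.jsonl", "w") as f_out:
       for line in f_in:
           record = json.loads(line)
           record["responses"] = [clean_response(r) for r in record["responses"]]
           f_out.write(json.dumps(record) + "\n")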
Comment (on the page heading): revise heading for SEO. Suggested: "Model Generation with Data Parallelism and TensorRT (TRT)".