This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit 7ce7e1d

Add Latest News section (NVIDIA#366)
Parent: 1f3a421

File tree: 4 files changed, +1 −2 lines


docs/source/blogs/H200launch.md (1 addition, 2 deletions)

```diff
@@ -33,8 +33,7 @@ For practical examples of H200's performance:
 **Max Throughput TP8:**
 an online chat agent scenario (ISL/OSL=80/200) with GPT3-175B on a full HGX (TP8) H200 is 1.6x more performant than H100.
 
-<img src="media/H200launch_Llama70B_tps.png" alt="max throughput llama TP1" width="250" height="auto">
-<img src="media/H200launch_GPT175B_tps.png" alt="max throughput GPT TP8" width="250" height="auto">
+<img src="media/H200launch_tps.png" alt="max throughput llama TP1" width="500" height="auto">
 
 <sub>Preliminary measured performance, subject to change.
 TensorRT-LLM v0.5.0, TensorRT v9.1.0.4. | Llama-70B: H100 FP8 BS 8, H200 FP8 BS 32 | GPT3-175B: H100 FP8 BS 64, H200 FP8 BS 128 </sub>
```
Binary files: two images deleted (−13.8 KB and −13.5 KB, not shown); one image added (22.5 KB)

0 commit comments