Commit c134433

[Benchmark Metrics] Add reporting in benchmark for (1) TTST(time-to-second-token) (#159)

(2) req complete rate time series (3) output token throughput rate time series

Major changes
* TTST tracked and reported in benchmarking output
* request complete rate and output token throughput rate over time calculated and reported in csv string format for graph plotting

Minor changes
* typo fix in comment
* type annotation added in a few places
* python style fix to comply with Google style
* docstring added for a couple of method/function
* misc code cleanup

Reasons:
* TTST: Useful to track the response latency including kv cache transfer latency.
* Time series: Useful to plot a graph showing different stages of benchmarking (e.g. low output token throughput while generate batch still gets filled up, and high output token throughput when the generate batch is full)

Sample view of additional output in benchmarking result:

...
Mean ttft: 56.70 ms
Median ttft: 50.04 ms
P99 ttft: 117.60 ms
Mean ttst: 702.75 ms
Median ttst: 759.35 ms
P99 ttst: 813.11 ms
...
----- Request complete rate time series (window_size = 10 sec) -----
TimeStamp,2025-01-07 07:26:30,2025-01-07 07:26:40,2025-01-07 07:26:50
Value,0.60,0.30,0.25
----- Output token rate time series (window_size = 10 sec) -----
TimeStamp,2025-01-07 07:26:18,2025-01-07 07:26:28,2025-01-07 07:26:38,2025-01-07 07:26:48
Value,43.80,115.70,66.80,22.00
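
For readers who want to reproduce the windowed time series described in the commit message, the following is a minimal sketch of one way to bucket events into fixed windows and emit the two-row CSV string shown in the sample output. It is not taken from the commit: the function names `rate_time_series` and `to_csv_string` are hypothetical, and labeling each window by its end time and dividing by the window size are assumptions about how the rates might be computed.

```python
from datetime import datetime, timedelta


def rate_time_series(events, window_size_sec=10):
    """Buckets (timestamp, amount) events into fixed windows.

    Hypothetical helper, not the benchmark's actual code.
    events: list of (datetime, number) pairs, e.g. (completion time, 1) for a
      request complete rate, or (completion time, output token count) for an
      output token rate.
    Returns a list of (window_end, amount_per_second) pairs.
    """
    if not events:
        return []
    events = sorted(events, key=lambda e: e[0])
    start = events[0][0]
    window = timedelta(seconds=window_size_sec)
    # Number of windows needed to cover the span from first to last event.
    num_windows = int((events[-1][0] - start) / window) + 1
    totals = [0.0] * num_windows
    for ts, amount in events:
        totals[int((ts - start) / window)] += amount
    # Assumption: each window is labeled by its end time.
    return [
        (start + (i + 1) * window, total / window_size_sec)
        for i, total in enumerate(totals)
    ]


def to_csv_string(series):
    """Formats the series as two CSV rows: timestamps, then values."""
    timestamps = ",".join(ts.strftime("%Y-%m-%d %H:%M:%S") for ts, _ in series)
    values = ",".join(f"{value:.2f}" for _, value in series)
    return f"TimeStamp,{timestamps}\nValue,{values}"
```

Passing `(completion_time, 1)` pairs yields requests per second per window, and passing `(completion_time, num_output_tokens)` pairs yields output tokens per second. The two-row layout pastes directly into a spreadsheet for plotting.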

1 parent 92fa048 commit c134433

6 files changed, +589 -105 lines changed

benchmarks
jetstream
- core
  - orchestrator.py
- tests/core
  - test_orchestrator.py

