Commit c134433
[Benchmark Metrics] Add reporting in benchmark for (1) TTST(time-to-second-token) (#159)
(2) req complete rate time series (3) output token throughput rate time series
Major changes
* TTST tracked and reported in benchmarking output
* request complete rate and output token throughput rate over time
calculated and reported in CSV string format for graph plotting
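A minimal sketch of how such a windowed rate series could be computed and rendered as a CSV string. This is not the code from this commit: the helper names `rate_time_series` and `to_csv_string` are hypothetical, and it assumes each value is the per-second rate within a fixed window (events in the window divided by the window size).

```python
from datetime import datetime, timedelta


def rate_time_series(events, window_size_sec=10):
    """Bucket (timestamp, amount) events into fixed windows.

    Returns (window_start_times, per_second_rates). Assumption: the rate of a
    window is the sum of its amounts divided by the window size in seconds.
    """
    if not events:
        return [], []
    events = sorted(events, key=lambda e: e[0])
    start = events[0][0]
    span_sec = (events[-1][0] - start).total_seconds()
    num_windows = int(span_sec // window_size_sec) + 1
    totals = [0.0] * num_windows
    for ts, amount in events:
        idx = int((ts - start).total_seconds() // window_size_sec)
        totals[idx] += amount
    stamps = [start + timedelta(seconds=i * window_size_sec)
              for i in range(num_windows)]
    rates = [total / window_size_sec for total in totals]
    return stamps, rates


def to_csv_string(stamps, rates):
    """Render two CSV rows: one of timestamps, one of values."""
    ts_row = ",".join(
        ["TimeStamp"] + [s.strftime("%Y-%m-%d %H:%M:%S") for s in stamps])
    val_row = ",".join(["Value"] + [f"{r:.2f}" for r in rates])
    return ts_row + "\n" + val_row
```

For request complete rate, each event would carry an amount of 1 (one finished request); for output token throughput, the amount would be the number of tokens emitted at that timestamp.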
Minor changes
* typo fix in comment
* type annotation added in a few places
* python style fix to comply with Google style
* docstrings added for a couple of methods/functions
* misc code cleanup
Reasons:
* TTST: Useful to track the response latency, including KV cache transfer latency.
* Time series: Useful to plot a graph showing different stages of benchmarking
(e.g. low output token throughput while the generate batch is still being filled,
and high output token throughput once the generate batch is full)
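To make the TTST metric concrete, here is a hedged sketch of deriving TTFT and TTST from per-token arrival times and summarizing them. It is not taken from the benchmark code: `ttft_and_ttst_ms`, `percentile`, and `summarize` are hypothetical names, TTST is assumed to be measured from request start to the arrival of the second token, and the nearest-rank percentile used here may differ from the method the benchmark uses.

```python
import math
import statistics


def ttft_and_ttst_ms(request_start, token_times):
    """Return (ttft_ms, ttst_ms) for one request.

    request_start and token_times are in seconds. Assumption: both latencies
    are measured from request start. ttst_ms is None if the request produced
    fewer than two tokens.
    """
    ttft = (token_times[0] - request_start) * 1000.0
    ttst = None
    if len(token_times) > 1:
        ttst = (token_times[1] - request_start) * 1000.0
    return ttft, ttst


def percentile(values, pct):
    """Nearest-rank percentile (an assumed method, for illustration only)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(name, values_ms):
    """Format mean, median, and P99 lines in the style of the benchmark output."""
    return "\n".join([
        f"Mean {name}: {statistics.mean(values_ms):.2f} ms",
        f"Median {name}: {statistics.median(values_ms):.2f} ms",
        f"P99 {name}: {percentile(values_ms, 99):.2f} ms",
    ])
```

Under these assumptions, the gap between TTFT and TTST for a request covers whatever happens between the first and second token, which is where a KV cache transfer would show up.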
Sample view of additional output in benchmarking result:
...
Mean ttft: 56.70 ms
Median ttft: 50.04 ms
P99 ttft: 117.60 ms
Mean ttst: 702.75 ms
Median ttst: 759.35 ms
P99 ttst: 813.11 ms
...
----- Request complete rate time series (window_size = 10 sec) -----
TimeStamp,2025-01-07 07:26:30,2025-01-07 07:26:40,2025-01-07 07:26:50
Value,0.60,0.30,0.25
----- Output token rate time series (window_size = 10 sec) -----
TimeStamp,2025-01-07 07:26:18,2025-01-07 07:26:28,2025-01-07 07:26:38,2025-01-07 07:26:48
Value,43.80,115.70,66.80,22.00
1 parent 92fa048 commit c134433
File tree
6 files changed, +589 -105 lines changed
- benchmarks
- tests
- jetstream
- core
- tests/core