Conversation

Owner
@RafaelGSS RafaelGSS commented Dec 2, 2025

  • Add ttest option to Suite that automatically sets repeatSuite=30
  • Implement Welch's t-test for comparing benchmark results
  • Display significance stars (*, **, ***) based on p-values

The t-test compares 30 independent runs of each benchmark to determine if performance differences are statistically significant, helping identify real improvements vs. random variance.
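Welch's t-test is the unequal-variance variant, which suits benchmark timings since the two functions under comparison rarely share a variance. A minimal sketch of the statistic, the Welch–Satterthwaite degrees of freedom, and a star mapping is below. This is an illustration, not the code merged in this PR; the star thresholds here use the large-sample normal approximation of the two-sided t critical values.

```javascript
// Welch's t-test sketch: compares two samples of benchmark timings.
// Illustrative only -- not the implementation merged in this PR.
function welchTTest(a, b) {
  const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;
  // Unbiased sample variance (divide by n - 1).
  const variance = (xs, m) =>
    xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);

  const ma = mean(a), mb = mean(b);
  const va = variance(a, ma) / a.length; // squared standard error, sample a
  const vb = variance(b, mb) / b.length; // squared standard error, sample b

  const t = (ma - mb) / Math.sqrt(va + vb);
  // Welch-Satterthwaite approximation of the degrees of freedom.
  const df = (va + vb) ** 2 /
    (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));

  // Two-sided critical values from the normal approximation
  // (reasonable once df is around 30, as with repeatSuite=30).
  const abs = Math.abs(t);
  const stars = abs > 3.291 ? '***' : abs > 2.576 ? '**' : abs > 1.96 ? '*' : '';
  return { t, df, stars };
}
```

With two samples of 30 runs each, df lands near 58, where the normal approximation to the t distribution is close enough for the thresholds above.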

[screenshot: reporter output showing the t-test significance stars]

text += styleText(["blue", "bold"], `${result.timeFormatted} total time`);
}

// TODO: produce confidence on stddev
Collaborator
Should this todo be cleared?

@@ -0,0 +1,218 @@
// Welch's t-test implementation for benchmark comparison
Collaborator
This might warrant a bibliography entry or three. What reference materials did you use?

@RafaelGSS (Owner, Author) Dec 16, 2025
I thought about including some references, but that material is widely available; it's mostly a matter of context.

Collaborator

Well, my challenge to you is to keep in the back of your head if you ever see a particularly good article or video to come back here and add a link.

Or maybe just the Wikipedia link if nothing else fits the bill.

- Add ttest option to Suite that automatically sets repeatSuite=30
- Implement Welch's t-test for comparing benchmark results
- Display significance stars (*, **, ***) based on p-values
- Add T-Test Mode indicator in reporter output
- Update TypeScript definitions with ttest and ReporterOptions
- Add comprehensive tests for t-test utilities
- Add statistical-significance example demonstrating the feature
- Update documentation with usage and interpretation guide

The t-test compares 30 independent runs of each benchmark to determine
if performance differences are statistically significant, helping identify
real improvements vs. random variance.

Signed-off-by: RafaelGSS <rafael.nunu@hotmail.com>
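The star display mentioned in the bullets presumably follows the conventional p-value cutoffs; a small sketch of that mapping is below (the exact thresholds used by the PR live in its source, so treat these as assumed values):

```javascript
// Conventional significance-star mapping (assumed thresholds;
// check the merged code for the exact cutoffs it uses).
function significanceStars(p) {
  if (p < 0.001) return '***'; // very strong evidence of a real difference
  if (p < 0.01) return '**';   // strong evidence
  if (p < 0.05) return '*';    // moderate evidence
  return '';                   // not statistically significant
}
```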
@RafaelGSS
Owner Author

PTAL @jdmarshall

@RafaelGSS
Owner Author

PTAL @H4ad @jdmarshall - Planning to land it today.

@H4ad (Collaborator) left a comment

Is it the same implementation as the one we have for Node? If so, great to have it here.

@RafaelGSS
Owner Author

Is it the same implementation as the one we have for Node? If so, great to have it here.

Pretty much. But instead of comparing binaries, we compare `benchmark.fn`.

@RafaelGSS RafaelGSS merged commit 53e20aa into main Dec 17, 2025
6 checks passed
@RafaelGSS RafaelGSS deleted the add-ttest-feature branch December 17, 2025 16:25
@jdmarshall
Collaborator

Oops. LGTM

@jdmarshall
Collaborator

Holy cow is this a lot slower. Should I be adjusting the run count?


4 participants