-
Notifications
You must be signed in to change notification settings - Fork 166
perf(prof): gather time samples during allocation samples #3559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
When allocation profiler takes a sample and interrupt_count > 0, collect both allocation and time samples in a single stack walk. Benefits: - Eliminates redundant stack walks when samples coincide. - Improves time sample accuracy (closer to timer expiry).
Let's do this in a dedicated PR.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3559 +/- ##
==========================================
- Coverage 61.90% 61.82% -0.08%
==========================================
Files 140 140
Lines 13281 13281
Branches 1758 1758
==========================================
- Hits 8221 8211 -10
- Misses 4272 4279 +7
- Partials 788 791 +3 see 3 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Benchmarks [ profiler ]Benchmark execution time: 2026-01-14 05:26:20 Comparing candidate commit a3e3e4e in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 29 metrics, 7 unstable metrics. |
|
|
||
| function allocate_large() { | ||
| // Large allocation to trigger sampling | ||
| $data = str_repeat("a", 1024 * 200_000); // ~200MB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str_repeat() is not a frameless function, substr() or str_replace() are frameless function for example and do allocate memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
realFlowControl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created an upstream PR to fix the empty/malformed expected profile JSON problem that this PR showed. Can you fix the tests, especially the allocation_time_combined.json file?
| $s = str_repeat("x", 10_000_000); | ||
| strlen($s); // Use the string to prevent optimization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still uses str_repeat() which is not a frameless function, strlen() is, but it does not allocate and as such will not trigger a combined sample.
Can we add a text, maybe some Lorem Ipsum (or more nerdy https://future-lives.com/wp-content/uploads/2014/09/TheSentinel.pdf) and replace random words in that text? I'd think using an array as the needle to replace should take time and allocate enough to trigger sampling. Setting DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE=1 should also help taking more allocation samples and increase the probability of triggering a combined sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire JSON has nothing to do with how the JSON's look that the https://github.com/DataDog/prof-correctness action accepts. As such prof-correctness seems to not match anything against the pprof which means it does not fail and as such succeeds (which is something we should fix in the upstream action, it should fail if the JSON is in an invalid structure).
Have a look at the other examples, like https://github.com/DataDog/dd-trace-php/blob/4a3a98716843640b4a8d6894da8ef0072500c729/profiling/tests/correctness/timeline.json how the JSON is supposed to look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've sort of fixed it. How can I make this only run for PHP 8.4+? Because the % will be determined by whether it's frameless or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can I make this only run for PHP 8.4+?
Not in this file, but in the workflow run itself (the .github/workflows/prof_correctness.yml), have a look how it is done for ZTS tests. You should be able to apply the same pattern for PHP 8.4+
But this brings me to a wider topic (one we should probably move to another discussion): str_repeat() as a frameless function should have less CPU time on PHP 8.4+ because of it being a frameless function, so depending on how big this effect is, we could "burry" this (but document it) in the error_margin or create two JSON files, one to match for PHP < 8.4 and one >= 8.4 with different percent values.
Also: we should lower the error_margin from 30.
| export DD_PROFILING_ENDPOINT_COLLECTION_ENABLED=no | ||
| export EXECUTION_TIME=5 | ||
| php -d extension=$PWD/target/profiler-release/libdatadog_php_profiling.so profiling/tests/correctness/allocation_time_combined.php | ||
| unset DD_PROFILING_ALLOCATION_ENABLED DD_PROFILING_WALL_TIME_ENABLED DD_PROFILING_EXPERIMENTAL_CPU_TIME_ENABLED DD_PROFILING_ENDPOINT_COLLECTION_ENABLED EXECUTION_TIME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| unset DD_PROFILING_ALLOCATION_ENABLED DD_PROFILING_WALL_TIME_ENABLED DD_PROFILING_EXPERIMENTAL_CPU_TIME_ENABLED DD_PROFILING_ENDPOINT_COLLECTION_ENABLED EXECUTION_TIME | |
| unset DD_PROFILING_ALLOCATION_ENABLED DD_PROFILING_WALL_TIME_ENABLED DD_PROFILING_EXPERIMENTAL_CPU_TIME_ENABLED DD_PROFILING_ENDPOINT_COLLECTION_ENABLED EXECUTION_TIME DD_PROFILING_ALLOCATION_SAMPLING_DISTANCE |
Just if you accept the change above
This got merged and I re-run the correctness tests, which are now failing. |
| { | ||
| "regular_expression": "<?php;main;standard\\|str_replace$", | ||
| "percent": 1.0, | ||
| "error_margin": 0.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An error margin of 0.9 is interestingly specific ... and from the prof correctness test ...
pprof_analysis.go:667: Error opening file /github/workspace/profiling/tests/correctness/allocation_time_combined.json: JSON schema validation failed for /github/workspace/profiling/tests/correctness/allocation_time_combined.json:
- stacks.2.stack-content.1.error_margin: Invalid type. Expected: integer, given: number
... not allowed, this needs to be an integer, not a float.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because I want a non-zero value, and the true rate is around 0.5-1.0%. Setting a percent of 1 with margin of 1 would allow for 0, which is NOT good.
| { | ||
| "regular_expression": "<?php;main$", | ||
| "percent": 80, | ||
| "error_margin": 30 | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we are not interested in the main function itself.
| { | |
| "regular_expression": "<?php;main$", | |
| "percent": 80, | |
| "error_margin": 30 | |
| }, |
I think we could add the str_repeat() function though.
| { | ||
| "regular_expression": "<?php;main;standard\\|str_replace$", | ||
| "percent": 40, | ||
| "error_margin": 30 | ||
| }, | ||
| { | ||
| "regular_expression": "<?php;main;standard\\|str_repeat$", | ||
| "percent": 40, | ||
| "error_margin": 30 | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you choose those percent values?
Just asking, because I guess that str_replace() and str_repeat() are the only (let's say at least 99%) functions that do allocations in that test script, so shouldn't it be safe to do:
| { | |
| "regular_expression": "<?php;main;standard\\|str_replace$", | |
| "percent": 40, | |
| "error_margin": 30 | |
| }, | |
| { | |
| "regular_expression": "<?php;main;standard\\|str_repeat$", | |
| "percent": 40, | |
| "error_margin": 30 | |
| } | |
| { | |
| "regular_expression": "<?php;main;standard\\|str_replace$", | |
| "percent": 50, | |
| "error_margin": 1 | |
| }, | |
| { | |
| "regular_expression": "<?php;main;standard\\|str_repeat$", | |
| "percent": 50, | |
| "error_margin": 1 | |
| } |
The same might be true for alloc-samples bellow.
| }, | ||
| { | ||
| "regular_expression": "<?php;main;standard\\|str_replace$", | ||
| "percent": 1.0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an integer, not a float. I suspect it will fail once the error_margin line below is fixed. This is just not failing in the JSON validation, as the percent value (and others) are validated in Go code after the JSON schema validation succeeded.
The value of 1 might be too little as well.
|
03f3c8e to
f52cf55
Compare
978ef91 to
fc62464
Compare
fc62464 to
ee910ea
Compare
Description
Gathers time samples during allocation samples if
interrupt_count > 0. This has two benefits:When I say a slight performance benefit, here's a comparison to v1.15.2:
The accuracy benefit comes from the fact that when an interrupt is set, on master it won't be handled until the next VM interrupt. If we're gathering a memory sample and that count is non-zero, that means that we're gathering it sooner than it would have otherwise been handled, so less VM code can change. Notably, this should help with frameless functions (added in PHP 8.4) which allocate.
Reviewer checklist