fix(profiling): fix SystemError when collecting memory profiler events [forwardport 3.x]#12140
Merged
erikayasuda merged 2 commits into3.x-stagingfrom Jan 30, 2025
Merged
fix(profiling): fix SystemError when collecting memory profiler events [forwardport 3.x]#12140erikayasuda merged 2 commits into3.x-stagingfrom
erikayasuda merged 2 commits into3.x-stagingfrom
Conversation
#12075) We added locking to the memory profiler to address crashes. These locks are mostly "try" locks, meaning we bail out if we can't acquire them right away. This was done defensively to mitigate the possibility of deadlock until we fully understood why the locks are needed and could guarantee their correctness. But as a result of using try locks, the `iter_events` function in particular can fail if the memory profiler lock is contended when it tries to collect profiling events. The function then returns NULL, leading to SystemError exceptions because we don't set an error. Even if we set an error, returning NULL isn't the right thing to do. It'll basically mean we wait until the next profile iteration, still accumulating events in the same buffer, and try again to upload the events. So we're going to get multiple iteration's worth of events. The right thing to do is take the lock unconditionally in `iter_events`. We can allocate the new tracker outside the memory allocation profiler lock so that we don't need to worry about reentrancy/deadlock issues if we start profiling that allocation. Then, the only thing we do under the lock is swap out the global tracker, so it's safe to take the lock unconditionally. Fixes #11831 TODO - regression test? (cherry picked from commit e3045a1)
Contributor
|
|
nsrip-dd
approved these changes
Jan 29, 2025
Datadog ReportBranch report: ✅ 0 Failed, 130 Passed, 1468 Skipped, 5m 26.23s Total duration (35m 44.29s time saved) |
BenchmarksBenchmark execution time: 2025-01-29 17:37:27 Comparing candidate commit 579c406 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 394 metrics, 2 unstable metrics. |
emmettbutler
approved these changes
Jan 29, 2025
gnufede
approved these changes
Jan 30, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Forwardporting #12075
We added locking to the memory profiler to address crashes. These locks
are mostly "try" locks, meaning we bail out if we can't acquire them
right away. This was done defensively to mitigate the possibility of
deadlock until we fully understood why the locks are needed and could
guarantee their correctness. But as a result of using try locks, the
iter_events function in particular can fail if the memory profiler lock
is contended when it tries to collect profiling events. The function
then returns NULL, leading to SystemError exceptions because we don't
set an error.
Even if we set an error, returning NULL isn't the right thing to do.
It'll basically mean we wait until the next profile iteration, still
accumulating events in the same buffer, and try again to upload the
events. So we're going to get multiple iteration's worth of events. The
right thing to do is take the lock unconditionally in iter_events. We
can allocate the new tracker outside the memory allocation profiler lock
so that we don't need to worry about reentrancy/deadlock issues if
we start profiling that allocation. Then, the only thing we do under the
lock is swap out the global tracker, so it's safe to take the lock
unconditionally.
Fixes #11831
Checklist
Reviewer Checklist