Skip to content

Only collect async ID when it's safe to do so#197

Merged
szegedi merged 4 commits intomainfrom
szegedi/async-id-2
Mar 19, 2025
Merged

Only collect async ID when it's safe to do so#197
szegedi merged 4 commits intomainfrom
szegedi/async-id-2

Conversation

@szegedi
Copy link
Copy Markdown

@szegedi szegedi commented Feb 17, 2025

What does this PR do?:
Reenables async ID collection with guards. Prevents collecting async ID when either GC is ongoing (captures the ID at the GC prologue) or when the isolate has no current context.

Motivation:
We want to safely collect async ID.

Additional Notes:
Jira: PROF-11265

@szegedi szegedi changed the title Guard against collecting async ID while performing GC Only collect async ID when it's safe to do so Feb 17, 2025
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 17, 2025

Overall package size

Self size: 9.57 MB
Deduped: 9.94 MB
No deduping: 9.94 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | source-map | 0.7.4 | 226 kB | 226 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | p-limit | 3.1.0 | 7.75 kB | 13.78 kB | | delay | 5.0.0 | 11.17 kB | 11.17 kB | | node-gyp-build | 3.9.0 | 8.81 kB | 8.81 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented Feb 17, 2025

Benchmarks

Benchmark execution time: 2025-03-06 17:10:57

Comparing candidate commit 79992a0 in PR branch szegedi/async-id-2 with baseline commit 51951b7 in branch main.

Found 3 performance improvements and 0 performance regressions! Performance is the same for 90 metrics, 27 unstable metrics.

scenario:profiler-idle-with-wall-profiler-20

  • 🟩 cpu_user_time [-7.193ms; -1.554ms] or [-11.979%; -2.587%]

scenario:profiler-light-load-no-wall-profiler-16

  • 🟩 cpu_user_time [-7.421ms; -2.057ms] or [-7.705%; -2.136%]

scenario:profiler-light-load-no-wall-profiler-18

  • 🟩 cpu_user_time [-8.856ms; -2.354ms] or [-9.567%; -2.543%]

@szegedi szegedi marked this pull request as ready for review February 26, 2025 13:00
@szegedi szegedi requested a review from nsavoire as a code owner February 26, 2025 13:00
@szegedi szegedi changed the base branch from main to szegedi/keep-idle February 26, 2025 13:00
@szegedi szegedi added the semver-patch Bug or security fixes, mainly label Feb 26, 2025
return;
}
auto isolate = Isolate::GetCurrent();
if (!isolate || isolate->IsDead()) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases does GetCurrent returns null ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't know – I'm just trying to be pedantically defensive and not unconditionally dereference a pointer. I can envision a race condition between signal delivery and an isolate going away? I know the logic is different, but even the V8 sampler manager checks for isolate on a sampler not being null.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be more lenient and still invoke old_handler?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok as this.

Base automatically changed from szegedi/keep-idle to main February 27, 2025 10:12
- don't sample a dead isolate
- only try to retrieve an async ID when there's a context
Use relaxed load/store with appropriate signal fences for cheaper reading/writing of gcCount.
@szegedi szegedi force-pushed the szegedi/async-id-2 branch from 5be2a03 to a694680 Compare March 3, 2025 14:44
@szegedi szegedi requested a review from nsavoire March 4, 2025 10:19
Comment thread ts/src/time-profiler.ts
withContexts?: boolean;
workaroundV8Bug?: boolean;
collectCpuTime?: boolean;
collectAsyncId?: boolean;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Quality Violation

Prop name (collectAsyncId) doesn't match rule (is|has) (...read more)

Enforces a consistent naming pattern for boolean props.

The pattern is: "^(is|has)[A-Z]([A-Za-z0-9]?)+" to enforce is and has prefixes.

View in Datadog  Leave us feedback  Documentation

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's new. I'm inclined to ignore it, seeing how existing boolean properties also don't follow this format.

@szegedi szegedi force-pushed the szegedi/async-id-2 branch 3 times, most recently from 3c86526 to 79992a0 Compare March 6, 2025 17:05
}
}

#define DD_WALL_PROFILER_GET_BOOLEAN_CONFIG(name) \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No a big fan of macros, but here this nicely simplifies the code.

@nsavoire
Copy link
Copy Markdown

nsavoire commented Mar 7, 2025

Overall this looks good !
Are you confident that GC callbacks + null/dead/incontext isolate checks are enough to avoid crashes ?

@szegedi
Copy link
Copy Markdown
Author

szegedi commented Mar 7, 2025

Are you confident that GC callbacks + null/dead/incontext isolate checks are enough to avoid crashes ?

I'd say I'm reasonably certain. In any case, I don't have an idea for other kinds of safeguards right now. All crashtracker reports (about 165) we have for the past month from people stuck on the unsafe version of the profiler were in GC, as well as 5 reports where the crashtracker failed to unwind so we can't tell.

@szegedi szegedi requested a review from nsavoire March 12, 2025 15:08
@szegedi szegedi merged commit d213a01 into main Mar 19, 2025
@szegedi szegedi deleted the szegedi/async-id-2 branch March 19, 2025 15:27
szegedi added a commit that referenced this pull request Mar 19, 2025
* Guard against collecting async ID while performing GC

* Additional sanity checks:
- don't sample a dead isolate
- only try to retrieve an async ID when there's a context

* Move GetAsyncId and GC callbacks out of the header file.
Use relaxed load/store with appropriate signal fences for cheaper reading/writing of gcCount.

* Have a config option for collecting async IDs
@szegedi szegedi mentioned this pull request Mar 19, 2025
szegedi added a commit that referenced this pull request Mar 26, 2025
* Guard against collecting async ID while performing GC

* Additional sanity checks:
- don't sample a dead isolate
- only try to retrieve an async ID when there's a context

* Move GetAsyncId and GC callbacks out of the header file.
Use relaxed load/store with appropriate signal fences for cheaper reading/writing of gcCount.

* Have a config option for collecting async IDs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

semver-patch Bug or security fixes, mainly

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants