@mika-fischer
Contributor

Other threads can still hold a valid handle to the tsfn after
finalization if finalization was triggered by

  • release with napi_tsfn_abort, or
  • environment shutdown

Handle this by:

  • protecting finalization itself with the mutex
  • if necessary, delay deletion after finalization to when thread_count
    drops to 0
  • releasing all resources as soon as possible before deletion

Fixes: #55706
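
The three points above can be sketched in isolation. This is a minimal model and not the code in src/node_api.cc: `SketchTsfn`, `Status`, `Call`, `Release`, `Finalize` and the `deleted_flag` parameter are all hypothetical names made up for illustration, and the queue holds plain `int`s instead of user data. It only shows the ordering: finalization takes the mutex, queued resources are dropped immediately, and the object itself is deleted by whoever brings the thread count to 0.

```cpp
#include <cassert>
#include <mutex>
#include <queue>

// Hypothetical stand-in for a thread-safe function handle. None of these
// names come from src/node_api.cc; they only illustrate the idea.
class SketchTsfn {
 public:
  enum class Status { kOk, kClosing };

  SketchTsfn(int thread_count, bool* deleted_flag)
      : thread_count_(thread_count), deleted_flag_(deleted_flag) {}

  // Records deletion so a caller can observe when the object went away.
  ~SketchTsfn() { *deleted_flag_ = true; }

  // May be called from any thread that still holds a handle.
  Status Call(int item) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (finalized_) return Status::kClosing;  // object is still alive here
    queue_.push(item);
    return Status::kOk;
  }

  // A thread gives up its handle. The last one out deletes the object,
  // but only if finalization has already happened.
  static void Release(SketchTsfn* self) {
    bool should_delete = false;
    {
      std::lock_guard<std::mutex> lock(self->mutex_);
      assert(self->thread_count_ > 0);
      --self->thread_count_;
      should_delete = self->finalized_ && self->thread_count_ == 0;
    }  // unlock before deleting, because the mutex is a member
    if (should_delete) delete self;
  }

  // Finalization (abort or environment shutdown in the real code).
  static void Finalize(SketchTsfn* self) {
    bool should_delete = false;
    {
      std::lock_guard<std::mutex> lock(self->mutex_);  // finalize under lock
      self->finalized_ = true;
      std::queue<int>().swap(self->queue_);  // drop queued items right away
      should_delete = self->thread_count_ == 0;  // otherwise defer deletion
    }
    if (should_delete) delete self;
  }

 private:
  std::mutex mutex_;
  std::queue<int> queue_;
  int thread_count_;
  bool finalized_ = false;
  bool* deleted_flag_;
};
```

In this model, a thread that calls `Call` after `Finalize` gets `Status::kClosing` back instead of touching freed memory, and the object is only destroyed once that thread calls `Release`.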

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/node-api

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. node-api Issues and PRs related to the Node-API. labels Nov 16, 2024
@mika-fischer
Contributor Author

I still need to see how to integrate the test from here https://github.com/mika-fischer/node-bug-napi-tsfn

@mhdawson
Member

@gabrielschulhof you mentioned you'd take a look at this issue in our weekly meeting last week. If you can take a look at this PR it would be great.

@gabrielschulhof
Contributor

LGTM. @mika-fischer, please add the test when you get a chance!

@gabrielschulhof gabrielschulhof changed the title from "node-api: fix data race and use-after-free in napi_threadsafe_function" to "[WIP] node-api: fix data race and use-after-free in napi_threadsafe_function" Jan 31, 2025
@gabrielschulhof gabrielschulhof marked this pull request as ready for review January 31, 2025 17:02
@gabrielschulhof
Contributor

Marking as ready for review so the tests will run.

@mika-fischer
Contributor Author

> Marking as ready for review so the tests will run.

@gabrielschulhof Thanks for pushing this further, and sorry for not following up!

FYI: the test case only triggers reliably under valgrind. Otherwise it crashes only occasionally. I don't know if anything special is needed or if all tests are run with valgrind anyway.

@KevinEady
Contributor

Hello @mika-fischer,

Are you still interested in managing this pull request?

@mika-fischer
Contributor Author

@KevinEady Yes, sure, I'm just not sure what was missing the last time around. I think after @gabrielschulhof added the test this PR could have been merged already, no?

I can rebase the PR, but other than that I'd need guidance on what else needs to be done.

@mika-fischer mika-fischer force-pushed the fix-55706 branch 2 times, most recently from f21787d to 090268b August 27, 2025 17:48
@mika-fischer mika-fischer changed the title from "[WIP] node-api: fix data race and use-after-free in napi_threadsafe_function" to "node-api: fix data race and use-after-free in napi_threadsafe_function" Aug 27, 2025
@mika-fischer
Contributor Author

@KevinEady I rebased and fixed the formatting and linting issues with the unit test. Let me know if there's more to do in order to merge this.

@codecov

codecov bot commented Aug 27, 2025

Codecov Report

❌ Patch coverage is 86.95652% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.54%. Comparing base (fc203b3) to head (fd90228).
⚠️ Report is 17 commits behind head on main.

Files with missing lines | Patch % | Lines
src/node_api.cc | 86.95% | 3 Missing and 6 partials ⚠️

Additional details and impacted files:
@@            Coverage Diff             @@
##             main   #55877      +/-   ##
==========================================
- Coverage   88.55%   88.54%   -0.01%     
==========================================
  Files         703      703              
  Lines      208259   208284      +25     
  Branches    40178    40180       +2     
==========================================
+ Hits       184425   184430       +5     
- Misses      15855    15864       +9     
- Partials     7979     7990      +11     

Files with missing lines | Coverage Δ
src/node_api.cc | 75.15% <86.95%> (-0.11%) ⬇️

... and 34 files with indirect coverage changes

@KevinEady
Contributor

Thanks @mika-fischer, we'll discuss this in the next Node-API meeting.

@mika-fischer
Contributor Author

Anything else I can do to move this along?

@mika-fischer
Contributor Author

@KevinEady Just another ping, before this PR goes stale again...

@KevinEady
Contributor

Hi @mika-fischer,

Thanks for waiting. It took me a while to get around to writing some small tests that ensure that the tsfn's CallJS is called on the remaining queued items on abort, and that looks good 👍 I got a little side-tracked debugging some behavior that I think is odd but unrelated to this PR, so I'll probably open a separate issue.

I think this PR is ready-to-go, @legendecas @vmoroz ?

@mika-fischer
Contributor Author

Thank you @KevinEady! Not sure what to make of the latest test failures. They seem to be unrelated...

gabrielschulhof and others added 4 commits November 25, 2025 08:35
@mika-fischer
Contributor Author

@KevinEady @mhdawson @gabrielschulhof I once again rebased and fixed the new lint error. Please move this forward or let me know what I still need to do.

@legendecas legendecas added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Nov 26, 2025
@legendecas legendecas added the commit-queue Add this label to land a pull request using GitHub Actions. label Nov 27, 2025
@nodejs-github-bot nodejs-github-bot added commit-queue-failed An error occurred while landing this pull request using GitHub Actions. and removed commit-queue Add this label to land a pull request using GitHub Actions. labels Nov 27, 2025
@nodejs-github-bot
Collaborator

Commit Queue failed
- Loading data for nodejs/node/pull/55877
✔  Done loading data for nodejs/node/pull/55877
----------------------------------- PR info ------------------------------------
Title      node-api: fix data race and use-after-free in napi_threadsafe_function  (#55877)
Author     Mika Fischer <mika.fischer@zoopnet.de> (@mika-fischer)
Branch     mika-fischer:fix-55706 -> nodejs:main
Labels     c++, node-api, author ready, needs-ci
Commits    5
 - node-api: add unit test
 - node-api: fix data race and use-after-free in napi_threadsafe_function
 - node-api: combine threadsafe_function state flags into single variable
 - node-api: release lock before calling user callback
 - Fix test lint
Committers 1
 - Mika Fischer <mika.fischer@zoopnet.de>
PR-URL: https://github.com/nodejs/node/pull/55877
Fixes: https://github.com/nodejs/node/issues/55706
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
------------------------------ Generated metadata ------------------------------
PR-URL: https://github.com/nodejs/node/pull/55877
Fixes: https://github.com/nodejs/node/issues/55706
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
--------------------------------------------------------------------------------
   ℹ  This PR was created on Sat, 16 Nov 2024 12:16:20 GMT
   ✔  Approvals: 1
   ✔  - Chengzhong Wu (@legendecas) (TSC): https://github.com/nodejs/node/pull/55877#pullrequestreview-3514457543
   ✔  Last GitHub CI successful
   ℹ  Last Full PR CI on 2025-11-26T21:17:23Z: https://ci.nodejs.org/job/node-test-pull-request/70325/
- Querying data for job/node-test-pull-request/70325/
   ✔  Last Jenkins CI successful
--------------------------------------------------------------------------------
   ✔  No git cherry-pick in progress
   ✔  No git am in progress
   ✔  No git rebase in progress
--------------------------------------------------------------------------------
- Bringing origin/main up to date...
From https://github.com/nodejs/node
 * branch                  main       -> FETCH_HEAD
✔  origin/main is now up-to-date
- Downloading patch for 55877
From https://github.com/nodejs/node
 * branch                  refs/pull/55877/merge -> FETCH_HEAD
✔  Fetched commits as e825de8e02c2..fd9022818808
--------------------------------------------------------------------------------
[main 9757caffcc] node-api: add unit test
 Author: Gabriel Schulhof <gabrielschulhof@gmail.com>
 Date: Fri Jan 31 08:53:04 2025 -0800
 3 files changed, 111 insertions(+)
 create mode 100644 test/node-api/test_threadsafe_function_shutdown/binding.cc
 create mode 100644 test/node-api/test_threadsafe_function_shutdown/binding.gyp
 create mode 100644 test/node-api/test_threadsafe_function_shutdown/test.js
[main ada8bdf49a] node-api: fix data race and use-after-free in napi_threadsafe_function
 Author: Mika Fischer <mika.fischer@zoopnet.de>
 Date: Fri Nov 15 18:58:09 2024 +0100
 1 file changed, 84 insertions(+), 40 deletions(-)
[main 11b13a8f75] node-api: combine threadsafe_function state flags into single variable
 Author: Mika Fischer <mika.fischer@zoopnet.de>
 Date: Sat Nov 16 11:33:03 2024 +0100
 1 file changed, 25 insertions(+), 23 deletions(-)
[main 189ed03aba] node-api: release lock before calling user callback
 Author: Mika Fischer <mika.fischer@zoopnet.de>
 Date: Sat Aug 30 13:03:31 2025 +0200
 1 file changed, 8 insertions(+), 4 deletions(-)
[main 4d921bd23f] Fix test lint
 Author: Mika Fischer <mika.fischer@zoopnet.de>
 Date: Tue Nov 25 09:30:34 2025 +0100
 1 file changed, 2 insertions(+), 2 deletions(-)
   ✔  Patches applied
There are 5 commits in the PR. Attempting autorebase.
(node:2217) [DEP0190] DeprecationWarning: Passing args to a child process with shell option true can lead to security vulnerabilities, as the arguments are not escaped, only concatenated.
(Use `node --trace-deprecation ...` to show where the warning was created)
Rebasing (2/10)
Executing: git node land --amend --yes
--------------------------------- New Message ----------------------------------
node-api: add unit test

PR-URL: #55877
Fixes: #55706
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>

[detached HEAD e3152cd546] node-api: add unit test
Author: Gabriel Schulhof <gabrielschulhof@gmail.com>
Date: Fri Jan 31 08:53:04 2025 -0800
3 files changed, 111 insertions(+)
create mode 100644 test/node-api/test_threadsafe_function_shutdown/binding.cc
create mode 100644 test/node-api/test_threadsafe_function_shutdown/binding.gyp
create mode 100644 test/node-api/test_threadsafe_function_shutdown/test.js
Rebasing (3/10)
Rebasing (4/10)
Executing: git node land --amend --yes
⚠ Found Fixes: #55706, skipping..
--------------------------------- New Message ----------------------------------
node-api: fix data race and use-after-free in napi_threadsafe_function

Other threads can still hold a valid handle to the tsfn after
finalization if finalization was triggered by

  • release with napi_tsfn_abort, or
  • environment shutdown

Handle this by:

  • protecting finalization itself with the mutex
  • if necessary, delay deletion after finalization to when thread_count
    drops to 0
  • releasing all resources as soon as possible before deletion

Fixes: #55706
PR-URL: #55877
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>

[detached HEAD 65763b706f] node-api: fix data race and use-after-free in napi_threadsafe_function
Author: Mika Fischer <mika.fischer@zoopnet.de>
Date: Fri Nov 15 18:58:09 2024 +0100
1 file changed, 84 insertions(+), 40 deletions(-)
Rebasing (5/10)
Rebasing (6/10)
Executing: git node land --amend --yes
--------------------------------- New Message ----------------------------------
node-api: combine threadsafe_function state flags into single variable

PR-URL: #55877
Fixes: #55706
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>

[detached HEAD 0e4fe7e07e] node-api: combine threadsafe_function state flags into single variable
Author: Mika Fischer <mika.fischer@zoopnet.de>
Date: Sat Nov 16 11:33:03 2024 +0100
1 file changed, 25 insertions(+), 23 deletions(-)
Rebasing (7/10)
Rebasing (8/10)
Executing: git node land --amend --yes
--------------------------------- New Message ----------------------------------
node-api: release lock before calling user callback

PR-URL: #55877
Fixes: #55706
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>

[detached HEAD 250f1f9c89] node-api: release lock before calling user callback
Author: Mika Fischer <mika.fischer@zoopnet.de>
Date: Sat Aug 30 13:03:31 2025 +0200
1 file changed, 8 insertions(+), 4 deletions(-)
Rebasing (9/10)
Rebasing (10/10)
Executing: git node land --amend --yes
--------------------------------- New Message ----------------------------------
Fix test lint

PR-URL: #55877
Fixes: #55706
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>

[detached HEAD e400403f1a] Fix test lint
Author: Mika Fischer <mika.fischer@zoopnet.de>
Date: Tue Nov 25 09:30:34 2025 +0100
1 file changed, 2 insertions(+), 2 deletions(-)
Successfully rebased and updated refs/heads/main.

ℹ Add commit-queue-squash label to land the PR as one commit, or commit-queue-rebase to land as separate commits.

https://github.com/nodejs/node/actions/runs/19733323934

@legendecas legendecas added commit-queue Add this label to land a pull request using GitHub Actions. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. and removed commit-queue Add this label to land a pull request using GitHub Actions. commit-queue-failed An error occurred while landing this pull request using GitHub Actions. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. labels Nov 27, 2025
legendecas pushed a commit that referenced this pull request Nov 27, 2025
Other threads can still hold a valid handle to the tsfn after
finalization if finalization was triggered by
- release with napi_tsfn_abort, or
- environment shutdown

Handle this by:
- protecting finalization itself with the mutex
- if necessary, delay deletion after finalization to when thread_count
  drops to 0
- releasing all resources as soon as possible before deletion

Fixes: #55706
PR-URL: #55877
Co-Authored-By: Gabriel Schulhof <gabrielschulhof@gmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
@legendecas
Member

Landed in bff6ea4.

@legendecas legendecas closed this Nov 27, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in Node-API Team Project Nov 27, 2025
targos pushed a commit that referenced this pull request Nov 29, 2025

Labels

author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. node-api Issues and PRs related to the Node-API.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

napi_threadsafe_function is very hard to use safely

7 participants