Skip to content

Conversation

@yamingk
Copy link
Contributor

@yamingk yamingk commented May 7, 2025

…ce issue

For reviewer:

This PR's fix doesn't affect NuObject because

  1. nuObject case doesn't turn on periodic flush for the log dev's flush thread.
  2. Solo Repl dev is not used for nuObject case as well.

Detailed fix:

  1. fix periodic log flush timer cancel_time assert failure in iomgr that complains main fiber can't do sync wait on cancel_timer API.
  2. Fix a race issue between solo repl dev init and destroy path.
    The root cause is during boot, when we there is a solo repl dev pending on destroy, however during boot, its init is also on-going, there is a race window destroy might arrive before open_log_store which is an async call whose thenValue hasn't been called yet, in this case, the m_data_journal is not properly initialized and will cause ASAN issue.
    The fix closed this race window.

Testing:

Cross tested with HomeObj:

conanfile.py (homeobject/2.3.16): RUN: cmake --build "/tmp/source/yk_ho/build/Release" --target test -- -j104
Running tests...
Test project /tmp/source/yk_ho/build/Release
      Start  1: HomestoreTestPg
 1/11 Test  #1: HomestoreTestPg ................................   Passed  111.22 sec
      Start  2: HomestoreTestShard
 2/11 Test  #2: HomestoreTestShard .............................   Passed  131.97 sec
      Start  3: HomestoreTestBlob
 3/11 Test  #3: HomestoreTestBlob ..............................   Passed   48.66 sec
      Start  4: HomestoreTestMisc
 4/11 Test  #4: HomestoreTestMisc ..............................   Passed   38.10 sec
      Start  5: HomestoreTestReplaceMember
 5/11 Test  #5: HomestoreTestReplaceMember .....................   Passed   36.55 sec
      Start  6: HomestoreTestReplaceMemberWithBaselineResync
 6/11 Test  #6: HomestoreTestReplaceMemberWithBaselineResync ...   Passed   30.46 sec
      Start  7: HomestoreResyncTestWithFollowerRestart
 7/11 Test  #7: HomestoreResyncTestWithFollowerRestart .........   Passed  126.51 sec
      Start  8: HomestoreResyncTestWithLeaderRestart
 8/11 Test  #8: HomestoreResyncTestWithLeaderRestart ...........   Passed   50.19 sec
      Start  9: HeapChunkSelectorTest
 9/11 Test  #9: HeapChunkSelectorTest ..........................   Passed    0.05 sec
      Start 10: MemoryTestCPU
10/11 Test #10: MemoryTestCPU ..................................   Passed    0.76 sec
      Start 11: MemoryTestIO
11/11 Test #11: MemoryTestIO ...................................   Passed    0.79 sec

100% tests passed, 0 tests failed out of 11

Total Test time (real) = 575.29 sec

@yamingk yamingk closed this May 7, 2025
@codecov-commenter
Copy link

codecov-commenter commented May 7, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 50.00000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 66.94%. Comparing base (1a0cef8) to head (4fc929a).
Report is 191 commits behind head on master.

Files with missing lines Patch % Lines
src/lib/replication/repl_dev/solo_repl_dev.cpp 40.00% 6 Missing ⚠️
src/lib/replication/service/generic_repl_svc.cpp 25.00% 1 Missing and 2 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #715       +/-   ##
===========================================
+ Coverage   56.51%   66.94%   +10.43%     
===========================================
  Files         108      109        +1     
  Lines       10300    11956     +1656     
  Branches     1402     1667      +265     
===========================================
+ Hits         5821     8004     +2183     
+ Misses       3894     3116      -778     
- Partials      585      836      +251     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@yamingk yamingk reopened this May 7, 2025
@yamingk yamingk changed the title Fix log periodic cancelt_imer issue and solo repl dev init/destroy ra… Issue 716: Fix log periodic cancelt_imer issue and solo repl dev init/destroy ra… May 7, 2025
@yamingk yamingk merged commit e87feea into eBay:master May 8, 2025
24 checks passed
hkadayam pushed a commit to hkadayam/HomeStore that referenced this pull request Aug 7, 2025
…/destroy ra… (eBay#715)

* Fix log periodic cancelt_imer issue and solo repl dev init/destroy race issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Assert failure during cancel_timer for periodic timer of log flush during log store destroy

3 participants