Skip to content

fix: wrong lock order, observed locally on top of #6467 with custom changes#6470

Merged
PastaPastaPasta merged 1 commit into
dashpay:developfrom
knst:fix-deadlock-getquorumdata
Dec 11, 2024
Merged

fix: wrong lock order, observed locally on top of #6467 with custom changes#6470
PastaPastaPasta merged 1 commit into
dashpay:developfrom
knst:fix-deadlock-getquorumdata

Conversation

@knst
Copy link
Copy Markdown
Collaborator

@knst knst commented Dec 9, 2024

Issue being fixed or feature implemented

Assertion failed: detected inconsistent lock order for 'cs_main' in node/blockstorage.cpp:775 (in thread 'httpworker.2'), details in debug log.
   Previous lock order was:
    (2) 'cs_main' in rpc/net.cpp:666 (in thread 'httpworker.3')
    (1) 'm_nodes_mutex' in net.cpp:4754 (in thread 'httpworker.3')
   Current lock order is:
    (1) 'm_nodes_mutex' in net.cpp:5102 (in thread 'httpworker.2')
    (2) 'cs_main' in node/blockstorage.cpp:775 (in thread 'httpworker.2')
node0 2024-12-09T07:46:49.907123Z (mocktime: 2014-12-04T18:34:20Z) [httpworker.2] [stacktraces.cpp:528] [PrintCrashInfo] Posix Signal: Aborted

...
 15#: (0x5764EB18DE42) sync.cpp:123           - potential_deadlock_detected
 16#: (0x5764EB194A7E) sync.cpp:190           - push_lock<std::recursive_mutex>
 17#: (0x5764EB194A7E) sync.cpp:214           - void EnterCritical<std::recursive_mutex>(char const*, char const*, int, std::recursive_mutex*, bool)
 18#: (0x5764EA928642) unique_lock.h:150      - std::unique_lock<std::recursive_mutex>::try_lock()
 19#: (0x5764EA928642) sync.h:168             - UniqueLock<AnnotatedMixin<std::recursive_mutex>, std::unique_lock<std::recursive_mutex> >::Enter(char const*, char const*, int)
 20#: (0x5764EA928642) sync.h:190             - UniqueLock<AnnotatedMixin<std::recursive_mutex>, std::unique_lock<std::recursive_mutex> >::UniqueLock(AnnotatedMixin<std::recursive_mutex>&, char const*, char const*, int, bool)
 21#: (0x5764EAAD9C3D) chain.h:214            - CBlockIndex::GetBlockPos() const
 22#: (0x5764EAAD9C3D) blockstorage.cpp:775   - operator()
 23#: (0x5764EAAD9C3D) blockstorage.cpp:775   - ReadBlockFromDisk(CBlock&, CBlockIndex const*, Consensus::Params const&)
 24#: (0x5764EADB117A) stl_vector.h:1126      - std::vector<std::shared_ptr<CTransaction const>, std::allocator<std::shared_ptr<CTransaction const> > >::operator[](unsigned long)
 25#: (0x5764EADB117A) cbtx.cpp:467           - GetNonNullCoinbaseChainlock(CBlockIndex const*)
 26#: (0x5764EA9FE4F8) utils.cpp:84           - GetHashModifier
 27#: (0x5764EAA05291) utils.cpp:189          - ComputeQuorumMembers
 28#: (0x5764EAA05291) utils.cpp:167          - llmq::utils::GetAllQuorumMembers(Consensus::LLMQType, CDeterministicMNManager&, gsl::not_null<CBlockIndex const*>, bool)
 29#: (0x5764EA9906FC) stl_vector.h:1258      - std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > >::data()
 30#: (0x5764EA9906FC) span.h:164             - Span<std::shared_ptr<CDeterministicMN const> >::Span<std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > > >(std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > >&, std::enable_if<((!Span<std::shared_ptr<CDeterministicMN const> >::is_Span<std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > > >::value)&&std::is_convertible<std::remove_pointer<decltype ((((declval<std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > >&>)()).data)())>::type (*) [], std::shared_ptr<CDeterministicMN const> (*) []>::value)&&std::is_convertible<decltype ((((declval<std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > >&>)()).size)()), unsigned long>::value, decltype(nullptr)>::type)
 31#: (0x5764EA9906FC) quorums.cpp:412        - llmq::CQuorumManager::BuildQuorumFromCommitment(Consensus::LLMQType, gsl::not_null<CBlockIndex const*>, bool) const
 32#: (0x5764EA99198B) shared_ptr_base.h:1540 - std::__shared_ptr<llmq::CQuorum const, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<llmq::CQuorum, void>(std::__shared_ptr<llmq::CQuorum, (__gnu_cxx::_Lock_policy)2>&&)
 33#: (0x5764EA99198B) shared_ptr.h:369       - std::shared_ptr<llmq::CQuorum const>::shared_ptr<llmq::CQuorum, void>(std::shared_ptr<llmq::CQuorum>&&)
 34#: (0x5764EA99198B) quorums.cpp:672        - llmq::CQuorumManager::GetQuorum(Consensus::LLMQType, gsl::not_null<CBlockIndex const*>, bool) const
 35#: (0x5764EA991BD6) shared_ptr_base.h:1070 - std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
 36#: (0x5764EA991BD6) shared_ptr_base.h:1524 - std::__shared_ptr<llmq::CQuorum const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
 37#: (0x5764EA991BD6) shared_ptr.h:175       - std::shared_ptr<llmq::CQuorum const>::~shared_ptr()
 38#: (0x5764EA991BD6) quorums.cpp:494        - llmq::CQuorumManager::RequestQuorumData(CNode*, Consensus::LLMQType, CBlockIndex const*, unsigned short, uint256 const&) const
 ...

What was done?

Refactored call of GetQuorum out of llmq::CQuorumManager::RequestQuorumData. It also optimize CQuorumManager::StartQuorumDataRecoveryThread because avoid double call of GetQuorum

How Has This Been Tested?

Run with and without this changes.

Breaking Changes

You may observe error RPC_INVALID_PARAMETER, "quorum not found" while calling RPC quorum getdata instead returning false with log output only in case of request for non-existing quorum.
The output is not documented at the moment anyway.

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone

@knst knst force-pushed the fix-deadlock-getquorumdata branch from a79e983 to 747fe34 Compare December 9, 2024 09:56
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Dec 9, 2024

This pull request has conflicts, please rebase.

@knst knst mentioned this pull request Dec 10, 2024
5 tasks
…ustom changes

    Assertion failed: detected inconsistent lock order for 'cs_main' in node/blockstorage.cpp:775 (in thread 'httpworker.2'), details in debug log.
       Previous lock order was:
        (2) 'cs_main' in rpc/net.cpp:666 (in thread 'httpworker.3')
        (1) 'm_nodes_mutex' in net.cpp:4754 (in thread 'httpworker.3')
       Current lock order is:
        (1) 'm_nodes_mutex' in net.cpp:5102 (in thread 'httpworker.2')
        (2) 'cs_main' in node/blockstorage.cpp:775 (in thread 'httpworker.2')
    node0 2024-12-09T07:46:49.907123Z (mocktime: 2014-12-04T18:34:20Z) [httpworker.2] [stacktraces.cpp:528] [PrintCrashInfo] Posix Signal: Aborted

    ...
     15#: (0x5764EB18DE42) sync.cpp:123           - potential_deadlock_detected
     16#: (0x5764EB194A7E) sync.cpp:190           - push_lock<std::recursive_mutex>
     17#: (0x5764EB194A7E) sync.cpp:214           - void EnterCritical<std::recursive_mutex>(char const*, char const*, int, std::recursive_mutex*, bool)
     18#: (0x5764EA928642) unique_lock.h:150      - std::unique_lock<std::recursive_mutex>::try_lock()
     19#: (0x5764EA928642) sync.h:168             - UniqueLock<AnnotatedMixin<std::recursive_mutex>, std::unique_lock<std::recursive_mutex> >::Enter(char const*, char const*, int)
     20#: (0x5764EA928642) sync.h:190             - UniqueLock<AnnotatedMixin<std::recursive_mutex>, std::unique_lock<std::recursive_mutex> >::UniqueLock(AnnotatedMixin<std::recursive_mutex>&, char const*, char const*, int, bool)
     21#: (0x5764EAAD9C3D) chain.h:214            - CBlockIndex::GetBlockPos() const
     22#: (0x5764EAAD9C3D) blockstorage.cpp:775   - operator()
     23#: (0x5764EAAD9C3D) blockstorage.cpp:775   - ReadBlockFromDisk(CBlock&, CBlockIndex const*, Consensus::Params const&)
     24#: (0x5764EADB117A) stl_vector.h:1126      - std::vector<std::shared_ptr<CTransaction const>, std::allocator<std::shared_ptr<CTransaction const> > >::operator[](unsigned long)
     25#: (0x5764EADB117A) cbtx.cpp:467           - GetNonNullCoinbaseChainlock(CBlockIndex const*)
     26#: (0x5764EA9FE4F8) utils.cpp:84           - GetHashModifier
     27#: (0x5764EAA05291) utils.cpp:189          - ComputeQuorumMembers
     28#: (0x5764EAA05291) utils.cpp:167          - llmq::utils::GetAllQuorumMembers(Consensus::LLMQType, CDeterministicMNManager&, gsl::not_null<CBlockIndex const*>, bool)
     29#: (0x5764EA9906FC) stl_vector.h:1258      - std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > >::data()
     30#: (0x5764EA9906FC) span.h:164             - Span<std::shared_ptr<CDeterministicMN const> >::Span<std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > > >(std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > >&, std::enable_if<((!Span<std::shared_ptr<CDeterministicMN const> >::is_Span<std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > > >::value)&&std::is_convertible<std::remove_pointer<decltype ((((declval<std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > >&>)()).data)())>::type (*) [], std::shared_ptr<CDeterministicMN const> (*) []>::value)&&std::is_convertible<decltype ((((declval<std::vector<std::shared_ptr<CDeterministicMN const>, std::allocator<std::shared_ptr<CDeterministicMN const> > >&>)()).size)()), unsigned long>::value, decltype(nullptr)>::type)
     31#: (0x5764EA9906FC) quorums.cpp:412        - llmq::CQuorumManager::BuildQuorumFromCommitment(Consensus::LLMQType, gsl::not_null<CBlockIndex const*>, bool) const
     32#: (0x5764EA99198B) shared_ptr_base.h:1540 - std::__shared_ptr<llmq::CQuorum const, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<llmq::CQuorum, void>(std::__shared_ptr<llmq::CQuorum, (__gnu_cxx::_Lock_policy)2>&&)
     33#: (0x5764EA99198B) shared_ptr.h:369       - std::shared_ptr<llmq::CQuorum const>::shared_ptr<llmq::CQuorum, void>(std::shared_ptr<llmq::CQuorum>&&)
     34#: (0x5764EA99198B) quorums.cpp:672        - llmq::CQuorumManager::GetQuorum(Consensus::LLMQType, gsl::not_null<CBlockIndex const*>, bool) const
     35#: (0x5764EA991BD6) shared_ptr_base.h:1070 - std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
     36#: (0x5764EA991BD6) shared_ptr_base.h:1524 - std::__shared_ptr<llmq::CQuorum const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
     37#: (0x5764EA991BD6) shared_ptr.h:175       - std::shared_ptr<llmq::CQuorum const>::~shared_ptr()
     38#: (0x5764EA991BD6) quorums.cpp:494        - llmq::CQuorumManager::RequestQuorumData(CNode*, Consensus::LLMQType, CBlockIndex const*, unsigned short, uint256 const&) const
     ...
@knst knst force-pushed the fix-deadlock-getquorumdata branch from 747fe34 to 1b0af99 Compare December 10, 2024 14:07
@UdjinM6 UdjinM6 added this to the 22.1 milestone Dec 10, 2024
Copy link
Copy Markdown

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 1b0af99

Copy link
Copy Markdown
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 1b0af99

@PastaPastaPasta PastaPastaPasta merged commit 6d97441 into dashpay:develop Dec 11, 2024
@knst knst deleted the fix-deadlock-getquorumdata branch January 14, 2025 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants