test: fix the race of release read filter in FakeRawConnection #26099
kyessenov merged 4 commits into envoyproxy:main
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
/retest

Retrying Azure Pipelines:

/wait emm... I don't think I got the right fix.
The race happens between the destructors of the shared_ptr and the weak_ptr.
It seems the shared_ptr side of the race is at the calling line, while the weak_ptr side is at the deallocation line. The shared_ptr and weak_ptr share the same control block.
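The shared control block can be seen in a minimal single-threaded sketch. Filter, Observation, and observeControlBlock are hypothetical names used only for illustration; the sketch shows the bookkeeping that both destructors have in common, not the race itself.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a read filter.
struct Filter {};

struct Observation {
  long owners_before;  // owning references seen through the weak_ptr
  bool alive_before;   // whether the Filter existed before the owner was reset
  bool alive_after;    // whether lock() could still reach the Filter afterwards
};

Observation observeControlBlock() {
  auto owner = std::make_shared<Filter>();
  // The weak_ptr refers to the same control block as `owner`; it adds to the
  // weak count, not to the owning (use) count.
  std::weak_ptr<Filter> observer = owner;

  Observation obs{};
  obs.owners_before = observer.use_count();  // 1: only `owner` owns the Filter
  obs.alive_before = !observer.expired();

  // Dropping the last owner destroys the Filter, but the control block
  // survives until `observer` is destroyed at the end of this function.
  owner.reset();
  obs.alive_after = observer.lock() != nullptr;
  return obs;
}
```

Because destroying either pointer updates counters in that one control block, destroying a shared_ptr on one thread and a weak_ptr to the same object on another thread touches the same shared bookkeeping.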
Finally, I think this is a way to fix it. This is ready for review.
One more note, from https://en.cppreference.com/w/cpp/memory/shared_ptr:
|
  std::string data_ ABSL_GUARDED_BY(lock_);
- std::weak_ptr<Network::ReadFilter> read_filter_;
+ std::shared_ptr<Network::ReadFilter> read_filter_;
Connections don't own the filters in the production code. This seems like it'll reduce coverage for cases when filters get deleted earlier than before.
@kyessenov thanks for the review!
Sorry, I'm not sure I understand your comment. Do you mean that if we remove the filters before the connection ends, the read_filter won't be released as expected because it is a shared ptr?
The FakeRawConnection has a shorter lifetime than the shared connection, so the read filter is removed before the connection ends. The read filter here is only used by removeReadFilter to look up the correct filter.
Is there any case where the filter is removed other than when the FakeRawConnection ends?
I'm not sure, but this comment makes me concerned:
// If the filter was already deleted, it means the shared_connection_ was too, so don't try to
// access it.
But then shared connection is not a weak ptr, so is the comment out of date?
I updated the check to if (read_filter_ != nullptr && read_filter_.use_count() > 1), which should be safer.
But I still don't see how the shared connection could be freed before the FakeRawConnection, since FakeRawConnection holds a reference to the SharedConnectionWrapper, and SharedConnectionWrapper holds a reference to the Network::Connection. So if the Network::Connection were released early, all of those references would be broken as well.
The Network::Connection can indeed end early, but I think that case is protected by a flag in the SharedConnectionWrapper:
envoy/test/integration/fake_upstream.h, line 299 in bcdfb2a
When you still want to execute something on a disconnected connection, that flag guards against it:
envoy/test/integration/fake_upstream.h, line 335 in bcdfb2a
Should we use the disconnected_ flag in the FakeRawConnection destructor, too?
I think the idea is that we only need to remove the filter if the connection is alive.
Good point, I can use disconnected_, and it is protected by the lock as well.

emm... actually, executeOnDispatcher already checks disconnected_:
envoy/test/integration/fake_upstream.h, lines 330 to 337 in 7755d31
So it should be safe to just call executeOnDispatcher directly.
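The "check a flag under the lock, then run" pattern discussed above can be sketched with a simplified wrapper. ConnectionWrapper and onDisconnected are hypothetical names; this is not the SharedConnectionWrapper code in fake_upstream.h, which presumably posts work to the connection's dispatcher, whereas this sketch runs the callback inline.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical, simplified stand-in for a wrapper such as SharedConnectionWrapper.
class ConnectionWrapper {
public:
  // Called when the underlying connection closes.
  void onDisconnected() {
    std::lock_guard<std::mutex> guard(lock_);
    disconnected_ = true;
  }

  // Runs `f` only if the connection is still alive and returns whether it ran.
  // `f` runs while the lock is held, so a concurrent onDisconnected() cannot
  // interleave with it; `f` must therefore not call back into this wrapper.
  bool executeOnDispatcher(const std::function<void()>& f) {
    std::lock_guard<std::mutex> guard(lock_);
    if (disconnected_) {
      return false;  // connection gone: skip, so callers need no extra check
    }
    f();
    return true;
  }

private:
  std::mutex lock_;
  bool disconnected_ = false;  // guarded by lock_
};
```

With the check inside the wrapper, a caller such as a destructor can simply call executeOnDispatcher and rely on it to skip work once the connection is gone.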
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
  // If the filter was already deleted, it means the shared_connection_ was too, so don't try to
  // access it.
- if (auto filter = read_filter_.lock(); filter != nullptr) {
+ if (read_filter_ != nullptr && read_filter_.use_count() > 1) {
use_count is not safe for multi-threaded environments as far as I remember. weak_ptr is safe.
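cppreference notes that in a multithreaded environment the value returned by use_count() is approximate, so a use_count() check followed by a use is a check-then-act window. weak_ptr::lock() does the check and takes ownership in one step. Filter and tryUse below are hypothetical names for a minimal sketch.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a read filter.
struct Filter {};

// lock() returns either an owning shared_ptr, which keeps the Filter alive
// until it goes out of scope, or an empty one. A separate use_count() check
// followed by a use leaves a window in which another thread can drop its
// reference.
bool tryUse(const std::weak_ptr<Filter>& weak) {
  if (std::shared_ptr<Filter> strong = weak.lock()) {
    // Safe to use *strong here; it cannot be destroyed while `strong` is in scope.
    return true;
  }
  return false;  // the Filter was already destroyed
}
```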
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
@kyessenov gentle ping :)
kyessenov left a comment
Thank you for the fix!
/retest

Retrying Azure Pipelines:
…proxy#26099)
Commit Message: test: fix the race of release read filter in FakeRawConnection
Additional Description:
There is a race when the destructor of FakeRawConnection is invoked in the test main thread. The shared ptr of FakeRawConnection::read_filter_ will be released both in the test main thread and in the upstream connection's own thread by calling removeReadFilter.
This PR just uses a shared ptr for FakeRawConnection::read_filter_; once read_filter_ is moved into the lambda, the test main thread no longer does any release work for the read filter.
Risk Level: low
Testing: integration test
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a
Fixes part of #26082
Signed-off-by: Ashish Banerjee <ashish.banerjee@solo.io>
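The ownership hand-off described in the commit message can be sketched with a std::thread standing in for the upstream connection's thread. Filter and releaseOnWorker are hypothetical names, and the real change posts work to Envoy's dispatcher rather than spawning a thread; this only illustrates moving the last shared_ptr into a lambda so the release happens on the other thread.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for Network::ReadFilter.
struct Filter {};

// The caller moves its shared_ptr into the lambda that runs on the worker
// thread, so the last owning reference is dropped there and the calling thread
// has nothing left to release. Returns true if the caller's pointer is empty
// after the move and the Filter was destroyed by the worker.
bool releaseOnWorker(std::shared_ptr<Filter> read_filter) {
  // Observer used only to check the outcome; it is inspected after join().
  std::weak_ptr<Filter> probe = read_filter;

  std::thread worker([filter = std::move(read_filter)]() mutable {
    // Stand-in for removeReadFilter(filter): drop the last owning reference here.
    filter.reset();
  });
  worker.join();

  // A moved-from shared_ptr is guaranteed to be empty.
  return read_filter == nullptr && probe.expired();
}
```

After the move, the caller's read_filter is empty, so its destruction on the calling thread releases nothing; the Filter is destroyed on the worker thread.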