Skip to content

Fix race condition when the redis filter is destroyed.#11466

Merged
mattklein123 merged 14 commits intoenvoyproxy:masterfrom
HenryYYang:allowDynamicFilter
Jun 26, 2020
Merged

Fix race condition when the redis filter is destroyed.#11466
mattklein123 merged 14 commits intoenvoyproxy:masterfrom
HenryYYang:allowDynamicFilter

Conversation

@HenryYYang
Copy link
Copy Markdown
Contributor

@HenryYYang HenryYYang commented Jun 5, 2020

Signed-off-by: Henry Yang hyang@lyft.com

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Commit Message: Fix race condition when the redis filter is destroyed.
Additional Description: This fix the bugs with the following stack traces.
Risk Level: Low
Testing:
Docs Changes: N/A
Release Notes: N/A

Backtrace:
#0 0x000055ff2c9cd3d6 in Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool::ThreadLocalPool (this=0x55ff2ef432d8, parent=..., dispatcher=..., cluster_name=...) at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:74
#0 0x000055ff2c9cd3d6 in Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool::ThreadLocalPool (this=0x55ff2ef432d8, parent=..., dispatcher=..., cluster_name=...) at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:74
#1 0x000055ff2c9d01ea in __compressed_pair_elem<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&, 0, 1, 2> (this=0x55ff2ef432d8, __args=...) at /opt/llvm/bin/../include/c++/v1/memory:2155
#2 __compressed_pair<std::__1::allocatorEnvoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool&, Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&> (this=0x55ff2ef432d8, __first_args=..., __second_args=..., __pc=...) at /opt/llvm/bin/../include/c++/v1/memory:2258
#3 std::__1::__shared_ptr_emplace<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool, std::__1::allocatorEnvoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool >::__shared_ptr_emplace<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&> (this=0x55ff2ef432c0, __a=..., __args=..., __args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/memory:3543
#4 0x000055ff2c9d013a in make_shared<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&> (__args=..., __args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/memory:4202
#5 make_shared<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool, Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&> (__args=..., __args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/memory:4581
#6 operator() (this=, dispatcher=...) at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:53
#7 __invoke<(lambda at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:51:13) &, Envoy::Event::Dispatcher &> (__f=..., __args=...) at /opt/llvm/bin/../include/c++/v1/type_traits:3530
#8 __call<(lambda at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:51:13) &, Envoy::Event::Dispatcher &> (__args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/__functional_base:317
#9 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1533
#10 std::__1::__function::__func<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::InstanceImpl(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::Upstream::ClusterManager&, Envoy::Extensions::NetworkFilters::Common::Redis::Client::ClientFactory&, Envoy::ThreadLocal::SlotAllocator&, envoy::extensions::filters::network::redis_proxy::v3::RedisProxy_ConnPoolSettings const&, Envoy::Api::Api&, std::__1::unique_ptr<Envoy::Stats::Scope, std::__1::default_deleteEnvoy::Stats::Scope >&&, std::__1::shared_ptrEnvoy::Extensions::NetworkFilters::Common::Redis::RedisCommandStats const&, std::__1::shared_ptrEnvoy::Extensions::Common::Redis::ClusterRefreshManager)::$_0, std::__1::allocator<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::InstanceImpl(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::Upstream::ClusterManager&, Envoy::Extensions::NetworkF
ilters::Common::Redis::Client::ClientFactory&, Envoy::ThreadLocal::SlotAllocator&, envoy::extensions::filters::network::redis_proxy::v3::RedisProxy_ConnPoolSettings const&, Envoy::Api::Api&, std::__1::unique_ptr<Envoy::Stats::Scope, std::__1::default_deleteEnvoy::Stats::Scope >&&, std::__1::shared_ptrEnvoy::Extensions::NetworkFilters::Common::Redis::RedisCommandStats const&, std::__1::shared_ptrEnvoy::Extensions::Common::Redis::ClusterRefreshManager)::$_0>, std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>::operator()(Envoy::Event::Dispatcher&) (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1707
#11 0x000055ff2ca8bd58 in operator() (this=, __args=...) at /opt/llvm/bin/../include/c++/v1/functional:1860
#12 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:2419
#13 operator() (this=, dispatcher=...) at external/envoy/source/common/thread_local/thread_local_impl.cc:101
#14 __invoke<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:100:14) &, Envoy::Event::Dispatcher &> (__f=..., __args=...) at /opt/llvm/bin/../include/c++/v1/type_traits:3530
#15 __call<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:100:14) &, Envoy::Event::Dispatcher &> (__args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/__functional_base:317
#16 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1533
#17 std::__1::__function::__func<Envoy::ThreadLocal::InstanceImpl::Bookkeeper::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_7, std::__1::allocator<Envoy::ThreadLocal::InstanceImpl::Bookkeeper::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_7>, std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>::operator()(Envoy::Event::Dispatcher&) (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1707
#18 0x000055ff2ca8cf88 in operator() (this=, __args=...) at /opt/llvm/bin/../include/c++/v1/functional:1860
#19 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:2419
#20 operator() (this=) at external/envoy/source/common/thread_local/thread_local_impl.cc:214
#21 __invoke<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:214:21) &> (__f=...) at /opt/llvm/bin/../include/c++/v1/type_traits:3530
#22 __call<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:214:21) &> (__args=...) at /opt/llvm/bin/../include/c++/v1/__functional_base:348
#23 operator() (this=) at /opt/llvm/bin/../include/c++/v1/functional:1533
#24 std::__1::__function::__func<Envoy::ThreadLocal::InstanceImpl::SlotImpl::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_13, std::__1::allocator<Envoy::ThreadLocal::InstanceImpl::SlotImpl::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_13>, void ()>::operator()() (this=) at /opt/llvm/bin/../include/c++/v1/functional:1707
#25 0x000055ff2cb0ef76 in operator() (this=0x7f3532c25ad0) at /opt/llvm/bin/../include/c++/v1/functional:1860
#26 operator() (this=0x7f3532c25ad0) at /opt/llvm/bin/../include/c++/v1/functional:2419
#27 Envoy::Event::DispatcherImpl::runPostCallbacks (this=0x55ff2e669080) at external/envoy/source/common/event/dispatcher_impl.cc:233
#28 0x000055ff2ce880a6 in event_process_active_single_queue (base=0x55ff2e2dcdc0, activeq=0x55ff2e4fd030, max_to_process=2147483647, endtime=0x0) at /root/.cache/bazel/_bazel_root/e500e327e6981ab742dbacbb6723d21d/sandbox/processwrapper-sandbox/586/execroot/envoy_lyft/external/com_github_libevent_libevent/event.c:1713
#29 0x000055ff2ce86c2e in event_process_active (base=0x55ff2e2dcdc0) at /root/.cache/bazel/_bazel_root/e500e327e6981ab742dbacbb6723d21d/sandbox/processwrapper-sandbox/586/execroot/envoy_lyft/external/com_github_libevent_libevent/event.c:1808
#30 event_base_loop (base=0x55ff2e2dcdc0, flags=) at /root/.cache/bazel/_bazel_root/e500e327e6981ab742dbacbb6723d21d/sandbox/processwrapper-sandbox/586/execroot/envoy_lyft/external/com_github_libevent_libevent/event.c:2047
#31 0x000055ff2cb035b1 in Envoy::Server::WorkerImpl::threadRoutine (this=0x55ff2e537bd0, guard_dog=...) at external/envoy/source/server/worker_impl.cc:119
#32 0x000055ff2cf397e3 in operator() (this=) at /opt/llvm/bin/../include/c++/v1/functional:1860
#33 operator() (this=) at /opt/llvm/bin/../include/c++/v1/functional:2419
#34 operator() (this=, arg=) at external/envoy/source/common/common/posix/thread_impl.cc:33
#35 Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>)::$_0::__invoke(void*) (arg=) at external/envoy/source/common/common/posix/thread_impl.cc:32
#36 0x00007f353380a184 in start_thread (arg=0x7f3532c37700) at pthread_create.c:312
#37 0x00007f35335370ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Signed-off-by: Henry Yang <hyang@lyft.com>
@HenryYYang HenryYYang requested a review from mattklein123 as a code owner June 5, 2020 10:21
@mattklein123 mattklein123 self-assigned this Jun 5, 2020
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this makes sense but needs more tests including the failure to lock cases, thank you!

/wait

Signed-off-by: Henry Yang <hyang@lyft.com>
@yuval-k
Copy link
Copy Markdown
Contributor

yuval-k commented Jun 8, 2020

for future reference, this is related to #9940

Signed-off-by: Henry Yang <hyang@lyft.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks looks like this is on the right track. I don't think you need all the locks() that you have, so would prefer to simplify if possible. Thank you!

/wait

Comment thread source/extensions/filters/network/redis_proxy/conn_pool_impl.cc Outdated
});
}

// this method is always called from a InstanceSharedPtr we don't have to worry about tls_->getTyped
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/this/This, same below

Comment thread source/extensions/clusters/redis/redis_cluster.cc
if (cluster.info()->name() != cluster_name_) {
return;
}
// ensure the filter is still exists
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: start with capital, end with period, etc. Same elsewhere.

return nullptr;
}

// ensure filter is not removed
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced you need all of these locks everywhere. How can we be making a request with a removed filter? I'm pretty sure you only need it in the construction path for the original set call?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that when the connection pool instance is destroyed the TLS slot is not immediately destroyed. This leads to the code that cancel the pending requests and closing connections to be delayed as well. During this delay, a redirection from a pending request would cause a new request to be sent to the new host.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Please add more comments around this and also fix format.

request_handler_ = nullptr;
pool_callbacks_.onFailure();
parent_.parent_.onFailure();
if (auto shared_conn_pool = parent_.parent_.lock()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here, if the filter goes away we should be cancelling things. I don't think these are all needed.

Co-authored-by: Matt Klein <mattklein123@gmail.com>
Signed-off-by: Henry Yang <hyang@lyft.com>

Signed-off-by: Henry Yang <4411287+HenryYYang@users.noreply.github.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable with one nit, but per offline convo did you plan on reworking this?

/wait

Comment thread source/extensions/clusters/redis/redis_cluster.cc
Signed-off-by: Henry Yang <hyang@lyft.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
# Conflicts:
#	source/extensions/filters/network/redis_proxy/conn_pool_impl.cc

Signed-off-by: Henry Yang <hyang@lyft.com>
@HenryYYang
Copy link
Copy Markdown
Contributor Author

This is already the reworked code. We're no longer doing a lock on each request, the lock is checked prior to construction of ThreadLocalPool and during the cluster update callback.

Signed-off-by: Henry Yang <hyang@lyft.com>
Signed-off-by: Henry Yang <hyang@lyft.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

Comment on lines +235 to +237
if (resolve_timer_) {
resolve_timer_->disableTimer();
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? If so comment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment. Sorry I missed this one.

Signed-off-by: Henry Yang <hyang@lyft.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit 2705235 into envoyproxy:master Jun 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants