Fixes crashes around OCSP with FetchSM#9672
Conversation
| fetch() | ||
| { | ||
| SCOPED_MUTEX_LOCK(lock, mutex, this_ethread()); | ||
| this->_result = 0; |
There was a problem hiding this comment.
Looking at this:
wkaras ~/REPOS/TS
O$ grep -e '\Wfetch(' -e '\Wset_done(' -e '\Wset_error(' $(findsrc)
./include/tscore/ArgParser.h: void set_error(std::string e);
./iocore/net/OCSPStapling.cc: this->fetch();
./iocore/net/OCSPStapling.cc: req->set_done();
./iocore/net/OCSPStapling.cc: req->set_error();
./iocore/net/OCSPStapling.cc: fetch()
./iocore/net/OCSPStapling.cc: set_done()
./iocore/net/OCSPStapling.cc: set_error()
./plugins/experimental/inliner/cache.h: fetch(const std::string &k, A &&...a)
./plugins/experimental/slice/prefetch.cc: if (bg->fetch(data)) {
./plugins/experimental/slice/prefetch.cc:BgBlockFetch::fetch(Data *const data)
./plugins/experimental/slice/prefetch.h: bool fetch(Data *const data);
./plugins/lua/ts_lua_fetch.c: /* ts.fetch() */
./src/tscore/ArgParser.cc:ArgParser::set_error(std::string e)
./src/tscore/unit_tests/test_ArgParser.cc: parser2.set_error("error");
wkaras ~/REPOS/TS
O$
it's clear that fetch(), set_done() and set_error() are only called by event_handler(). So, the continuation mutex will already be locked when they are called. It also means these member functions should be private. I suggest you make them private as a part of this PR.
There was a problem hiding this comment.
In event_handler(), won't req always be the same as this? What's the point of it then?
There was a problem hiding this comment.
I made the functions private and removed unnecessary calls for ext_set/get_user_data.
So, the continuation mutex will already be locked when they are called.
This may be true, but I kept SCOPED_MUTEX_LOCK, which should be almost no-op if the lock is acquired. Because e->mutex and e->continuation->mutex can be different under some conditions and e->continuation->mutex is not locked in that case. It may not be the case for this, but I'd rather explicitly acquire the lock.
* asf/master: Fix ttmsh log field (apache#9722) Add CentOS to the required builds (apache#9721) Fix H3 transaction leak (apache#9714) Only need to include eventfd for native mode. This was using the wrong define anyway (apache#9711) Cleanup: remove ts::Buffer from LogField. (apache#9665) Cleanup: remove ts::Buffer from URL.cc (apache#9663) Check the calling thread of Ethread::schedule_local (apache#9691) build_h3_tools.sh: Remove an unneeded dir check (apache#9710) autoconf: Add lib flags for the quiche build test. (apache#9679) Remove deprecated debug output functions from 13 source files. (apache#9676) Changes for C++23 (apache#9703) QUIC: Add a unit tests to validate that the qlog file is generated (and no crashes) (apache#9668) libswoc: Update to 1.4.10 (apache#9700) Reload hosting.config on TASK thread (apache#9699) Changes for C++20 (apache#9701) Make io_uring or thread AIO modes a startup time decision (vs compile time) (apache#9630) Replace curl with proxy verifier in proxy protocol tests (apache#9684) Fix event queue corruption on PreWarmManager::reconfigure (apache#9692) Fixes crashes around OCSP with FetchSM (apache#9672)
We saw two kind of crashes while testing #9591 and #9624, and the both were triggered by OCSP request timeout. In a nutshell, the crashes are due to accessing unavailable continuations. Continuations become unavailable when OCSP requests timeout, but the status were not properly updated/checked.