Bug 4156: comm.cc "!commHasHalfClosedMonitor(fd)" assertion#1443
Bug 4156: comm.cc "!commHasHalfClosedMonitor(fd)" assertion#1443rousskov wants to merge 3 commits intosquid-cache:masterfrom
Conversation
This bug is specific to "half_closed_clients on" configurations.
assertion failed: ... "isOpen(fd) && !commHasHalfClosedMonitor(fd)"
location: comm.cc:1583 in commStartHalfClosedMonitor()
Squid asserts because Server schedules comm_read() after receiving EOF:
That extra read results in another EOF notification, and an attempt to
start monitoring an already monitored half-closed connection.
Upon detecting a potentially half-closed connection,
Server::doClientRead() should clear flags.readMore to prevent Server
from scheduling another comm_read(), but it does not and cannot do that
(without significant refactoring) because
* Server does not have access to flags.readMore
* flags.readMore hack is used for more than just "read more"
We work around the above limitation by re-detecting half-closed
conditions and clearing flags.readMore after clientParseRequests().
... and another old related XXX. Also removed a stale "where to 'finish' the parsing pass?" XXX. It was added in 2006 commit a5baffb, apparently to indicate the lack of parser cleanup code corresponding to the then-added HttpParserInit() call. Since our request parsing code is (or at least pretends to be) incremental, the lack of that cleanup is not really an XXX anymore, and the corresponding parser reset logic is documented in Http::One::Server::parseOneRequest(). I renamed clientParseRequests() to reduce chances that some old callers will mishandle closed connections when porting this branch changes.
| } else { | ||
| // we parsed what we could, and no more data is coming | ||
| debugs(33, 5, "monitoring half-closed while processing parsed requests: " << clientConnection); | ||
| flags.readMore = false; // may already be false |
There was a problem hiding this comment.
This new assignment is the core of this bug fix. If you want to see that change in isolation from the refactoring "noise", see the first branch commit: c6b2ae6.
| } | ||
| } | ||
|
|
||
| /* XXX where to 'finish' the parsing pass? */ |
There was a problem hiding this comment.
I removed this stale XXX. It was added in 2006 commit a5baffb, apparently to indicate the lack of parser cleanup code corresponding to the then-added HttpParserInit() call. Since our request parsing code is (or at least pretends to be) incremental, the lack of that cleanup is not really an XXX anymore, and the corresponding parser reset logic is documented in Http::One::Server::parseOneRequest().
|
Thanks Alex. I applied the first commit to my Squid 6.1 code and I'm waiting now for the build. I will let you know ASAP if squid is still crashing after the fix or not. |
|
Alex, I can confirm that squid with the first commit applied (the one without all the refactorization) is not crashing anymore. Thanks that you looked into it! :) |
|
Lubos, I appreciate the update! Just FYI: The first branch commit should be safe to use in isolation, but it does not cover one of the code paths that might lead to double-monitoring assertions. The second commit does. Thus, if this fix is merged, I would recommend backporting the whole thing rather than just the first commit, but that decision would be up to maintainers, of course. |
| /// Attempt to parse one or more requests from the input buffer. | ||
| /// May close the connection. | ||
| void | ||
| ConnStateData::parseRequests() |
There was a problem hiding this comment.
Can we please take advantage of this re-naming to replace the word "Requests" with "Messages". That will better fit the modern protocols which can send non-request messages (eg PROXYP header).
There was a problem hiding this comment.
Can we please take advantage of this re-naming to replace the word "Requests" with "Messages". That will better fit the modern protocols which can send non-request messages (eg PROXYP header).
We should not replace Requests with Messages here: This class deals with both requests and responses, each message kind going in both directions. The word "request" orients developers much better than the word "message" would.
Some developers may even object to calling PROXY protocol header bytes a "message", but I bet that most developers would have no trouble with applying the word "request" to bytes like PROXY protocol header. "Request" is a very useful high-level abstraction here. "Message" is less useful.
src/client_side.cc
Outdated
|
|
||
| /* XXX where to 'finish' the parsing pass? */ | ||
| return parsed_req; | ||
| debugs(33, 7, "parsed request(s): " << parsed_req << "; buffered leftovers: " << inBuf.length()); |
There was a problem hiding this comment.
IMO the parser output at higher levels makes it unnecessary that requests were parsed. Please drop parsed_req variable entirely.
There was a problem hiding this comment.
IMO the parser output at higher levels makes it unnecessary that requests were parsed. Please drop parsed_req variable entirely.
When navigating large/complex logs, this variable simplifies detecting cases where some requests were parsed and cases when no requests were parsed. If you insist, I will remove it, but it is useful. Perhaps you would be more comfortable with a conversion of that boolean variable into a counter? That would be even more useful IMO. I can volunteer to do that conversion in a followup PR.
There was a problem hiding this comment.
I do really think it is just useless info. The earlier debugs(33, 3, ...) log entries inside the while-loop are sufficient to show whether a request was parsed, or why not.
Making this line just report that there is buffer leftovers (or not) should be enough and simplifies the code.
There was a problem hiding this comment.
I do really think it is just useless info. The earlier
debugs(33, 3, ...)log entries inside the while-loop are sufficient to show whether a request was parsed, or why not. Making this line just report that there is buffer leftovers (or not) should be enough and simplifies the code.
Those earlier debugs() do not cover all cases. Moreover, that "earlier code" may produce thousands of log lines that make summarizing what happened difficult for a human. However, I removed that field in hope to unblock this PR: commit 4bccb77.
| if (clientParseRequests()) { | ||
| debugs(33, 3, clientConnection << ": parsed next request from buffer"); | ||
| } | ||
| parseRequests(); |
There was a problem hiding this comment.
IMO this parser call was/is wrong. Instead of parsing here the below call to readNextRequest() should be a call to afterClientRead().
GH does not let me mark that if-statement for change. It should be this:
if (Http::StreamPointer deferredRequest = pipeline.front()) {
debugs(33, 3, clientConnection << ": calling PushDeferredIfNeeded");
ClientSocketContextPushDeferredIfNeeded(deferredRequest, this);
} else {
afterClientRead();
}
There was a problem hiding this comment.
I am against calling afterClientRead(), documented as "processing to be done after a Comm::Read()", after we did not read from the client.
I am sure this code needs to be refactored further, but such refactoring can and should be done in dedicated PR(s).
There was a problem hiding this comment.
I was expecting you to understand that calling afterClientRead() is the minimal change appropriate to include here. Just switching the kick() logic to remove the bad logic duplication of the parse+HalfClosed logic you are altering in this PR.
You can add a TODO about refactoring afterClientRead() and getting a better name for it if you like.
There was a problem hiding this comment.
I was expecting you to understand that ...
I hope you will find a better way to phrase that kind of thought next time. FWIW, to reduce insults, I strive to focus on facts and my understanding of them while avoiding talking about (my expectations of) others' understanding.
calling afterClientRead() is the minimal change appropriate to include here.
IMO, that change is neither minimal nor appropriate here. It is fairly complex, unnecessary in this PR, and probably wrong (for the reasons stated earlier and for the reasons discussed below, at least).
if (Http::StreamPointer deferredRequest = pipeline.front()) {
debugs(33, 3, clientConnection << ": calling PushDeferredIfNeeded");
ClientSocketContextPushDeferredIfNeeded(deferredRequest, this);
} else {
afterClientRead();
}Just switching the kick() logic to remove the bad logic duplication of the parse+HalfClosed logic you are altering in this PR.
Making request parsing in kick() conditional on pipeline being empty is outside this PR scope and is probably wrong: In some cases, official kick() code parses requests when pipeline has at least one request queued already. That approach makes sense to me -- when certain conditions are met, a new request may be successfully parsed while the previous request is still in the queue. We have concurrentRequestQueueFilled() checks that are responsible for making those complex decisions. Poorly duplicating those checks in kick() is not an improvement. This PR leaves that controversial/complex/messy/buggy code intact, making a small step forward.
| if (isOpen() && commIsHalfClosed(clientConnection->fd)) { | ||
| if (pipeline.empty()) { | ||
| // we processed what we could parse, and no more data is coming | ||
| debugs(33, 5, "closing half-closed without parsed requests: " << clientConnection); | ||
| clientConnection->close(); | ||
| } else { | ||
| // we parsed what we could, and no more data is coming | ||
| debugs(33, 5, "monitoring half-closed while processing parsed requests: " << clientConnection); | ||
| flags.readMore = false; // may already be false | ||
| } | ||
| } |
There was a problem hiding this comment.
This new/moved code is not related to parsing. Please move this block into a separate method and into the caller. After the change to kick() that will be the only caller.
There was a problem hiding this comment.
This new/moved code is not related to parsing. Please move this block into a separate method
I disagree. This code is closely related to parsing because it would not work correctly in any context except after the parsing loop above it -- it relies heavily on certain parsing state. Moreover, this code must be executed after that parsing loop -- forgetting to do that would (re)introduce bugs. This code essentially finishes parsing or establishes correct job state after low-level parsing is done. I do not recommend moving it out, but I will encapsulate if you insist. Do you insist on that encapsulation?
There was a problem hiding this comment.
This is relevant with the change to kick(). These checks should be done both after parse of new I/O AND before resuming a deferred/pipeline request. Otherwise we may end up doing a full server transaction and trying to write it to a closed client, just because the half-closed monitor was slow.
There was a problem hiding this comment.
There may be other situations like 1xx handling, ssl-bump or tunnels which would benefit from these checks - I have not looked into those.
There was a problem hiding this comment.
These checks should be done both after parse of new I/O AND before resuming a deferred/pipeline request
This PR fixes the former. If somebody investigates, adjusts, and improves the complex latter cases, great! This PR does not need to be responsible for encapsulating context-sensitive code when there is no consensus regarding those future improvements or even about the benefit of such encapsulation itself. We can move on without it.
These checks should be done ... before resuming a deferred/pipeline request. Otherwise we may end up doing a full server transaction and trying to write it to a closed client, just because the half-closed monitor was slow.
I disagree on two levels:
- These checks do nothing useful when "the half-closed monitor is slow": Adding them before resuming a deferred request will not terminate the job when the half-closed monitor has not started closing the connection yet.
- We can check for client closure in dozens of places "just in case" the monitor has started closing the connection, but I am strongly against that strategy because it complicates code and leads to non-technical "is this the right spot for a yet another just-in-case check?" conflicts. When the closure signal reaches the job, the job should cleanup. If we decide to prioritize closure signal delivery or discovery, we can do that, but that work should still aim at centralized detection and cleanup rather than sprinkling code with "just in case" checks.
There may be other situations like 1xx handling, ssl-bump or tunnels which would benefit from these checks - I have not looked into those
I have not looked either, but have doubts about the whole approach as discussed above. When somebody does look and discovers the genuine need for this context-sensitive code to be reused, they will be able to encapsulate/move it according to their needs.
| if (clientParseRequests()) { | ||
| debugs(33, 3, clientConnection << ": parsed next request from buffer"); | ||
| } | ||
| parseRequests(); |
There was a problem hiding this comment.
I am against calling afterClientRead(), documented as "processing to be done after a Comm::Read()", after we did not read from the client.
I am sure this code needs to be refactored further, but such refactoring can and should be done in dedicated PR(s).
| /// Attempt to parse one or more requests from the input buffer. | ||
| /// May close the connection. | ||
| void | ||
| ConnStateData::parseRequests() |
There was a problem hiding this comment.
Can we please take advantage of this re-naming to replace the word "Requests" with "Messages". That will better fit the modern protocols which can send non-request messages (eg PROXYP header).
We should not replace Requests with Messages here: This class deals with both requests and responses, each message kind going in both directions. The word "request" orients developers much better than the word "message" would.
Some developers may even object to calling PROXY protocol header bytes a "message", but I bet that most developers would have no trouble with applying the word "request" to bytes like PROXY protocol header. "Request" is a very useful high-level abstraction here. "Message" is less useful.
src/client_side.cc
Outdated
|
|
||
| /* XXX where to 'finish' the parsing pass? */ | ||
| return parsed_req; | ||
| debugs(33, 7, "parsed request(s): " << parsed_req << "; buffered leftovers: " << inBuf.length()); |
There was a problem hiding this comment.
IMO the parser output at higher levels makes it unnecessary that requests were parsed. Please drop parsed_req variable entirely.
When navigating large/complex logs, this variable simplifies detecting cases where some requests were parsed and cases when no requests were parsed. If you insist, I will remove it, but it is useful. Perhaps you would be more comfortable with a conversion of that boolean variable into a counter? That would be even more useful IMO. I can volunteer to do that conversion in a followup PR.
| if (isOpen() && commIsHalfClosed(clientConnection->fd)) { | ||
| if (pipeline.empty()) { | ||
| // we processed what we could parse, and no more data is coming | ||
| debugs(33, 5, "closing half-closed without parsed requests: " << clientConnection); | ||
| clientConnection->close(); | ||
| } else { | ||
| // we parsed what we could, and no more data is coming | ||
| debugs(33, 5, "monitoring half-closed while processing parsed requests: " << clientConnection); | ||
| flags.readMore = false; // may already be false | ||
| } | ||
| } |
There was a problem hiding this comment.
This new/moved code is not related to parsing. Please move this block into a separate method
I disagree. This code is closely related to parsing because it would not work correctly in any context except after the parsing loop above it -- it relies heavily on certain parsing state. Moreover, this code must be executed after that parsing loop -- forgetting to do that would (re)introduce bugs. This code essentially finishes parsing or establishes correct job state after low-level parsing is done. I do not recommend moving it out, but I will encapsulate if you insist. Do you insist on that encapsulation?
|
@yadij, please note that this PR has been waiting for your review for more than 20 days. Under the "Timeouts for stalled PRs" experiment, it may be force-merged in 10 days. |
This bug is specific to "half_closed_clients on" configurations.
assertion failed: ... "isOpen(fd) && !commHasHalfClosedMonitor(fd)"
location: comm.cc:1583 in commStartHalfClosedMonitor()
Squid asserts because Server schedules comm_read() after receiving EOF:
That extra read results in another EOF notification, and an attempt to
start monitoring an already monitored half-closed connection.
Upon detecting a potentially half-closed connection,
Server::doClientRead() should clear flags.readMore to prevent Server
from scheduling another comm_read(), but it does not and cannot do that
(without significant refactoring) because
* Server does not have access to flags.readMore
* flags.readMore hack is used for more than just "read more"
We worked around the above limitation by re-detecting half-closed
conditions and clearing flags.readMore after clientParseRequests(). That
fixed the bug but further increased poor code duplication across
ConnStateData::afterClientRead() and ConnStateData::kick() methods. We
then refactored by merging and moving that duplicated code into
clientParseRequests() and renamed that method to make backports safer.
This bug is specific to "half_closed_clients on" configurations.
Squid asserts because Server schedules comm_read() after receiving EOF:
That extra read results in another EOF notification, and an attempt to
start monitoring an already monitored half-closed connection.
Upon detecting a potentially half-closed connection,
Server::doClientRead() should clear flags.readMore to prevent Server
from scheduling another comm_read(), but it does not and cannot do that
(without significant refactoring) because
We worked around the above limitation by re-detecting half-closed
conditions and clearing flags.readMore after clientParseRequests(). That
fixed the bug but further increased poor code duplication across
ConnStateData::afterClientRead() and ConnStateData::kick() methods. We
then refactored by merging and moving that duplicated code into
clientParseRequests() and renamed that method to make backports safer.