
Bug 4156: comm.cc "!commHasHalfClosedMonitor(fd)" assertion #1443

Closed

rousskov wants to merge 3 commits into squid-cache:master from
measurement-factory:SQUID-111-bug4156-do-no-read-from-half-closed

Conversation

@rousskov
Contributor

@rousskov rousskov commented Aug 4, 2023

This bug is specific to "half_closed_clients on" configurations.
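
For reference, the relevant squid.conf fragment (every other directive omitted):

    # squid.conf -- the setting that exposes this bug
    half_closed_clients on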

    assertion failed: ... "isOpen(fd) && !commHasHalfClosedMonitor(fd)"
    location: comm.cc:1583 in commStartHalfClosedMonitor()

Squid asserts because Server schedules comm_read() after receiving EOF:
That extra read results in another EOF notification, and an attempt to
start monitoring an already monitored half-closed connection.

Upon detecting a potentially half-closed connection,
Server::doClientRead() should clear flags.readMore to prevent Server
from scheduling another comm_read(), but it does not and cannot do that
(without significant refactoring) because

* Server does not have access to flags.readMore
* flags.readMore hack is used for more than just "read more"

We work around the above limitation by re-detecting half-closed
conditions and clearing flags.readMore after clientParseRequests().

We also remove another old, related XXX: the stale "where to 'finish'
the parsing pass?" comment. It was
added in 2006 commit a5baffb, apparently to indicate the lack of parser
cleanup code corresponding to the then-added HttpParserInit() call.
Since our request parsing code is (or at least pretends to be)
incremental, the lack of that cleanup is not really an XXX anymore, and
the corresponding parser reset logic is documented in
Http::One::Server::parseOneRequest().

I renamed clientParseRequests() to reduce the chances that old callers
will mishandle closed connections when porting this branch's changes.
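
To make the shape of the fix concrete, here is the condensed core of the new post-parsing logic, quoted from the ConnStateData::parseRequests() hunk reviewed below (the parsing loop above it is omitted):

    // at the end of ConnStateData::parseRequests(), after the parsing loop
    if (isOpen() && commIsHalfClosed(clientConnection->fd)) {
        if (pipeline.empty()) {
            // we processed what we could parse, and no more data is coming
            debugs(33, 5, "closing half-closed without parsed requests: " << clientConnection);
            clientConnection->close();
        } else {
            // we parsed what we could, and no more data is coming
            debugs(33, 5, "monitoring half-closed while processing parsed requests: " << clientConnection);
            flags.readMore = false; // may already be false
        }
    }

The flags.readMore = false line is the new assignment that prevents Server from scheduling another comm_read() on an already monitored half-closed connection.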
@rousskov
Contributor Author

rousskov commented Aug 4, 2023

@uhliarik, if you can test this fix, please do. If you have trouble applying the changes to your Squid code, try applying just the first branch commit changes.

    } else {
        // we parsed what we could, and no more data is coming
        debugs(33, 5, "monitoring half-closed while processing parsed requests: " << clientConnection);
        flags.readMore = false; // may already be false
    }
}

Contributor Author

This new assignment is the core of this bug fix. If you want to see that change in isolation from the refactoring "noise", see the first branch commit: c6b2ae6.

/* XXX where to 'finish' the parsing pass? */
Contributor Author

I removed this stale XXX. It was added in 2006 commit a5baffb, apparently to indicate the lack of parser cleanup code corresponding to the then-added HttpParserInit() call. Since our request parsing code is (or at least pretends to be) incremental, the lack of that cleanup is not really an XXX anymore, and the corresponding parser reset logic is documented in Http::One::Server::parseOneRequest().

@uhliarik
Contributor

uhliarik commented Aug 4, 2023

Thanks Alex. I applied the first commit to my Squid 6.1 code and I'm now waiting for the build. I will let you know ASAP whether squid is still crashing after the fix.

@uhliarik
Contributor

uhliarik commented Aug 4, 2023

Alex, I can confirm that squid with the first commit applied (the one without all the refactoring) is not crashing anymore. Thanks for looking into it! :)

@rousskov
Contributor Author

rousskov commented Aug 4, 2023

Lubos, I appreciate the update! Just FYI: The first branch commit should be safe to use in isolation, but it does not cover one of the code paths that might lead to double-monitoring assertions. The second commit does. Thus, if this fix is merged, I would recommend backporting the whole thing rather than just the first commit, but that decision would be up to maintainers, of course.

/// Attempt to parse one or more requests from the input buffer.
/// May close the connection.
void
ConnStateData::parseRequests()
Contributor

Can we please take advantage of this re-naming to replace the word "Requests" with "Messages". That will better fit the modern protocols which can send non-request messages (eg PROXYP header).

Contributor Author

Can we please take advantage of this re-naming to replace the word "Requests" with "Messages". That will better fit the modern protocols which can send non-request messages (eg PROXYP header).

We should not replace Requests with Messages here: This class deals with both requests and responses, each message kind going in both directions. The word "request" orients developers much better than the word "message" would.

Some developers may even object to calling PROXY protocol header bytes a "message", but I bet that most developers would have no trouble with applying the word "request" to bytes like PROXY protocol header. "Request" is a very useful high-level abstraction here. "Message" is less useful.


/* XXX where to 'finish' the parsing pass? */
return parsed_req;
debugs(33, 7, "parsed request(s): " << parsed_req << "; buffered leftovers: " << inBuf.length());
Contributor

IMO the parser output at higher levels already makes it clear whether requests were parsed. Please drop the parsed_req variable entirely.

Contributor Author

IMO the parser output at higher levels already makes it clear whether requests were parsed. Please drop the parsed_req variable entirely.

When navigating large/complex logs, this variable simplifies detecting cases where some requests were parsed and cases when no requests were parsed. If you insist, I will remove it, but it is useful. Perhaps you would be more comfortable with a conversion of that boolean variable into a counter? That would be even more useful IMO. I can volunteer to do that conversion in a followup PR.

Contributor

I do really think it is just useless info. The earlier debugs(33, 3, ...) log entries inside the while-loop are sufficient to show whether a request was parsed, or why not.
Making this line report just whether there are buffered leftovers should be enough and simplifies the code.

Contributor Author

I do really think it is just useless info. The earlier debugs(33, 3, ...) log entries inside the while-loop are sufficient to show whether a request was parsed, or why not. Making this line report just whether there are buffered leftovers should be enough and simplifies the code.

Those earlier debugs() do not cover all cases. Moreover, that "earlier code" may produce thousands of log lines that make summarizing what happened difficult for a human. However, I removed that variable in the hope of unblocking this PR: commit 4bccb77.
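
As an aside, a minimal sketch of that counter idea (hypothetical follow-up code, not part of this PR; parseOneRequest() is used here only as a stand-in for the real per-request parsing step):

    // hypothetical: count parsed requests instead of tracking a boolean
    size_t parsedCount = 0;
    while (!inBuf.isEmpty() && parseOneRequest())
        ++parsedCount;
    debugs(33, 7, "parsed requests: " << parsedCount << "; buffered leftovers: " << inBuf.length());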

if (clientParseRequests()) {
debugs(33, 3, clientConnection << ": parsed next request from buffer");
}
parseRequests();
Contributor

IMO this parser call was/is wrong. Instead of parsing here, the call to readNextRequest() below should be a call to afterClientRead().

GH does not let me mark that if-statement for change. It should be this:

    if (Http::StreamPointer deferredRequest = pipeline.front()) {
        debugs(33, 3, clientConnection << ": calling PushDeferredIfNeeded");
        ClientSocketContextPushDeferredIfNeeded(deferredRequest, this);
    } else {
        afterClientRead();
    }

Contributor Author

I am against calling afterClientRead(), documented as "processing to be done after a Comm::Read()", after we did not read from the client.

I am sure this code needs to be refactored further, but such refactoring can and should be done in dedicated PR(s).

Contributor

I was expecting you to understand that calling afterClientRead() is the minimal change appropriate to include here. Just switching the kick() logic to remove the bad logic duplication of the parse+HalfClosed logic you are altering in this PR.

You can add a TODO about refactoring afterClientRead() and getting a better name for it if you like.

Contributor Author
@rousskov rousskov Aug 6, 2023

I was expecting you to understand that ...

I hope you will find a better way to phrase that kind of thought next time. FWIW, to reduce insults, I strive to focus on facts and my understanding of them while avoiding talking about (my expectations of) others' understanding.

calling afterClientRead() is the minimal change appropriate to include here.

IMO, that change is neither minimal nor appropriate here. It is fairly complex, unnecessary in this PR, and probably wrong (for the reasons stated earlier and for the reasons discussed below, at least).

    if (Http::StreamPointer deferredRequest = pipeline.front()) {
        debugs(33, 3, clientConnection << ": calling PushDeferredIfNeeded");
        ClientSocketContextPushDeferredIfNeeded(deferredRequest, this);
    } else {
        afterClientRead();
    }

Just switching the kick() logic to remove the bad logic duplication of the parse+HalfClosed logic you are altering in this PR.

Making request parsing in kick() conditional on pipeline being empty is outside this PR scope and is probably wrong: In some cases, official kick() code parses requests when pipeline has at least one request queued already. That approach makes sense to me -- when certain conditions are met, a new request may be successfully parsed while the previous request is still in the queue. We have concurrentRequestQueueFilled() checks that are responsible for making those complex decisions. Poorly duplicating those checks in kick() is not an improvement. This PR leaves that controversial/complex/messy/buggy code intact, making a small step forward.

Comment on lines +1936 to +1946
if (isOpen() && commIsHalfClosed(clientConnection->fd)) {
    if (pipeline.empty()) {
        // we processed what we could parse, and no more data is coming
        debugs(33, 5, "closing half-closed without parsed requests: " << clientConnection);
        clientConnection->close();
    } else {
        // we parsed what we could, and no more data is coming
        debugs(33, 5, "monitoring half-closed while processing parsed requests: " << clientConnection);
        flags.readMore = false; // may already be false
    }
}
Contributor

This new/moved code is not related to parsing. Please move this block into a separate method and into the caller. After the change to kick() that will be the only caller.

Contributor Author

This new/moved code is not related to parsing. Please move this block into a separate method

I disagree. This code is closely related to parsing because it would not work correctly in any context except after the parsing loop above it -- it relies heavily on certain parsing state. Moreover, this code must be executed after that parsing loop -- forgetting to do that would (re)introduce bugs. This code essentially finishes parsing or establishes correct job state after low-level parsing is done. I do not recommend moving it out, but I will encapsulate if you insist. Do you insist on that encapsulation?

Contributor
@yadij yadij Aug 6, 2023

This is relevant given the change to kick(). These checks should be done both after parse of new I/O AND before resuming a deferred/pipeline request. Otherwise we may end up doing a full server transaction and trying to write it to a closed client, just because the half-closed monitor was slow.

Contributor

There may be other situations like 1xx handling, ssl-bump or tunnels which would benefit from these checks - I have not looked into those.

Contributor Author

These checks should be done both after parse of new I/O AND before resuming a deferred/pipeline request

This PR fixes the former. If somebody investigates, adjusts, and improves the complex latter cases, great! This PR does not need to be responsible for encapsulating context-sensitive code when there is no consensus regarding those future improvements or even about the benefit of such encapsulation itself. We can move on without it.

These checks should be done ... before resuming a deferred/pipeline request. Otherwise we may end up doing a full server transaction and trying to write it to a closed client, just because the half-closed monitor was slow.

I disagree on two levels:

  • These checks do nothing useful when "the half-closed monitor is slow": Adding them before resuming a deferred request will not terminate the job when the half-closed monitor has not started closing the connection yet.
  • We can check for client closure in dozens of places "just in case" the monitor has started closing the connection, but I am strongly against that strategy because it complicates code and leads to non-technical "is this the right spot for a yet another just-in-case check?" conflicts. When the closure signal reaches the job, the job should cleanup. If we decide to prioritize closure signal delivery or discovery, we can do that, but that work should still aim at centralized detection and cleanup rather than sprinkling code with "just in case" checks.

There may be other situations like 1xx handling, ssl-bump or tunnels which would benefit from these checks - I have not looked into those

I have not looked either, but have doubts about the whole approach as discussed above. When somebody does look and discovers the genuine need for this context-sensitive code to be reused, they will be able to encapsulate/move it according to their needs.

Contributor Author
@rousskov rousskov left a comment

@yadij, I hope I have addressed all your change requests. Please re-review.

@rousskov rousskov requested a review from yadij August 6, 2023 14:22
@rousskov rousskov added the S-waiting-for-reviewer label Aug 6, 2023
@rousskov rousskov requested review from yadij and removed request for yadij August 6, 2023 21:58
@rousskov
Contributor Author

@yadij, please note that this PR has been waiting for your review for more than 20 days. Under the "Timeouts for stalled PRs" experiment, it may be force-merged in 10 days.

@yadij yadij added the M-cleared-for-merge label and removed the S-waiting-for-reviewer label Sep 12, 2023
squid-anubis pushed a commit that referenced this pull request Sep 12, 2023
This bug is specific to "half_closed_clients on" configurations.

    assertion failed: ... "isOpen(fd) && !commHasHalfClosedMonitor(fd)"
    location: comm.cc:1583 in commStartHalfClosedMonitor()

Squid asserts because Server schedules comm_read() after receiving EOF:
That extra read results in another EOF notification, and an attempt to
start monitoring an already monitored half-closed connection.

Upon detecting a potentially half-closed connection,
Server::doClientRead() should clear flags.readMore to prevent Server
from scheduling another comm_read(), but it does not and cannot do that
(without significant refactoring) because

* Server does not have access to flags.readMore
* flags.readMore hack is used for more than just "read more"

We worked around the above limitation by re-detecting half-closed
conditions and clearing flags.readMore after clientParseRequests(). That
fixed the bug but further increased poor code duplication across
ConnStateData::afterClientRead() and ConnStateData::kick() methods. We
then refactored by merging and moving that duplicated code into
clientParseRequests() and renamed that method to make backports safer.
@squid-anubis squid-anubis added the M-waiting-staging-checks label Sep 12, 2023
@squid-anubis squid-anubis added the M-merged label and removed the M-waiting-staging-checks and M-cleared-for-merge labels Sep 12, 2023