
Bug 4156: comm.cc "!commHasHalfClosedMonitor(fd)" assertion #1443

Closed

rousskov wants to merge 3 commits into squid-cache:master from
measurement-factory:SQUID-111-bug4156-do-no-read-from-half-closed

Conversation

@rousskov
Contributor

@rousskov rousskov commented Aug 4, 2023

This bug is specific to "half_closed_clients on" configurations.
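
For reference, the relevant squid.conf fragment (every other directive omitted):

    # squid.conf -- the setting that exposes this bug
    half_closed_clients on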

    assertion failed: ... "isOpen(fd) && !commHasHalfClosedMonitor(fd)"
    location: comm.cc:1583 in commStartHalfClosedMonitor()

Squid asserts because Server schedules comm_read() after receiving EOF:
That extra read results in another EOF notification, and an attempt to
start monitoring an already monitored half-closed connection.

Upon detecting a potentially half-closed connection,
Server::doClientRead() should clear flags.readMore to prevent Server
from scheduling another comm_read(), but it does not and cannot do that
(without significant refactoring) because

* Server does not have access to flags.readMore
* flags.readMore hack is used for more than just "read more"

We work around the above limitation by re-detecting half-closed
conditions and clearing flags.readMore after clientParseRequests().

We also remove another old, related XXX: the stale "where to 'finish'
the parsing pass?" comment. It was
added in 2006 commit a5baffb, apparently to indicate the lack of parser
cleanup code corresponding to the then-added HttpParserInit() call.
Since our request parsing code is (or at least pretends to be)
incremental, the lack of that cleanup is not really an XXX anymore, and
the corresponding parser reset logic is documented in
Http::One::Server::parseOneRequest().

I renamed clientParseRequests() to reduce the chances that old callers
will mishandle closed connections when porting this branch's changes.
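
To make the shape of the fix concrete, here is the condensed core of the new post-parsing logic, quoted from the ConnStateData::parseRequests() hunk reviewed below (the parsing loop above it is omitted):

    // at the end of ConnStateData::parseRequests(), after the parsing loop
    if (isOpen() && commIsHalfClosed(clientConnection->fd)) {
        if (pipeline.empty()) {
            // we processed what we could parse, and no more data is coming
            debugs(33, 5, "closing half-closed without parsed requests: " << clientConnection);
            clientConnection->close();
        } else {
            // we parsed what we could, and no more data is coming
            debugs(33, 5, "monitoring half-closed while processing parsed requests: " << clientConnection);
            flags.readMore = false; // may already be false
        }
    }

The flags.readMore = false line is the new assignment that prevents Server from scheduling another comm_read() on an already monitored half-closed connection.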
@rousskov
Contributor Author

rousskov commented Aug 4, 2023

@uhliarik, if you can test this fix, please do. If you have trouble applying the changes to your Squid code, try applying just the first branch commit changes.

    } else {
        // we parsed what we could, and no more data is coming
        debugs(33, 5, "monitoring half-closed while processing parsed requests: " << clientConnection);
        flags.readMore = false; // may already be false
    }
}

Contributor Author

This new assignment is the core of this bug fix. If you want to see that change in isolation from the refactoring "noise", see the first branch commit: c6b2ae6.

/* XXX where to 'finish' the parsing pass? */
Contributor Author

I removed this stale XXX. It was added in 2006 commit a5baffb, apparently to indicate the lack of parser cleanup code corresponding to the then-added HttpParserInit() call. Since our request parsing code is (or at least pretends to be) incremental, the lack of that cleanup is not really an XXX anymore, and the corresponding parser reset logic is documented in Http::One::Server::parseOneRequest().

@uhliarik
Contributor

uhliarik commented Aug 4, 2023

Thanks Alex. I applied the first commit to my Squid 6.1 code and I'm now waiting for the build. I will let you know ASAP whether squid is still crashing after the fix.

@uhliarik
Contributor

uhliarik commented Aug 4, 2023

Alex, I can confirm that squid with the first commit applied (the one without all the refactoring) is not crashing anymore. Thanks for looking into it! :)

@rousskov
Contributor Author

rousskov commented Aug 4, 2023

Lubos, I appreciate the update! Just FYI: The first branch commit should be safe to use in isolation, but it does not cover one of the code paths that might lead to double-monitoring assertions. The second commit does. Thus, if this fix is merged, I would recommend backporting the whole thing rather than just the first commit, but that decision would be up to maintainers, of course.

/// Attempt to parse one or more requests from the input buffer.
/// May close the connection.
void
ConnStateData::parseRequests()
Contributor

Can we please take advantage of this re-naming to replace the word "Requests" with "Messages". That will better fit the modern protocols which can send non-request messages (eg PROXYP header).

Contributor Author

Can we please take advantage of this re-naming to replace the word "Requests" with "Messages". That will better fit the modern protocols which can send non-request messages (eg PROXYP header).

We should not replace Requests with Messages here: This class deals with both requests and responses, each message kind going in both directions. The word "request" orients developers much better than the word "message" would.

Some developers may even object to calling PROXY protocol header bytes a "message", but I bet that most developers would have no trouble with applying the word "request" to bytes like PROXY protocol header. "Request" is a very useful high-level abstraction here. "Message" is less useful.


/* XXX where to 'finish' the parsing pass? */
return parsed_req;
debugs(33, 7, "parsed request(s): " << parsed_req << "; buffered leftovers: " << inBuf.length());
Contributor

IMO the parser output at higher levels already makes it clear whether requests were parsed. Please drop the parsed_req variable entirely.

Contributor Author

IMO the parser output at higher levels already makes it clear whether requests were parsed. Please drop the parsed_req variable entirely.

When navigating large/complex logs, this variable simplifies detecting cases where some requests were parsed and cases when no requests were parsed. If you insist, I will remove it, but it is useful. Perhaps you would be more comfortable with a conversion of that boolean variable into a counter? That would be even more useful IMO. I can volunteer to do that conversion in a followup PR.

Contributor

I do really think it is just useless info. The earlier debugs(33, 3, ...) log entries inside the while-loop are sufficient to show whether a request was parsed, or why not.
Making this line report just whether there are buffered leftovers should be enough and simplifies the code.

Contributor Author

I do really think it is just useless info. The earlier debugs(33, 3, ...) log entries inside the while-loop are sufficient to show whether a request was parsed, or why not. Making this line report just whether there are buffered leftovers should be enough and simplifies the code.

Those earlier debugs() do not cover all cases. Moreover, that "earlier code" may produce thousands of log lines that make summarizing what happened difficult for a human. However, I removed that variable in the hope of unblocking this PR: commit 4bccb77.
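
As an aside, a minimal sketch of that counter idea (hypothetical follow-up code, not part of this PR; parseOneRequest() is used here only as a stand-in for the real per-request parsing step):

    // hypothetical: count parsed requests instead of tracking a boolean
    size_t parsedCount = 0;
    while (!inBuf.isEmpty() && parseOneRequest())
        ++parsedCount;
    debugs(33, 7, "parsed requests: " << parsedCount << "; buffered leftovers: " << inBuf.length());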

if (clientParseRequests()) {
debugs(33, 3, clientConnection << ": parsed next request from buffer");
}
parseRequests();
Contributor

IMO this parser call was/is wrong. Instead of parsing here, the call to readNextRequest() below should be a call to afterClientRead().

GH does not let me mark that if-statement for change. It should be this:

    if (Http::StreamPointer deferredRequest = pipeline.front()) {
        debugs(33, 3, clientConnection << ": calling PushDeferredIfNeeded");
        ClientSocketContextPushDeferredIfNeeded(deferredRequest, this);
    } else {
        afterClientRead();
    }

Contributor Author

I am against calling afterClientRead(), documented as "processing to be done after a Comm::Read()", after we did not read from the client.

I am sure this code needs to be refactored further, but such refactoring can and should be done in dedicated PR(s).

Contributor

I was expecting you to understand that calling afterClientRead() is the minimal change appropriate to include here. Just switching the kick() logic to remove the bad logic duplication of the parse+HalfClosed logic you are altering in this PR.

You can add a TODO about refactoring afterClientRead() and getting a better name for it if you like.

Contributor Author
@rousskov rousskov Aug 6, 2023

I was expecting you to understand that ...

I hope you will find a better way to phrase that kind of thought next time. FWIW, to reduce insults, I strive to focus on facts and my understanding of them while avoiding talking about (my expectations of) others' understanding.

calling afterClientRead() is the minimal change appropriate to include here.

IMO, that change is neither minimal nor appropriate here. It is fairly complex, unnecessary in this PR, and probably wrong (for the reasons stated earlier and for the reasons discussed below, at least).

    if (Http::StreamPointer deferredRequest = pipeline.front()) {
        debugs(33, 3, clientConnection << ": calling PushDeferredIfNeeded");
        ClientSocketContextPushDeferredIfNeeded(deferredRequest, this);
    } else {
        afterClientRead();
    }

Just switching the kick() logic to remove the bad logic duplication of the parse+HalfClosed logic you are altering in this PR.

Making request parsing in kick() conditional on pipeline being empty is outside this PR scope and is probably wrong: In some cases, official kick() code parses requests when pipeline has at least one request queued already. That approach makes sense to me -- when certain conditions are met, a new request may be successfully parsed while the previous request is still in the queue. We have concurrentRequestQueueFilled() checks that are responsible for making those complex decisions. Poorly duplicating those checks in kick() is not an improvement. This PR leaves that controversial/complex/messy/buggy code intact, making a small step forward.

Comment on lines +1936 to +1946
if (isOpen() && commIsHalfClosed(clientConnection->fd)) {
    if (pipeline.empty()) {
        // we processed what we could parse, and no more data is coming
        debugs(33, 5, "closing half-closed without parsed requests: " << clientConnection);
        clientConnection->close();
    } else {
        // we parsed what we could, and no more data is coming
        debugs(33, 5, "monitoring half-closed while processing parsed requests: " << clientConnection);
        flags.readMore = false; // may already be false
    }
}
Contributor

This new/moved code is not related to parsing. Please move this block into a separate method and into the caller. After the change to kick() that will be the only caller.

Contributor Author

This new/moved code is not related to parsing. Please move this block into a separate method

I disagree. This code is closely related to parsing because it would not work correctly in any context except after the parsing loop above it -- it relies heavily on certain parsing state. Moreover, this code must be executed after that parsing loop -- forgetting to do that would (re)introduce bugs. This code essentially finishes parsing or establishes correct job state after low-level parsing is done. I do not recommend moving it out, but I will encapsulate if you insist. Do you insist on that encapsulation?

Contributor
@yadij yadij Aug 6, 2023

This is relevant given the change to kick(). These checks should be done both after parse of new I/O AND before resuming a deferred/pipeline request. Otherwise we may end up doing a full server transaction and trying to write it to a closed client, just because the half-closed monitor was slow.

Contributor

There may be other situations like 1xx handling, ssl-bump or tunnels which would benefit from these checks - I have not looked into those.

Contributor Author

These checks should be done both after parse of new I/O AND before resuming a deferred/pipeline request

This PR fixes the former. If somebody investigates, adjusts, and improves the complex latter cases, great! This PR does not need to be responsible for encapsulating context-sensitive code when there is no consensus regarding those future improvements or even about the benefit of such encapsulation itself. We can move on without it.

These checks should be done ... before resuming a deferred/pipeline request. Otherwise we may end up doing a full server transaction and trying to write it to a closed client, just because the half-closed monitor was slow.

I disagree on two levels:

  • These checks do nothing useful when "the half-closed monitor is slow": Adding them before resuming a deferred request will not terminate the job when the half-closed monitor has not started closing the connection yet.
  • We can check for client closure in dozens of places "just in case" the monitor has started closing the connection, but I am strongly against that strategy because it complicates code and leads to non-technical "is this the right spot for a yet another just-in-case check?" conflicts. When the closure signal reaches the job, the job should cleanup. If we decide to prioritize closure signal delivery or discovery, we can do that, but that work should still aim at centralized detection and cleanup rather than sprinkling code with "just in case" checks.

There may be other situations like 1xx handling, ssl-bump or tunnels which would benefit from these checks - I have not looked into those

I have not looked either, but have doubts about the whole approach as discussed above. When somebody does look and discovers the genuine need for this context-sensitive code to be reused, they will be able to encapsulate/move it according to their needs.

Contributor Author
@rousskov rousskov left a comment

@yadij, I hope I have addressed all your change requests. Please re-review.

@rousskov rousskov requested a review from yadij August 6, 2023 14:22
@rousskov rousskov added the S-waiting-for-reviewer label Aug 6, 2023
@rousskov rousskov requested review from yadij and removed request for yadij August 6, 2023 21:58
@rousskov
Contributor Author

@yadij, please note that this PR has been waiting for your review for more than 20 days. Under the "Timeouts for stalled PRs" experiment, it may be force-merged in 10 days.

@yadij yadij added the M-cleared-for-merge label and removed the S-waiting-for-reviewer label Sep 12, 2023
squid-anubis pushed a commit that referenced this pull request Sep 12, 2023
This bug is specific to "half_closed_clients on" configurations.

    assertion failed: ... "isOpen(fd) && !commHasHalfClosedMonitor(fd)"
    location: comm.cc:1583 in commStartHalfClosedMonitor()

Squid asserts because Server schedules comm_read() after receiving EOF:
That extra read results in another EOF notification, and an attempt to
start monitoring an already monitored half-closed connection.

Upon detecting a potentially half-closed connection,
Server::doClientRead() should clear flags.readMore to prevent Server
from scheduling another comm_read(), but it does not and cannot do that
(without significant refactoring) because

* Server does not have access to flags.readMore
* flags.readMore hack is used for more than just "read more"

We worked around the above limitation by re-detecting half-closed
conditions and clearing flags.readMore after clientParseRequests(). That
fixed the bug but further increased poor code duplication across
ConnStateData::afterClientRead() and ConnStateData::kick() methods. We
then refactored by merging and moving that duplicated code into
clientParseRequests() and renamed that method to make backports safer.
@squid-anubis squid-anubis added the M-waiting-staging-checks label Sep 12, 2023
@squid-anubis squid-anubis added the M-merged label and removed the M-waiting-staging-checks and M-cleared-for-merge labels Sep 12, 2023