Skip to content

Bug 4718: Support filling raw buffer space of shared SBufs#64

Merged
rousskov merged 1 commit intosquid-cache:masterfrom
measurement-factory:bug4718
Oct 6, 2017
Merged

Bug 4718: Support filling raw buffer space of shared SBufs#64
rousskov merged 1 commit intosquid-cache:masterfrom
measurement-factory:bug4718

Conversation

@chtsanti
Copy link
Contributor

@chtsanti chtsanti commented Sep 25, 2017

SBuf::forceSize() requires exclusive SBuf ownership but its precursor
SBuf::rawSpace() method does not guarantee exclusivity. The pair of
calls may result in SBuf::forceSize() throwing for no good reason.

New SBuf API provides a new pair of raw buffer appending calls that
reduces the number of false negatives.

This change may alleviate bug 4718 symptoms but does not address its
core problem (which is still unconfirmed).

This is a Measurement Factory project.

Copy link
Contributor

@yadij yadij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just polish, so approving now.

src/sbuf/SBuf.h Outdated
* The returned pointer must not be stored by the caller, as it will
* be invalidated by the first call to a non-const method call
* on the SBuf.
* This call guarantees to never return NULL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make this say "never return a nullptr" instead of NULL

@yadij yadij added the S-waiting-for-author author action is expected (and usually required) label Sep 28, 2017
Squid may abort the connection with a TLS server while read server hello
message due to Must clause failure.

Inside Ssl::ServerBio::readAndBuffer, the SBuf::rawSpace/SBuf::forceSize
is used to read directly into an SBuf buffer network data instead of using
temporary char[] buffer and append it to SBuf.
The problem is that the SBuf::rawSpace/forceSize require only one user
(the Ssl::ServerBio object) of the content of SBuf object.

This patch fixes SBuf API and replaces the SBuf::rawSpace/SBuf::forceSize pair
with a new SBuf::rawAppendStart/SBuf::rawAppendFinish calls.
The SBuf::rawSpace become a private member and the SBuf::forceSize removed.

This is a Measurement Factory project
@rousskov
Copy link
Contributor

rousskov commented Oct 5, 2017

@eduard-bagdasaryan, I know I can probably resolve this problem with a manual "rebuild" comment/command, but do you know why Jenkins got stuck and has no Details link next to its "Build triggered. sha1 is merged" status line?

@eduard-bagdasaryan
Copy link
Contributor

Jenkins got stuck because one of its builds lasted > 2 days. Other build requests were queued, including this PR. I have no good idea why PR 30 got stuck on Jenkins: all nodes except "d-fedora-26" completed with success. To avoid such hanging in future, I would suggest to limit build time, specifying a suitable timeout. This could be done using the corresponding plugin.

@rousskov
Copy link
Contributor

rousskov commented Oct 5, 2017

Jenkins got stuck because one of its builds lasted > 2 days. Other build requests were queued, including this PR.

I understand the second sentence/statement, but the first one can be (mis)interpreted in many ways. I will rephrase the original question given the additional information you have provided: Why was not this previously queued PR tested after PR #30 got manually aborted? Is this PR still in the Jenkins queue, waiting for other PRs to be tested?

If Jenkins auto-kills tasks (or whatever the correct term is) that wait in queue for 2 days, then can Jenkins update their status on GitHub accordingly?

And a separate question that I tried to ask earlier: Why does this PR has no Details link next to its "Build triggered. sha1 is merged" status line? Is it possible for Jenkins to provide a link to queued but not yet tested tasks (or whatever the correct term is)?

@rousskov rousskov changed the title Squid ssl-bump parser crash Support filling raw buffer space of shared SBufs Oct 5, 2017
@eduard-bagdasaryan
Copy link
Contributor

is this PR still in the Jenkins queue, waiting for other PRs to be tested?

yes, you can see the queue at https://build.squid-cache.org/job/5-pr-test/.

If Jenkins auto-kills tasks (or whatever the correct term is) that wait in queue for 2 days, then can Jenkins update their status on GitHub accordingly?

AFAIK, this PR 30 was aborted manually by Amos and Jenkins PR builder was unable to update Github PR status. That is why I suggested to add a plugin to be able to abort stuck jobs within pre-set timeout. I'll try to check whether PR status will be updated in this case.

Why does this PR has no Details link next to its "Build triggered. sha1 is merged" status line?

Because PR builder has not yet started the build, thus updating Github status.

Is it possible for Jenkins to provide a link to queued but not yet tested tasks (or whatever the correct term is)?

Not possible with current configuration. I don't know whether there are other plugins for that.

@rousskov
Copy link
Contributor

rousskov commented Oct 5, 2017

@chtsanti, please review my changes to the PR title/description and adjust as needed.

@rousskov
Copy link
Contributor

rousskov commented Oct 5, 2017

@eduard-bagdasaryan, I understand what is happening now. Thank you.

you can see the queue at https://build.squid-cache.org/job/5-pr-test/.

I can see the queue at the above magical URL, but I cannot (easily) tell whether this PR is in that queue:

#1​02  		(pending—Build #99 is already in progress (ETA:4 min 0 sec))
#1​01  		(pending—Build #99 is already in progress (ETA:4 min 0 sec))
#1​00  		(pending—Build #99 is already in progress (ETA:4 min 0 sec))
#9​9 Oct 5, 2017 11:53 AM PR #51: ​Move ​TLS/SSL ​http_port ​conf... |

Please note that I am not saying you should fix this "users do not know what is going on" problem! I am just documenting the (evidently non-obvious) state of a user mind in case you (or somebody else reading this) happen to adjust Jenkins configuration later (for other reasons) and happen to know how to improve this.

I suggested to add a plugin to be able to abort stuck jobs within pre-set timeout.

I agree that adding a busy test timeout is the right thing to do (but you are not a Jenkins maintainer so I am not suggesting that you drop everything and implement it!).

[Providing a link to queued but not yet tested tasks (or whatever the correct term is)] is not possible with current configuration. I don't know whether there are other plugins for that.

Noted. Is it possible to change the status text ("Build triggered. sha1 is merged")?

@kinkie
Copy link
Contributor

kinkie commented Oct 5, 2017 via email

@kinkie
Copy link
Contributor

kinkie commented Oct 5, 2017 via email

@eduard-bagdasaryan
Copy link
Contributor

eduard-bagdasaryan commented Oct 5, 2017

Is it possible to change the status text ("Build triggered. sha1 is merged")?

It is possible in Jenkins job's ghprb trigger setup. Except "build triggering" message, there are other configurable status messages which can/should be configured for our Jenkins job:

  1. Build triggered
  2. Build started
  3. Build success
  4. Build error
  5. Build failure

@rousskov
Copy link
Contributor

rousskov commented Oct 5, 2017

@eduard-bagdasaryan, noted, thank you. AFAICT after looking at the plugin code, we should not change the "build triggered" status to "build queued" or similar because the code setting this status does not know what will happen to the build. It may not actually be queued. The problem is in the queuing code (which does not set the status at all) and not the onTrigger code (which uses unfortunate terminology that plugin-unaware users may not understand but does not actually lie).

I recommend keeping the default status codes for now: We may be able to improve the default wording but changing the wording will not address the core problem (i.e., unknown task status) and will complicate finding the relevant plugin code/documentation.

@yadij
Copy link
Contributor

yadij commented Oct 6, 2017

FTR: I killed the Fedora Jenkins sub-jobs from the PR matrix after they had already been stuck/hung for 24hrs going by the console output. The matrix job itself should have go and reached a non-success result.

Also, some data to keep in mind when considering timeouts; the matrix which hung itself was delayed by earlier Jobs taking up the first ~12hrs before any of its tests even started, and the ones that followed were delayed by over 48hrs. Delays like this are common with Jenkins jobs.

Copy link
Contributor

@yadij yadij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the Jenkins discussions aside, this PR looks good to me now.

Please apply and make sure the commit subject starts with "Bug 4718:" - even if it is not sufficient to close the bug with certainty it is part of the fix and our release automation needs the link.

@rousskov
Copy link
Contributor

rousskov commented Oct 6, 2017

@yadij, thank you for cleaning the queue, and yes, any test timeouts should be based on runtime (i.e., excluding any queuing time).

@yadij yadij added S-waiting-for-committer privileged action is expected (and usually required) and removed S-waiting-for-author author action is expected (and usually required) labels Oct 6, 2017
@rousskov rousskov changed the title Support filling raw buffer space of shared SBufs Bug 4718: Support filling raw buffer space of shared SBufs Oct 6, 2017
@rousskov rousskov merged commit 672337d into squid-cache:master Oct 6, 2017
@rousskov rousskov deleted the bug4718 branch October 6, 2017 04:20
@rousskov rousskov removed the S-waiting-for-committer privileged action is expected (and usually required) label Oct 6, 2017
@rousskov rousskov added S-waiting-for-maintainer maintainer action is expected (and usually required) and removed S-waiting-for-maintainer maintainer action is expected (and usually required) labels Nov 26, 2017
squidadm pushed a commit to squidadm/squid that referenced this pull request Nov 27, 2017
…he#64)

SBuf::forceSize() requires exclusive SBuf ownership but its precursor
SBuf::rawSpace() method does not guarantee exclusivity. The pair of
calls may result in SBuf::forceSize() throwing for no good reason.

New SBuf API provides a new pair of raw buffer appending calls that
reduces the number of false negatives.

This change may alleviate bug 4718 symptoms but does not address its
core problem (which is still unconfirmed).

This is a Measurement Factory project.
yadij pushed a commit that referenced this pull request Nov 27, 2017
SBuf::forceSize() requires exclusive SBuf ownership but its precursor
SBuf::rawSpace() method does not guarantee exclusivity. The pair of
calls may result in SBuf::forceSize() throwing for no good reason.

New SBuf API provides a new pair of raw buffer appending calls that
reduces the number of false negatives.

This change may alleviate bug 4718 symptoms but does not address its
core problem (which is still unconfirmed).

This is a Measurement Factory project.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants