Bug 4718: Support filling raw buffer space of shared SBufs#64
Bug 4718: Support filling raw buffer space of shared SBufs#64rousskov merged 1 commit intosquid-cache:masterfrom
Conversation
yadij
left a comment
There was a problem hiding this comment.
just polish, so approving now.
src/sbuf/SBuf.h
Outdated
| * The returned pointer must not be stored by the caller, as it will | ||
| * be invalidated by the first call to a non-const method call | ||
| * on the SBuf. | ||
| * This call guarantees to never return NULL. |
There was a problem hiding this comment.
please make this say "never return a nullptr" instead of NULL
Squid may abort the connection with a TLS server while read server hello message due to Must clause failure. Inside Ssl::ServerBio::readAndBuffer, the SBuf::rawSpace/SBuf::forceSize is used to read directly into an SBuf buffer network data instead of using temporary char[] buffer and append it to SBuf. The problem is that the SBuf::rawSpace/forceSize require only one user (the Ssl::ServerBio object) of the content of SBuf object. This patch fixes SBuf API and replaces the SBuf::rawSpace/SBuf::forceSize pair with a new SBuf::rawAppendStart/SBuf::rawAppendFinish calls. The SBuf::rawSpace become a private member and the SBuf::forceSize removed. This is a Measurement Factory project
|
@eduard-bagdasaryan, I know I can probably resolve this problem with a manual "rebuild" comment/command, but do you know why Jenkins got stuck and has no Details link next to its "Build triggered. sha1 is merged" status line? |
|
Jenkins got stuck because one of its builds lasted > 2 days. Other build requests were queued, including this PR. I have no good idea why PR 30 got stuck on Jenkins: all nodes except "d-fedora-26" completed with success. To avoid such hanging in future, I would suggest to limit build time, specifying a suitable timeout. This could be done using the corresponding plugin. |
I understand the second sentence/statement, but the first one can be (mis)interpreted in many ways. I will rephrase the original question given the additional information you have provided: Why was not this previously queued PR tested after PR #30 got manually aborted? Is this PR still in the Jenkins queue, waiting for other PRs to be tested? If Jenkins auto-kills tasks (or whatever the correct term is) that wait in queue for 2 days, then can Jenkins update their status on GitHub accordingly? And a separate question that I tried to ask earlier: Why does this PR has no Details link next to its "Build triggered. sha1 is merged" status line? Is it possible for Jenkins to provide a link to queued but not yet tested tasks (or whatever the correct term is)? |
yes, you can see the queue at https://build.squid-cache.org/job/5-pr-test/.
AFAIK, this PR 30 was aborted manually by Amos and Jenkins PR builder was unable to update Github PR status. That is why I suggested to add a plugin to be able to abort stuck jobs within pre-set timeout. I'll try to check whether PR status will be updated in this case.
Because PR builder has not yet started the build, thus updating Github status.
Not possible with current configuration. I don't know whether there are other plugins for that. |
|
@chtsanti, please review my changes to the PR title/description and adjust as needed. |
|
@eduard-bagdasaryan, I understand what is happening now. Thank you.
I can see the queue at the above magical URL, but I cannot (easily) tell whether this PR is in that queue: Please note that I am not saying you should fix this "users do not know what is going on" problem! I am just documenting the (evidently non-obvious) state of a user mind in case you (or somebody else reading this) happen to adjust Jenkins configuration later (for other reasons) and happen to know how to improve this.
I agree that adding a busy test timeout is the right thing to do (but you are not a Jenkins maintainer so I am not suggesting that you drop everything and implement it!).
Noted. Is it possible to change the status text ("Build triggered. sha1 is merged")? |
|
It seems to be stuck on building xenial:
https://build.squid-cache.org/job/5-pr-test/compiler=clang,label=d-ubuntu-xenial/99/console
…On Thu, Oct 5, 2017 at 4:03 PM, Alex Rousskov ***@***.***> wrote:
@eduard-bagdasaryan <https://github.com/eduard-bagdasaryan>, I understand
what is happening now. Thank you.
you can see the queue at https://build.squid-cache.org/job/5-pr-test/.
I can see the queue at the above magical URL, but I cannot (easily) tell
whether *this PR* is in that queue:
#102 (pending—Build #99 is already in progress (ETA:4 min 0 sec))
#101 (pending—Build #99 is already in progress (ETA:4 min 0 sec))
#100 (pending—Build #99 is already in progress (ETA:4 min 0 sec))
#99 Oct 5, 2017 11:53 AM PR #51: Move TLS/SSL http_port conf... |
Please note that I am not saying you should fix this "users do not know
what is going on" problem! I am just documenting the (evidently
non-obvious) state of a user mind in case you (or somebody else reading
this) happen to adjust Jenkins configuration later (for other reasons) and
happen to know how to improve this.
I suggested to add a plugin to be able to abort stuck jobs within pre-set
timeout.
I agree that adding a busy test timeout is the right thing to do (but you
are not a Jenkins maintainer so I am not suggesting that you drop
everything and implement it!).
[Providing a link to queued but not yet tested tasks (or whatever the
correct term is)] is not possible with current configuration. I don't know
whether there are other plugins for that.
Noted. Is it possible to change the status text ("Build triggered. sha1 is
merged")?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#64 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AE76jMRuCb2jAra6W5T16Sb5hS_zBR2bks5spO-hgaJpZM4PjA4M>
.
--
Francesco
|
|
I've killed job #99, corresponding to PR51; it's interesting to see how
will github interpret it and how to relaunch it.
…On Thu, Oct 5, 2017 at 4:04 PM, Kinkie ***@***.***> wrote:
It seems to be stuck on building xenial: https://build.squid-
cache.org/job/5-pr-test/compiler=clang,label=d-ubuntu-xenial/99/console
On Thu, Oct 5, 2017 at 4:03 PM, Alex Rousskov ***@***.***>
wrote:
> @eduard-bagdasaryan <https://github.com/eduard-bagdasaryan>, I
> understand what is happening now. Thank you.
>
> you can see the queue at https://build.squid-cache.org/job/5-pr-test/.
>
> I can see the queue at the above magical URL, but I cannot (easily) tell
> whether *this PR* is in that queue:
>
> #102 (pending—Build #99 is already in progress (ETA:4 min 0 sec))
> #101 (pending—Build #99 is already in progress (ETA:4 min 0 sec))
> #100 (pending—Build #99 is already in progress (ETA:4 min 0 sec))
> #99 Oct 5, 2017 11:53 AM PR #51: Move TLS/SSL http_port conf... |
>
> Please note that I am not saying you should fix this "users do not know
> what is going on" problem! I am just documenting the (evidently
> non-obvious) state of a user mind in case you (or somebody else reading
> this) happen to adjust Jenkins configuration later (for other reasons) and
> happen to know how to improve this.
>
> I suggested to add a plugin to be able to abort stuck jobs within pre-set
> timeout.
>
> I agree that adding a busy test timeout is the right thing to do (but you
> are not a Jenkins maintainer so I am not suggesting that you drop
> everything and implement it!).
>
> [Providing a link to queued but not yet tested tasks (or whatever the
> correct term is)] is not possible with current configuration. I don't know
> whether there are other plugins for that.
>
> Noted. Is it possible to change the status text ("Build triggered. sha1
> is merged")?
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub
> <#64 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AE76jMRuCb2jAra6W5T16Sb5hS_zBR2bks5spO-hgaJpZM4PjA4M>
> .
>
--
Francesco
--
Francesco
|
It is possible in Jenkins job's ghprb trigger setup. Except "build triggering" message, there are other configurable status messages which can/should be configured for our Jenkins job:
|
|
@eduard-bagdasaryan, noted, thank you. AFAICT after looking at the plugin code, we should not change the "build triggered" status to "build queued" or similar because the code setting this status does not know what will happen to the build. It may not actually be queued. The problem is in the queuing code (which does not set the status at all) and not the onTrigger code (which uses unfortunate terminology that plugin-unaware users may not understand but does not actually lie). I recommend keeping the default status codes for now: We may be able to improve the default wording but changing the wording will not address the core problem (i.e., unknown task status) and will complicate finding the relevant plugin code/documentation. |
|
FTR: I killed the Fedora Jenkins sub-jobs from the PR matrix after they had already been stuck/hung for 24hrs going by the console output. The matrix job itself should have go and reached a non-success result. Also, some data to keep in mind when considering timeouts; the matrix which hung itself was delayed by earlier Jobs taking up the first ~12hrs before any of its tests even started, and the ones that followed were delayed by over 48hrs. Delays like this are common with Jenkins jobs. |
yadij
left a comment
There was a problem hiding this comment.
Putting the Jenkins discussions aside, this PR looks good to me now.
Please apply and make sure the commit subject starts with "Bug 4718:" - even if it is not sufficient to close the bug with certainty it is part of the fix and our release automation needs the link.
|
@yadij, thank you for cleaning the queue, and yes, any test timeouts should be based on runtime (i.e., excluding any queuing time). |
…he#64) SBuf::forceSize() requires exclusive SBuf ownership but its precursor SBuf::rawSpace() method does not guarantee exclusivity. The pair of calls may result in SBuf::forceSize() throwing for no good reason. New SBuf API provides a new pair of raw buffer appending calls that reduces the number of false negatives. This change may alleviate bug 4718 symptoms but does not address its core problem (which is still unconfirmed). This is a Measurement Factory project.
SBuf::forceSize() requires exclusive SBuf ownership but its precursor SBuf::rawSpace() method does not guarantee exclusivity. The pair of calls may result in SBuf::forceSize() throwing for no good reason. New SBuf API provides a new pair of raw buffer appending calls that reduces the number of false negatives. This change may alleviate bug 4718 symptoms but does not address its core problem (which is still unconfirmed). This is a Measurement Factory project.
SBuf::forceSize() requires exclusive SBuf ownership but its precursor
SBuf::rawSpace() method does not guarantee exclusivity. The pair of
calls may result in SBuf::forceSize() throwing for no good reason.
New SBuf API provides a new pair of raw buffer appending calls that
reduces the number of false negatives.
This change may alleviate bug 4718 symptoms but does not address its
core problem (which is still unconfirmed).
This is a Measurement Factory project.