Skip to content

[JENKINS-54126] if strange 404 when cache is on #204

Merged
rsandell merged 6 commits intomasterfrom
JENKINS-54126
Feb 20, 2019
Merged

[JENKINS-54126] if strange 404 when cache is on #204
rsandell merged 6 commits intomasterfrom
JENKINS-54126

Conversation

@rsandell
Copy link
Copy Markdown
Member

temporarily turn it off and try again

//JENKINS-54126 try again without cache headers
final Connector.ForceValidationOkHttpConnector oldConnector = (Connector.ForceValidationOkHttpConnector) gitHub.getConnector();
try {
//TODO I'm not sure we are alone in using this connector so maybe concurrent modification problems
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe have a withoutCache method that takes a closure, that way you could use a lock in the connector to ensure the correct one gets used... if you are worried about concurrent requests

Copy link
Copy Markdown
Member

@stephenc stephenc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're going to have to accept this kind of hacks until we find the root cause.

May want to have a system property to turn off if cutting a release

}
// means that does not exist and this is handled below this try/catch block.
}
if (finicky) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (finicky && !Boolean.getBoolean("disable hack")) { (or better an effectively final field) so that if your finicky hack is causing issues in production you can at least turn it off without restarting

@rsandell
Copy link
Copy Markdown
Member Author

rsandell commented Feb 13, 2019

I've been staring at that spotcheck error and it doesn't make sense to me, it seems like something obvious but I can't see it.

// means that does not exist and this is handled below this try/catch block.
}
if (finicky) {
final Optional<List<String>> status = Optional.ofNullable(fnf.getResponseHeaderFields() != null ? fnf.getResponseHeaderFields().get(null) : null);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are not supposed to pass an explicit null to Optional.ofNullable... so try

fnf.getResponseHeaderFields() != null ? Optional.ofNullable(fnf.getResponseHeaderFields().get(null)) : Optional.empty())

Or perhaps it is the .get(null) that it objects to @rsandell

@kshultzCB
Copy link
Copy Markdown
Collaborator

kshultzCB commented Feb 13, 2019

To help set context for the strange looking 404 errors, a little back story on the tests I've been running.

The relationship between strange 404s and missing branches/PRs

  1. I've got a repository that I'm using for this work. It's nothing fancy, but does hold the shell scripts I'm using to automatically create / delete PR branches.

  2. On the Jenkins master, I've got a custom build of okhttp-urlconnection, placed in plugins/github/WEB-INF/lib/ and /github-api/WEB-INF/lib/. Through this change, I can see additional log output.

  3. When running the create-pull-requests.sh script, I'm tail ing the Jenkins log, which is at the default location of /var/log/jenkins/jenkins.log.

  4. Occasionally, but not always, GitHub returns a 404 during these branch create operations. A complete log of "create 10 PRs" can be seen in this sample log, but the 404 I'm talking about specifically looks like this:

HTTP Headers:

Server: GitHub.com,
Date: Fri, 08 Feb 2019 23:13:28 GMT,
Content-Type: application/json; charset=utf-8,
Transfer-Encoding: chunked,
Status: 404 Not Found,
X-RateLimit-Limit: 5000,
X-RateLimit-Remaining: 4998,
X-RateLimit-Reset: 1549671208,
X-OAuth-Scopes: repo,
X-Accepted-OAuth-Scopes: repo,
X-GitHub-Media-Type: github.v3; format=json,
Access-Control-Expose-Headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type,
Access-Control-Allow-Origin: *,
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload,
X-Frame-Options: deny,
X-Content-Type-Options: nosniff,
X-XSS-Protection: 1; mode=block,
Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin,
Content-Security-Policy: default-src 'none',
Content-Encoding: gzip,
X-GitHub-Request-Id: E898:4177:15A16C5:2B3EC57:5C5E0D18,
OkHttp-Sent-Millis: 1549667608356,
OkHttp-Received-Millis: 1549667608397,
Server returned http error code: 404 and error message: �1�0
                                                               Ы ��f`��
�WɉClwA�X�{S�j*���b��x�t�,�7tKV�?|��w��k�X�R���\6i�����R��
                                                                #�-L�j���O�An

Only when this 404 was present during a branch create operation did I see inconsistent behavior from Jenkins. But, importantly, I have seen a direct relationship between the 404s and the inconsistent Jenkins behavior. If GitHub does not report a 404, everything works fine. But if it does, the inconsistent branches and PRs would appear in Jenkins, every time.

That inconsistent behavior looks like:

  • Branches being shown on the Branches tab which should not be there, since we're supposed to be Excluding branches which are also filed as PRs:
    image

  • Missing PRs. In this screenshot, PR-10, PR-11, and PR-12 are missing:
    image

How are things going with this PR in place

  • After creating several hundred PRs, including a duplicate repository intended to speed up the work - I did not see a 404 all day yesterday from GitHub during creation of a PR. Not a single time. Which is...frustrating.

  • As a control, I switched back to the original beta github-branch-source plugin I'd been using previously. Still no 404s from GitHub.

  • I reliably get 404s from GitHub when performing branch deletes, via this script. But, from my understanding, orphaned items like this will only be "cleaned up" with a full scan - a webhook will not trigger it. So, the branches and PRs remain in place on Jenkins until I do that, at which point, they are removed as expected.

// means that does not exist and this is handled below this try/catch block.
}
if (finicky) {
final Optional<List<String>> status = Optional.ofNullable(fnf.getResponseHeaderFields() != null ? fnf.getResponseHeaderFields().get(null) : null);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
final Optional<List<String>> status = Optional.ofNullable(fnf.getResponseHeaderFields() != null ? fnf.getResponseHeaderFields().get(null) : null);
final Optional<List<String>> status = Optional.ofNullable(fnf.getResponseHeaderFields() != null ? Optional.ofNullable(fnf.getResponseHeaderFields().get(null)) : Optional.empty());

Turning @stephenc 's comment into a suggestion.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That didn't help. It was because it thought that just because the method didn't return null on the first call it could on the second call 🤦‍♂️ .

@bitwiseman
Copy link
Copy Markdown
Contributor

@rsandell - With the cache turned on Jenkins fetches the repo and then get that specific Jenkinsfile at a particular revision, right?

@kshultzCB - So what you're saying is we don't have a stable recreate of the issue using the unmodified plugin?

@kshultzCB
Copy link
Copy Markdown
Collaborator

kshultzCB commented Feb 14, 2019

@bitwiseman - I'm saying "on Monday, we had a fairly reliable recreate. And now, I got nothin'." :( Not a single recreate with the unmodified plugin since Tuesday afternoon, and I've done it hundreds of times.

I did reach out to GitHub themselves about the issue. They pointed me to a couple of Stack Overflow articles (1, 2), both of which talk about okhttp being at fault for the gibberish characters.

I got curious and started poking around. The version of okhttp that's in use by github-plugin and github-api-plugin is 2.7.5, which looks like it's celebrating its third birthday this month. Makes me wonder if such issues as those described in the two Stack Overflow posts may have been fixed in that time.

More poking around showed me that, if that is indeed the issue (or even just part of the issue), it's probably not a simple drop-in operation to update okhttp in github-plugin and github-api-plugin. okhttp is now okhttp3, which brings along a requirement for Java 8. Possibly some other dependency updates too.

@KostyaSha
Copy link
Copy Markdown
Member

The initial feature of cache i added in github-plugin, but with this ancient libraries better disable it.
okhttp's cache is caching 404 for X time, okhttp team doesn't support ancient (as usual in jenkins) version. Either you will fight with random issues. The other part of story is that github has delays between almost all operations (my test cases for example are full of polls https://github.com/KostyaSha/github-integration-plugin/tree/master/github-pullrequest-plugin/src/test/java/org/jenkinsci/plugins/github_integration/awaitility ).

The correct solution will be to start using fresh library and excluding cache for some error codes.

@bitwiseman
Copy link
Copy Markdown
Contributor

bitwiseman commented Feb 14, 2019

@kshultzCB
Jenkins requires Java 8+ already, so no problem there. This does look like a notable change. 😮 But it also claims to have better handling of multipart.

@kshultzCB
Copy link
Copy Markdown
Collaborator

Yep, I helped out with Java-8-i-zation of gerrit-trigger, so I thought, maybe I could at least take a quick look. Didn't get much further along than that.

Thinking out loud: would it make sense to have the cache be opt-in? That's already been done for Windows because of a file naming problem. If it were opt-in, users could make a decision based on their needs. If they're hitting a lot of rate limit problems, they could turn it on, knowing there might be some tradeoffs. Maybe that might be a decent stopgap measure until some modernization can be done (Java8, okhttp3, etc).

@rsandell
Copy link
Copy Markdown
Member Author

rsandell commented Feb 14, 2019

I've already bumped this plugin to Java 8 (hence the use of stream() in this pr) and KK's github-api seems ready for okhttp3?

@rsandell rsandell closed this Feb 14, 2019
@rsandell rsandell reopened this Feb 14, 2019
@kshultzCB
Copy link
Copy Markdown
Collaborator

The one I was looking at that's not yet been bumped to Java 8 was github-plugin.

@rsandell
Copy link
Copy Markdown
Member Author

Since the problem on the GitHub side seems to have magically disappeared during the last two weeks. Maybe we should merge this with the feature flag off by default?
We need to do something as the current master has the behaviour of re-throwing the fnf.

Copy link
Copy Markdown
Collaborator

@kshultzCB kshultzCB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a good point at which to merge. Thanks, @rsandell 💯

@rsandell rsandell merged commit 28603b8 into master Feb 20, 2019
@rsandell rsandell deleted the JENKINS-54126 branch February 20, 2019 14:50
@rsandell rsandell restored the JENKINS-54126 branch February 20, 2019 14:53
@rsandell rsandell deleted the JENKINS-54126 branch February 20, 2019 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants