Conversation
…f draft releases on the first page Signed-off-by: kirillbilchenko <kirya7@gmail.com>
27944df to
ffa19d7
Compare
|
Hey @kirillbilchenko, thanks for the PR! This definitely addresses a bug with the resource. The one thing I'm concerned about here is that if we are to fully paginate the Release list, this could really eat up the user's GitHub API Rate Limit, which is 5000 requests per hour. For instance, suppose we were using releases from https://github.com/kubernetes/kubernetes. There are currently 65 pages of releases, meaning if you're doing this check every minute, that's already consuming 3900 requests per hour (for a single resource!) I can think of a couple ways to try to address this:
func (g *GitHubClient) ListReleases(untilVersion resource.Version) ([]*github.RepositoryRelease, error) {
opt := &github.RepositoryListByOrgOptions{
ListOptions: github.ListOptions{PerPage: 10},
}
var allReleases []*github.RepositoryRelease
for {
releases, res, err := g.client.Repositories.ListReleases(context.TODO(), g.owner, g.repository, nil)
if err != nil {
return []*github.RepositoryRelease{}, err
}
foundPreviousVersion := false
for _, release := range releases {
allReleases = append(allReleases, release)
if versionFromRelease(release) == untilVersion {
foundPreviousVersion = true
break
}
}
if foundPreviousVersion {
break
}
if res.NextPage == 0 {
err = res.Body.Close()
if err != nil {
return nil, err
}
break
}
opt.Page = res.NextPage
}
return allReleases, nil
}Where With approach 2, we only ever paginate to the last known version - so, most of the time, it'll only consume a 1 or 2 API requests. If increasing the page size (as suggested in approach 1) still only counts as a single API request, we might as well combine both approaches - so, each page has 100 releases, and we only request new pages as long as we haven't reached the last known version |
|
@aoldershaw I think for now it will be better to increase the number of items per page, but it can still lead to git api limitation, because as you said we never know how much releases will have certain repo. For second approach, as for me for user it will be hard to get the version as parameter and not that user friendly. But final decision is on your side. |
|
as another option we can use for example 5 last pages for 100 items. Will be quite big amount, usually users are not going through all the pages. |
|
@kirillbilchenko the version is passed in by Concourse to github-release-resource/check_command.go Line 44 in 080646a to be more like: releases, err := c.github.ListReleases(request.Version) Thinking about it more, though, I'd be inclined to just go with approach 1. I tested it out, and it seems that requesting 100 releases still just consumes 1 API request (and 7 API requests per check doesn't seem SO bad for a huge repo like Kubernetes). I'm not sure that we can rely on the order of releases returned by the Github v3 API since I can't find any documentation around it (see https://github.bokerqi.topmunity/t5/How-to-use-Git-and-GitHub/Releases-API-can-t-figure-out-order/td-p/5037). So, maybe it's best to fetch the entire list of releases each time, just to be safe. |
|
Also, could you add a test case for this pagination feature to https://github.com/concourse/github-release-resource/blob/master/github_test.go? |
|
About your suggestion to only fetch the last 5 pages - while I agree that would likely solve 99.9% of cases, it's kind of an arbitrary restriction to place. I'd feel more comfortable with just fetching all of the resources. If this ends up being a problem for people, we could think about adding a configurable limit on the number of pages to fetch - but for now, it's probably not worth it. I'm guessing the vast majority of repos will fit into a single page anyway. Likely a better solution (which I'm not suggesting you tackle - just bringing it up) would be to use the GraphQL v4 API, which allows us to be explicit about how we want the data ordered: query {
repository(name: "kubernetes", owner: "kubernetes") {
releases(orderBy: {field:CREATED_AT, direction:DESC}) {
...
}
}
}which would allow us to comfortably go with the approach where we only fetch releases until we reach the last known version. But, for now, I think our best bet is to just fetch everything (with a page size of 100) |
f90c343 to
584a658
Compare
|
@aoldershaw PR is ready for review |
…in base method handler
Signed-off-by: kirillbilchenko <kirya7@gmail.com>
Signed-off-by: kirillbilchenko <kirya7@gmail.com>
Signed-off-by: kirillbilchenko <kirya7@gmail.com> Add test to check pagination functionality Signed-off-by: kirillbilchenko <kirya7@gmail.com>
…in base method handler Signed-off-by: kirillbilchenko <kirya7@gmail.com> Signed-off-by: kirillbilchenko <kirya7@gmail.com>
4ca483b to
25ee28f
Compare
|
@aoldershaw updated thanks for suggestion and good catches! |
aoldershaw
left a comment
There was a problem hiding this comment.
Thanks for your changes! Looks really good. Just have one minor nit (some left-over code). I'll also do some acceptance as a sanity check to make sure it works as expected.
Signed-off-by: kirillbilchenko <kirya7@gmail.com>
|
@aoldershaw done, thanks again for review |
There was a problem hiding this comment.
Used the following pipeline for acceptance:
resources:
- name: k8s-release
type: github-release
source:
owner: kubernetes
repository: kubernetes
tag_filter: v1.10.0
jobs:
- name: run1
plan:
- get: k8s-release
trigger: true
- task: ls
config:
platform: linux
image_resource:
type: registry-image
source: {repository: busybox}
run:
path: ls
args: [k8s-release]
This is using the github-release resource without this fix. The resource receives no versions since v1.10.0 isn't on the first page. Using the resource type with this change applied:
- name: github-release-2
type: registry-image
source: {repository: concourse/github-release-resource, tag: pr-alpine-96}
resources:
- name: k8s-release
type: github-release-2
source:
owner: kubernetes
repository: kubernetes
tag_filter: v1.10.0
jobs:
- name: run1
plan:
- get: k8s-release
trigger: true
- task: ls
config:
platform: linux
image_resource:
type: registry-image
source: {repository: busybox}
run:
path: ls
args: [k8s-release]
and now we're able to detect the resource.
Thanks for the contribution!
|
First, thanks for the patch ❤️ After reporting #93 I wrote a little bash script to do the pagination as a workaround for my use-case. And to reduce the number of requests, I also set the page-size to 100, but it didn't work stable, every now and then, a request timeouted because 100 was too big, and it works way more stable after I reduced it to 50 (I didn't see it timeout since then). So even with this patch, this isn't really useable for me, because the page-size is hardcoded, and 100 is too big and can lead to timeouts. Could you please make the page-size configurable? And just for you to have something to test with: I saw you tested it with the |
|
@SuperTux88 tested with the To me, allowing the page size to be configurable strikes me as a bit of a hack - even with a page size of 50, things can be really slow. I think the proper solution here would be to switch over to use the GraphQL API and only pull the information we care about (especially, not the assets, which are the real problem here). That said, switching over to GraphQL seems like it might be a bit of an undertaking. I'm not sure if I'll have the chance to tackle it anytime soon. If you'd be interested in looking into it, I'd be happy to help out as best I can! |
|
I'm probably not able to have a look at switching to GraphQL anytime soon either, especially since I'm not really fit with go. But a config-option would already help me, otherwise I'll probably just stay with my bash solution at the moment, which also isn't perfect (and also downloads all the asset-lists), but it works, and I'm also not checking it every minute, so the response-size isn't that big of a deal for me. |
Fixes #93
List releases pagination added for corner case when there is a lot of draft releases on the first page