Skip to content

List releases pagination#96

Merged
aoldershaw merged 5 commits intoconcourse:masterfrom
kirillbilchenko:list_releases_pagination
May 12, 2020
Merged

List releases pagination#96
aoldershaw merged 5 commits intoconcourse:masterfrom
kirillbilchenko:list_releases_pagination

Conversation

@kirillbilchenko
Copy link
Copy Markdown
Contributor

@kirillbilchenko kirillbilchenko commented May 11, 2020

Fixes #93

List releases pagination added for corner case when there is a lot of draft releases on the first page

…f draft releases on the first page

Signed-off-by: kirillbilchenko <kirya7@gmail.com>
@kirillbilchenko kirillbilchenko force-pushed the list_releases_pagination branch from 27944df to ffa19d7 Compare May 11, 2020 09:56
@aoldershaw
Copy link
Copy Markdown
Contributor

Hey @kirillbilchenko, thanks for the PR! This definitely addresses a bug with the resource.

The one thing I'm concerned about here is that if we are to fully paginate the Release list, this could really eat up the user's GitHub API Rate Limit, which is 5000 requests per hour.

For instance, suppose we were using releases from https://github.com/kubernetes/kubernetes. There are currently 65 pages of releases, meaning if you're doing this check every minute, that's already consuming 3900 requests per hour (for a single resource!)

I can think of a couple ways to try to address this:

  1. Increase the page size (to e.g. 100) - now each check is only taking 7 API requests. I haven't actually confirmed that this will work. It's also not ideal IMO compared to the other solutions I'll propose, since we're still usually pulling way more than we need to.

  2. If there is already a latest version for the resource, only paginate until you get to that version. For instance, you could change ListResources to be something like this:

func (g *GitHubClient) ListReleases(untilVersion resource.Version) ([]*github.RepositoryRelease, error) {
	opt := &github.RepositoryListByOrgOptions{
		ListOptions: github.ListOptions{PerPage: 10},
	}
	var allReleases []*github.RepositoryRelease
	for {
		releases, res, err := g.client.Repositories.ListReleases(context.TODO(), g.owner, g.repository, nil)
		if err != nil {
			return []*github.RepositoryRelease{}, err
		}
		foundPreviousVersion := false
		for _, release := range releases {
			allReleases = append(allReleases, release)
			if versionFromRelease(release) == untilVersion {
				foundPreviousVersion = true
				break
			}
		}
		if foundPreviousVersion {
			break
		}
		if res.NextPage == 0 {
			err = res.Body.Close()
			if err != nil {
				return nil, err
			}
			break
		}
		opt.Page = res.NextPage
	}

	return allReleases, nil
}

Where untilVersion if the zero-value if there is no prior version.

With approach 2, we only ever paginate to the last known version - so, most of the time, it'll only consume a 1 or 2 API requests.

If increasing the page size (as suggested in approach 1) still only counts as a single API request, we might as well combine both approaches - so, each page has 100 releases, and we only request new pages as long as we haven't reached the last known version

@kirillbilchenko
Copy link
Copy Markdown
Contributor Author

@aoldershaw I think for now it will be better to increase the number of items per page, but it can still lead to git api limitation, because as you said we never know how much releases will have certain repo. For second approach, as for me for user it will be hard to get the version as parameter and not that user friendly. But final decision is on your side.

@kirillbilchenko
Copy link
Copy Markdown
Contributor Author

as another option we can use for example 5 last pages for 100 items. Will be quite big amount, usually users are not going through all the pages.

@aoldershaw
Copy link
Copy Markdown
Contributor

@kirillbilchenko the version is passed in by Concourse to check, so it shouldn't reflect any change to the end user. For checks, I think we'd just need to change this line:

releases, err := c.github.ListReleases()

to be more like:

releases, err := c.github.ListReleases(request.Version) 

Thinking about it more, though, I'd be inclined to just go with approach 1. I tested it out, and it seems that requesting 100 releases still just consumes 1 API request (and 7 API requests per check doesn't seem SO bad for a huge repo like Kubernetes).

I'm not sure that we can rely on the order of releases returned by the Github v3 API since I can't find any documentation around it (see https://github.bokerqi.topmunity/t5/How-to-use-Git-and-GitHub/Releases-API-can-t-figure-out-order/td-p/5037). So, maybe it's best to fetch the entire list of releases each time, just to be safe.

@aoldershaw
Copy link
Copy Markdown
Contributor

Also, could you add a test case for this pagination feature to https://github.com/concourse/github-release-resource/blob/master/github_test.go?

@aoldershaw
Copy link
Copy Markdown
Contributor

About your suggestion to only fetch the last 5 pages - while I agree that would likely solve 99.9% of cases, it's kind of an arbitrary restriction to place.

I'd feel more comfortable with just fetching all of the resources. If this ends up being a problem for people, we could think about adding a configurable limit on the number of pages to fetch - but for now, it's probably not worth it. I'm guessing the vast majority of repos will fit into a single page anyway.

Likely a better solution (which I'm not suggesting you tackle - just bringing it up) would be to use the GraphQL v4 API, which allows us to be explicit about how we want the data ordered:

query { 
  repository(name: "kubernetes", owner: "kubernetes") {
    releases(orderBy: {field:CREATED_AT, direction:DESC}) {
      ...
    }
  }
}

which would allow us to comfortably go with the approach where we only fetch releases until we reach the last known version. But, for now, I think our best bet is to just fetch everything (with a page size of 100)

@kirillbilchenko kirillbilchenko force-pushed the list_releases_pagination branch from f90c343 to 584a658 Compare May 11, 2020 16:46
@kirillbilchenko
Copy link
Copy Markdown
Contributor Author

@aoldershaw PR is ready for review

Copy link
Copy Markdown
Contributor

@aoldershaw aoldershaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few changes

Comment thread github_test.go Outdated
Comment thread github.go Outdated
…in base method handler

                                                 Signed-off-by: kirillbilchenko <kirya7@gmail.com>

Signed-off-by: kirillbilchenko <kirya7@gmail.com>
Signed-off-by: kirillbilchenko <kirya7@gmail.com>

Add test to check pagination functionality

Signed-off-by: kirillbilchenko <kirya7@gmail.com>
…in base method handler

Signed-off-by: kirillbilchenko <kirya7@gmail.com>

Signed-off-by: kirillbilchenko <kirya7@gmail.com>
@kirillbilchenko kirillbilchenko force-pushed the list_releases_pagination branch from 4ca483b to 25ee28f Compare May 12, 2020 14:29
@kirillbilchenko
Copy link
Copy Markdown
Contributor Author

@aoldershaw updated thanks for suggestion and good catches!

Copy link
Copy Markdown
Contributor

@aoldershaw aoldershaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your changes! Looks really good. Just have one minor nit (some left-over code). I'll also do some acceptance as a sanity check to make sure it works as expected.

Comment thread github_test.go Outdated
Comment thread github_test.go
Signed-off-by: kirillbilchenko <kirya7@gmail.com>
@kirillbilchenko
Copy link
Copy Markdown
Contributor Author

@aoldershaw done, thanks again for review

Copy link
Copy Markdown
Contributor

@aoldershaw aoldershaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used the following pipeline for acceptance:

resources:
- name: k8s-release
  type: github-release
  source:
    owner: kubernetes
    repository: kubernetes
    tag_filter: v1.10.0
jobs:
- name: run1
  plan:
  - get: k8s-release
    trigger: true
  - task: ls
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: {repository: busybox}
      run:
        path: ls
        args: [k8s-release]

This is using the github-release resource without this fix. The resource receives no versions since v1.10.0 isn't on the first page. Using the resource type with this change applied:

- name: github-release-2
  type: registry-image
  source: {repository: concourse/github-release-resource, tag: pr-alpine-96}
resources:
- name: k8s-release
  type: github-release-2
  source:
    owner: kubernetes
    repository: kubernetes
    tag_filter: v1.10.0
jobs:
- name: run1
  plan:
  - get: k8s-release
    trigger: true
  - task: ls
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: {repository: busybox}
      run:
        path: ls
        args: [k8s-release]

and now we're able to detect the resource.

Thanks for the contribution!

@aoldershaw aoldershaw merged commit 299ad3b into concourse:master May 12, 2020
@kirillbilchenko kirillbilchenko deleted the list_releases_pagination branch May 13, 2020 10:29
@SuperTux88
Copy link
Copy Markdown

First, thanks for the patch ❤️

After reporting #93 I wrote a little bash script to do the pagination as a workaround for my use-case. And to reduce the number of requests, I also set the page-size to 100, but it didn't work stable, every now and then, a request timeouted because 100 was too big, and it works way more stable after I reduced it to 50 (I didn't see it timeout since then).

So even with this patch, this isn't really useable for me, because the page-size is hardcoded, and 100 is too big and can lead to timeouts. Could you please make the page-size configurable?

And just for you to have something to test with: I saw you tested it with the kubernetes/kubernetes repo, where it's really fast, even with page-size 100 (and you probably didn't notice that it also can be slow), but my use-case is the AdoptOpenJDK/openjdk11-binaries repo, where every release has a shit ton of assets (which is probably the reason why it takes so much longer if it needs to collect all the assets for 100 releases): https://github.com/AdoptOpenJDK/openjdk11-binaries/releases

@aoldershaw
Copy link
Copy Markdown
Contributor

@SuperTux88 tested with the AdoptOpenJDK/openjdk11-binaries, and sure enough, it's super slow because of the huge number of assets. A single page of 100 releases also comes in at a whopping 30MB.

To me, allowing the page size to be configurable strikes me as a bit of a hack - even with a page size of 50, things can be really slow. I think the proper solution here would be to switch over to use the GraphQL API and only pull the information we care about (especially, not the assets, which are the real problem here).

That said, switching over to GraphQL seems like it might be a bit of an undertaking. I'm not sure if I'll have the chance to tackle it anytime soon. If you'd be interested in looking into it, I'd be happy to help out as best I can!

@SuperTux88
Copy link
Copy Markdown

I'm probably not able to have a look at switching to GraphQL anytime soon either, especially since I'm not really fit with go. But a config-option would already help me, otherwise I'll probably just stay with my bash solution at the moment, which also isn't perfect (and also downloads all the asset-lists), but it works, and I'm also not checking it every minute, so the response-size isn't that big of a deal for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Returns no releases if there are no matching releases on the first page

3 participants