Reducing GitHub API calls to scale scanning repositories #202

@naveensrinivasan

Description

GitHub API calls are rate-limited, which makes it hard to scale the number of repositories we can scan and report results for.

When the limit is hit, the code has to wait tens of minutes before continuing:
{"level":"warn","ts":1613869247.8747272,"caller":"roundtripper/roundtripper.go:139","msg":"Rate limit exceeded. Waiting 44m34.125286853s to retry..."}

The following Scorecard checks don't need the GitHub API; a Git API is enough:

  1. Active
  2. Frozen-Deps
  3. CodeQLInCheckDefinitions
  4. Security-Policy
  5. Packaging

Potential solution

  1. Clone the Git Repo
  2. Run git pull on these repos on a cron schedule to pick up updates
  3. Use an API to query these local repositories directly instead of the GitHub API

The https://github.com/go-git/go-git project provides a Git API that could be used to avoid the GitHub API limits.

With httpcache #80 (comment) and fewer GitHub API calls, we should be able to scale the number of repositories we scan.

related to #80
