
Bitbucket diffs #5

@aaronrmm

Bitbucket has an API for public repos

Dataset URL - None

Does the dataset exist in a scraped format? No (searched using Google, Papers with Code, and Kaggle).

Description

Bitbucket is far less popular than GitHub for open source Git repositories, but it does host them and provides an API for querying and filtering them. Because Bitbucket has no stars as GitHub does, we would have to approximate popularity with the number of watchers or the number of contributors. Repositories can also be filtered by language. They do not appear to be filterable by license.
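As a rough illustration of the language filter, the sketch below builds a query URL for the public repository listing. The base URL, the `q` filter syntax, and the `pagelen` parameter are assumptions based on the Bitbucket Cloud REST API 2.0 and should be checked against the current docs; `build_repo_query` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint for listing public repositories (Bitbucket Cloud REST API 2.0).
BASE_URL = "https://api.bitbucket.org/2.0/repositories"


def build_repo_query(language, pagelen=50):
    """Return a URL asking for public repositories written in one language."""
    params = {
        "q": f'language="{language}"',  # assumed filter syntax, verify before use
        "pagelen": pagelen,             # assumed page-size parameter
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_repo_query("python"))
```

The function only constructs the URL, so it can be inspected or logged before any request is sent.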

Procedure

  1. Approximate the value of a Bitbucket dataset by pulling metrics on its open source repositories. Using the Bitbucket API, pull the following information:
  • number of public repositories
  • distribution of watchers per repository
  • distribution of contributors per repository
  • number of commits per repository
  2. With the above information, determine a good metric for how repositories should be prioritized. Sort the repo list by this metric.

  3. Start pulling commit diffs from the highest priority repos. Docs
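The collection and ranking steps above could be sketched as follows. This is a minimal sketch under stated assumptions, not a settled design: it assumes paginated responses carry a `values` list and a `next` URL (the convention in the Bitbucket Cloud API, to be verified), and `fetch_page`, `RepoStats`, and the weights in `priority` are hypothetical placeholders to be tuned once the real distributions are known.

```python
import math
from dataclasses import dataclass


def iter_values(fetch_page, url):
    """Yield items from every page, following the assumed `next` link.

    `fetch_page` is a hypothetical callable that takes a URL and returns
    the decoded JSON page as a dict.
    """
    while url:
        page = fetch_page(url)
        yield from page.get("values", [])
        url = page.get("next")


@dataclass
class RepoStats:
    full_name: str
    watchers: int
    contributors: int
    commits: int


def priority(repo, w_watch=1.0, w_contrib=1.0, w_commit=0.5):
    """Weighted sum of log-scaled counts; the weights are illustrative only."""
    return (
        w_watch * math.log1p(repo.watchers)
        + w_contrib * math.log1p(repo.contributors)
        + w_commit * math.log1p(repo.commits)
    )


def prioritize(repos):
    """Return repositories sorted from highest to lowest priority."""
    return sorted(repos, key=priority, reverse=True)


if __name__ == "__main__":
    sample = [
        RepoStats("team/small", watchers=2, contributors=1, commits=40),
        RepoStats("team/popular", watchers=150, contributors=12, commits=900),
        RepoStats("team/busy", watchers=5, contributors=3, commits=5000),
    ]
    for repo in prioritize(sample):
        print(repo.full_name, round(priority(repo), 2))
```

Log scaling keeps a handful of very large repositories from dominating the ranking. Passing `fetch_page` in as an argument keeps the pagination logic testable without network access.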

Metadata

Labels: dataset-request (Request for addition of new dataset)

Status: Todo