How to vendor scripts in pathogen repos? #3

@jameshadfield

This issue was originally written as an overview of git subtree, but was later repurposed into a discussion of how to vendor scripts, specifically the choice between git subtree and git subrepo. In the end, we settled on git subrepo, noting that this is a small implementation detail that can vary by pathogen repo and can be changed in the future.

Original issue

This is a summary of how I used git subtree as part of #2. Note that different pipelines can choose different methods of vendoring scripts from this repo, but git subtree is particularly nice as it requires no knowledge of its existence from a user of the pipeline.

Helpful reading: Git Subtree basics

The script was added to this repo (nextstrain/ingest) from within nextstrain/hepatitisB using ingest as a subtree. Specifically, from the hepB repo:

# use a branch (in hepB)
git checkout -b 'vendored-scripts'

# add the ingest repo as a subtree, using the 'apply-geolocation-rules' branch
git remote add ingest-remote git@github.com:nextstrain/ingest.git
git subtree add --prefix ingest/vendored ingest-remote apply-geolocation-rules --squash
# Adds a merge commit with two parents: the previous HEAD commit of the host
#     repo, and a squashed commit of the 'ingest' repo

# move the script to the subtree repo (ingest/vendored), modify the snakemake
# rules accordingly and commit changes (to the hepB repo)

# push the changes up to the subtree repo (ingest) on branch apply-geolocation-rules
git subtree push --prefix ingest/vendored ingest-remote apply-geolocation-rules
# The commit message was identical, but only the changes to ingest/vendored
# were part of the subtree commit (probably obvious!)

It was tested in monkeypox by pulling in (this branch of) the ingest repo as a subtree, and updating the transform rule accordingly.

Reflections

This approach is pretty straightforward but changing the branch of a subtree seems to pollute the git history a bit. An alternative approach would be to simply have a subtree of ingest at the main branch, push any changes to the subtree up to a branch of the subtree, merge that branch via GitHub (with code review etc), then pull down the changes once they're on the main branch (of the subtree repo).
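The alternative described above can be sketched end to end in a throwaway sandbox. This is only an illustration under stated assumptions: local repos stand in for nextstrain/ingest and a pathogen repo, the branch name `proposed-change` is hypothetical, the GitHub merge is simulated by moving `main` directly, and `git subtree` is assumed to be installed alongside git. The prefix `ingest/vendored` and the remote name `ingest-remote` are the ones used earlier in this issue.

```shell
#!/bin/sh
set -eu

# Sandbox setup (assumption: local stand-ins, no network access needed).
sandbox="$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # never open an editor for merge messages

# Stand-in for the ingest repo: a bare repo seeded with one script on main.
git init -q --bare "$sandbox/ingest.git"
git -C "$sandbox/ingest.git" symbolic-ref HEAD refs/heads/main
git init -q "$sandbox/seed"
cd "$sandbox/seed"
git symbolic-ref HEAD refs/heads/main
echo 'echo v1' > script.sh
git add script.sh
git commit -q -m "Add script"
git push -q "$sandbox/ingest.git" main

# Stand-in for a pathogen repo.
git init -q "$sandbox/pathogen"
cd "$sandbox/pathogen"
git symbolic-ref HEAD refs/heads/main
echo 'pathogen repo' > README.md
git add README.md
git commit -q -m "Initial commit"
git remote add ingest-remote "$sandbox/ingest.git"

# 1. Vendor the main branch of ingest as a subtree.
git subtree add --prefix ingest/vendored ingest-remote main --squash

# 2. Change the vendored script and commit in the pathogen repo.
echo 'echo v2' > ingest/vendored/script.sh
git commit -q -am "Update vendored script"

# 3. Push only the subtree changes to a branch of ingest for review.
git subtree push --prefix ingest/vendored ingest-remote proposed-change

# 4. Simulate the reviewed merge on GitHub by moving main to that branch.
git -C "$sandbox/ingest.git" update-ref refs/heads/main \
    "$(git -C "$sandbox/ingest.git" rev-parse refs/heads/proposed-change)"

# 5. Pull the merged changes back down from main.
git subtree pull --prefix ingest/vendored ingest-remote main --squash
```

With this flow the subtree always tracks main, so the pathogen repo never needs to switch the branch its subtree points at.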

Given a script with differences in multiple repos, the most straightforward approach may be simply to create a to-be-vendored version of the script locally, copy it into each repo to test, and, when you are satisfied, create a PR in this ingest repo without using subtrees at all. Once it's in main, it is straightforward to pull it into each pathogen repo using git subtree pull ....
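A minimal sketch of the copy-and-test step, assuming a hypothetical layout: the script name `apply-geolocation-rules` and the `ingest/bin/` path are placeholders, and the two directories are stand-ins for local checkouts of the pathogen repos. It records which repos held a different version before overwriting each copy with the candidate.

```shell
#!/bin/sh
set -eu

# Sandbox with stand-in checkouts (assumption: real checkouts would already exist).
work="$(mktemp -d)"
candidate="$work/apply-geolocation-rules"   # hypothetical to-be-vendored script
printf 'echo "candidate version"\n' > "$candidate"

for repo in hepatitisB monkeypox; do
    mkdir -p "$work/$repo/ingest/bin"
    printf 'echo "%s-specific version"\n' "$repo" \
        > "$work/$repo/ingest/bin/apply-geolocation-rules"
done

# Copy the candidate into each repo, noting which copies differed beforehand.
changed=""
for repo in hepatitisB monkeypox; do
    target="$work/$repo/ingest/bin/apply-geolocation-rules"
    if ! cmp -s "$candidate" "$target"; then
        changed="$changed $repo"
    fi
    cp "$candidate" "$target"
done

echo "repos whose copy differed from the candidate:$changed"
```

After running each pipeline against the copied script, the same candidate file is what would go into the PR against the ingest repo.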

Comments / improvements welcome. At the very least this may give others a quick start guide!
