This issue was originally written as an overview of git subtree, but was later repurposed into a discussion of how to vendor scripts, specifically choosing between git subtree and git subrepo. In the end, we settled on git subrepo, noting that this is a small implementation detail that can vary by pathogen repo and be changed in the future.
Original issue
This is a summary of how I used git subtree as part of #2. Note that different pipelines can choose different methods of vendoring scripts from this repo, but git subtree is particularly nice as it requires no knowledge of its existence from a user of the pipeline.
Helpful reading: Git Subtree basics
The script was added to this repo (nextstrain/ingest) from within nextstrain/hepatitisB using ingest as a subtree. Specifically, from the hepB repo:
# use a branch (in hepB)
git checkout -b 'vendored-scripts'
# add the ingest repo as a subtree, using the 'apply-geolocation-rules' branch
git remote add ingest-remote git@github.com:nextstrain/ingest.git
git subtree add --prefix ingest/vendored ingest-remote apply-geolocation-rules --squash
# Adds a merge commit with one parent the previous host repo HEAD commit,
# and the other a squashed commit of the 'ingest' repo
# move the script to the subtree repo (ingest/vendored), modify the snakemake
# rules accordingly and commit changes (to the hepB repo)
# push the changes up to the subtree repo (ingest) on branch apply-geolocation-rules
git subtree push --prefix ingest/vendored ingest-remote apply-geolocation-rules
# The commit message was identical, but only the changes to ingest/vendored
# were part of the subtree commit (probably obvious!)
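To check what the subtree commands recorded in the host repo, something like the following may help. It is only a sketch: the function name `list_subtree_commits` is made up, and it assumes you run it from inside the host repo (hepB here). As far as I can tell, git subtree writes `git-subtree-dir:` and `git-subtree-split:` lines into the messages of the squash/add commits it creates, so they can be found with a plain `git log --grep`.

```shell
#!/usr/bin/env sh
# Sketch only: the function name is hypothetical; run it inside the host repo.

# List commits whose message records subtree metadata for the given prefix
# (defaults to the prefix used in this issue). The body is printed too, so
# the "git-subtree-split:" line shows which ingest commit was vendored.
list_subtree_commits() {
  prefix=${1:-ingest/vendored}
  git log --grep="^git-subtree-dir: $prefix" --format='%h %s%n%b'
}
```

Calling `list_subtree_commits ingest/vendored` after the `git subtree add` above should show the squashed commit along with the upstream commit it was built from.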
It was tested in monkeypox by pulling in (this branch of) the ingest repo as a subtree, and updating the transform rule accordingly.
Reflections
This approach is pretty straightforward, but changing the branch of a subtree seems to pollute the git history a bit. An alternative approach would be to simply have a subtree of ingest at the main branch, push any changes to the subtree up to a branch of the subtree repo, merge that branch via GitHub (with code review etc.), then pull down the changes once they're on the main branch (of the subtree repo).
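The alternative workflow could look roughly like the following. This is a sketch I have not run against the real repos: the `vendor_*` helper names, the `run` wrapper with its `DRY_RUN` switch, and the example review branch name are all made up for illustration. The remote name and prefix are the ones used earlier in this issue.

```shell
#!/usr/bin/env sh
# Sketch only: helper names and the DRY_RUN switch are hypothetical.
PREFIX="ingest/vendored"
REMOTE="ingest-remote"

# Print the command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One-time setup (in the pathogen repo): vendor the main branch of ingest.
vendor_add() {
  run git remote add "$REMOTE" git@github.com:nextstrain/ingest.git
  run git subtree add --prefix "$PREFIX" "$REMOTE" main --squash
}

# After committing changes under $PREFIX in the pathogen repo, push just
# those changes to a review branch of ingest (the branch name is an argument).
vendor_push() {
  run git subtree push --prefix "$PREFIX" "$REMOTE" "${1:?review branch name required}"
}

# Once the review branch has been merged into main on GitHub, pull main
# back into the vendored directory.
vendor_pull() {
  run git subtree pull --prefix "$PREFIX" "$REMOTE" main --squash
}
```

Running `DRY_RUN=1 vendor_push my-fix` (with `my-fix` as a placeholder branch name) prints the command without touching the repo, which is a cheap way to check the arguments first.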
Given a script that differs across multiple repos, the most straightforward approach may be simply to create a to-be-vendored version of the script locally, copy it into each repo to test, and, when you are satisfied, create a PR in this ingest repo without using subtrees at all. Once it's in main, it is straightforward to pull it into each pathogen repo using git subtree pull ....
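A rough sketch of that copy-then-pull approach is below. Everything specific in it is an assumption: the helper names, the `DRY_RUN` switch, the destination directory you pass in, and the idea that each pathogen repo already has ingest vendored at `ingest/vendored` with a remote called `ingest-remote`. Note that git subtree pull wants a clean working tree, so any test copies should be reverted or removed before pulling.

```shell
#!/usr/bin/env sh
# Sketch only: helper names, paths, and the DRY_RUN switch are hypothetical.

# Print the command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy a candidate script into the same relative directory of each repo
# so it can be tested locally before opening a PR in ingest.
# Usage: copy_candidate <script> <dest-dir-relative-to-repo> <repo>...
copy_candidate() {
  script=$1
  dest=$2
  shift 2
  for repo in "$@"; do
    run mkdir -p "$repo/$dest"
    run cp "$script" "$repo/$dest/"
  done
}

# Once the script is on main in ingest, update the vendored copy in each repo.
# Usage: pull_everywhere <repo>...
pull_everywhere() {
  for repo in "$@"; do
    run git -C "$repo" subtree pull --prefix ingest/vendored ingest-remote main --squash
  done
}
```

For example, `DRY_RUN=1 pull_everywhere ../hepatitisB ../monkeypox` (placeholder paths) just lists the pull commands that would be run in each checkout.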
Comments / improvements welcome. At the very least this may give others a quick start guide!