The first step in making this repository useful is to populate it with the scripts that are currently copied manually between pathogen repos.
See shared GDoc for additional context and details on scripts.
Progress
This was originally created by @joverlee521 in #1 (comment).
Identical scripts (added in #6)
Diverged scripts, with different versions used across workflows (binned into related groups):
Simple notify scripts (added in #8)
S3 interaction + notify scripts that depend on S3 files (added in #12)
Genbank interactions
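Nextclade joining
  join-metadata-and-clades (TBD): dropping the custom Python script in favor of csvtk/tsv-utils commands (Replace join metadata and clades script with csvtk and tsv append mpox#207)
Potential augur curate scripts
  apply-geolocation-rules from monkeypox repo #4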
Summary of differences
This is the original issue text from @jameshadfield.
Here's a quick scan of duplicated ingest scripts, using monkeypox as the "base", against 4 other ingest script directories:
Directories of scripts considered:
mpx # monkeypox/ingest/bin at a1f0d7b
hbv # hepatitisB/ingest/scripts at 1cdd197
rsv # rsv/ingest/bin at ba171f4
dengue # dengue/ingest/bin branch: new_ingest @ 247b2fd
ncov # ncov-ingest/bin at 88fddbe
Note that a difference of only 1-3 lines is often just an added comment indicating where the script was copied from.
mpx/apply-geolocation-rules
  rsv/apply-geolocation-rules IDENTICAL
  hbv/apply-geolocation-rules.py 17 lines different
  dengue/apply-geolocation-rules IDENTICAL
mpx/cloudfront-invalidate
  rsv/cloudfront-invalidate IDENTICAL
  dengue/cloudfront-invalidate IDENTICAL
  ncov/cloudfront-invalidate IDENTICAL
mpx/csv-to-ndjson
  rsv/csv-to-ndjson.py 16 lines different
  dengue/csv-to-ndjson IDENTICAL
  ncov/csv-to-ndjson 3 lines different
mpx/download-from-s3
  dengue/download-from-s3 2 lines different
  ncov/download-from-s3 8 lines different
mpx/fasta-to-ndjson
  rsv/fasta-to-ndjson IDENTICAL
  dengue/fasta-to-ndjson IDENTICAL
mpx/fetch-from-genbank
  dengue/fetch-from-genbank 1 lines different
mpx/genbank-url
  rsv/genbank-url 42 lines different
  dengue/genbank-url 11 lines different
mpx/join-metadata-and-clades.py
  rsv/join-metadata-and-clades.py 3 lines different
  dengue/join-metadata-and-clades.py IDENTICAL
  ncov/join-metadata-and-clades 114 lines different
mpx/merge-user-metadata
  rsv/merge-user-metadata IDENTICAL
  dengue/merge-user-metadata IDENTICAL
mpx/ndjson-to-tsv-and-fasta
  rsv/ndjson-to-tsv-and-fasta IDENTICAL
  dengue/ndjson-to-tsv-and-fasta IDENTICAL
mpx/notify-on-diff
  dengue/notify-on-diff IDENTICAL
mpx/notify-on-job-fail
  rsv/notify-on-job-fail 1 lines different
  dengue/notify-on-job-fail 1 lines different
  ncov/notify-on-job-fail 10 lines different
mpx/notify-on-job-start
  rsv/notify-on-job-start 3 lines different
  dengue/notify-on-job-start 3 lines different
  ncov/notify-on-job-start 30 lines different
mpx/notify-on-record-change
  rsv/notify-on-record-change 3 lines different
  dengue/notify-on-record-change 3 lines different
  ncov/notify-on-record-change 6 lines different
mpx/notify-slack
  rsv/notify-slack 15 lines different
  dengue/notify-slack IDENTICAL
  ncov/notify-slack 16 lines different
mpx/reverse_reversed_sequences.py
  dengue/reverse_reversed_sequences.py IDENTICAL
mpx/s3-object-exists
  rsv/s3-object-exists IDENTICAL
  dengue/s3-object-exists IDENTICAL
  ncov/s3-object-exists 1 lines different
mpx/sha256sum
  rsv/sha256sum IDENTICAL
  dengue/sha256sum IDENTICAL
  ncov/sha256sum 1 lines different
mpx/transform-authors
  rsv/transform-authors IDENTICAL
  dengue/transform-authors IDENTICAL
mpx/transform-date-fields
  rsv/transform-date-fields IDENTICAL
  dengue/transform-date-fields IDENTICAL
mpx/transform-field-names
  rsv/transform-field-names IDENTICAL
  dengue/transform-field-names IDENTICAL
mpx/transform-genbank-location
  rsv/transform-genbank-location IDENTICAL
  dengue/transform-genbank-location IDENTICAL
mpx/transform-strain-names
  rsv/transform-strain-names 1 lines different
  dengue/transform-strain-names IDENTICAL
mpx/transform-string-fields
  rsv/transform-string-fields IDENTICAL
  dengue/transform-string-fields IDENTICAL
mpx/trigger
  dengue/trigger IDENTICAL
  ncov/trigger IDENTICAL
mpx/trigger-on-new-data
  dengue/trigger-on-new-data 1 lines different
  ncov/trigger-on-new-data 6 lines different
mpx/upload-to-s3
  rsv/upload-to-s3 3 lines different
  dengue/upload-to-s3 3 lines different
  ncov/upload-to-s3 1 lines different
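
For anyone wanting to rerun a scan like the one above as the repos change, here is a minimal Python sketch. It is not the script that produced the listing, and several details are assumptions: the function names `count_changed_lines` and `compare_to_base` are hypothetical, scripts are matched across directories by file name with any extension ignored (so `csv-to-ndjson` pairs with `csv-to-ndjson.py`), and "lines different" is counted as removed plus added lines, which may not match how the original numbers were computed.

```python
import difflib
from pathlib import Path


def count_changed_lines(base_text: str, other_text: str) -> int:
    """Count removed plus added lines between two texts (assumed metric)."""
    base_lines = base_text.splitlines()
    other_lines = other_text.splitlines()
    matcher = difflib.SequenceMatcher(None, base_lines, other_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # (i2 - i1) lines dropped from the base, (j2 - j1) lines added.
            changed += (i2 - i1) + (j2 - j1)
    return changed


def compare_to_base(
    base_label: str, base_dir: Path, other_dirs: dict[str, Path]
) -> list[str]:
    """Compare every script in base_dir against same-named scripts elsewhere.

    Returns report lines in the same shape as the listing in this issue.
    Directories that lack a given script are simply skipped.
    """
    report = []
    for base_script in sorted(p for p in base_dir.iterdir() if p.is_file()):
        report.append(f"{base_label}/{base_script.name}")
        base_text = base_script.read_text()
        for label, other_dir in other_dirs.items():
            # Match on the stem so an added ".py" extension still pairs up.
            matches = sorted(
                p
                for p in other_dir.iterdir()
                if p.is_file() and p.stem == base_script.stem
            )
            if not matches:
                continue
            other = matches[0]
            changed = count_changed_lines(base_text, other.read_text())
            status = "IDENTICAL" if changed == 0 else f"{changed} lines different"
            report.append(f"{label}/{other.name} {status}")
    return report
```

A call might look like `compare_to_base("mpx", Path("monkeypox/ingest/bin"), {"rsv": Path("rsv/ingest/bin")})`, with each repo checked out at the commit of interest; printing the returned lines gives a listing in the format used above.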