Skip to content

Add duplicated scripts from pathogen repos #1

@jameshadfield

Description

@jameshadfield

The first step in making this repository useful is to populate it with scripts that are currently manually copied around pathogen repos.

See shared GDoc for additional context and details on scripts.

Progress

This was originally created by @joverlee521 in #1 (comment).

Identical scripts (added in #6)

  • s3-object-exists
  • trigger
  • sha256sum
  • cloudfront-invalidate
  • merge-user-metadata
  • transform-authors
  • transform-field-names
  • tranform-genbank-location

Diverged scripts with various different versions used across workflows
(binned into related groups):

Simple notify scripts (added in #8)

  • notify-slack
  • notify-on-job-start
  • notify-on-job-fail

S3 interaction + notify scripts that depend on S3 files (added in #12)

  • upload-to-s3
  • download-from-s3
  • notify-on-record-change
  • notify-on-diff
  • trigger-on-new-data

Genbank interactions

Nextclade joining

Potential augur curate scripts

Summary of differences

This is the original issue text from @jameshadfield.

Here's a quick scan of duplicated ingest scripts, using monkeypox as the "base", against 4 other ingest script directories:

Directories of scripts considered:

mpx       # monkeypox/ingest/bin at a1f0d7b
hbv       # hepatitisB/ingest/scripts at 1cdd197
rsv       # rsv/ingest/bin at ba171f4
dengue    # dengue/ingest/bin branch: new_ingest @ 247b2fd 
ncov      # ncov-ingest/bin at 88fddbe

Note that when there's only 1-3 lines different that's often just an added comment to indicate where the script's been copied from

mpx/apply-geolocation-rules
        rsv/apply-geolocation-rules     IDENTICAL
        hbv/apply-geolocation-rules.py      17 lines different
        dengue/apply-geolocation-rules  IDENTICAL
mpx/cloudfront-invalidate
        rsv/cloudfront-invalidate       IDENTICAL
        dengue/cloudfront-invalidate    IDENTICAL
        ncov/cloudfront-invalidate      IDENTICAL
mpx/csv-to-ndjson
        rsv/csv-to-ndjson.py      16 lines different
        dengue/csv-to-ndjson    IDENTICAL
        ncov/csv-to-ndjson       3 lines different
mpx/download-from-s3
        dengue/download-from-s3       2 lines different
        ncov/download-from-s3       8 lines different
mpx/fasta-to-ndjson
        rsv/fasta-to-ndjson     IDENTICAL
        dengue/fasta-to-ndjson  IDENTICAL
mpx/fetch-from-genbank
        dengue/fetch-from-genbank       1 lines different
mpx/genbank-url
        rsv/genbank-url      42 lines different
        dengue/genbank-url      11 lines different
mpx/join-metadata-and-clades.py
        rsv/join-metadata-and-clades.py       3 lines different
        dengue/join-metadata-and-clades.py      IDENTICAL
        ncov/join-metadata-and-clades     114 lines different
mpx/merge-user-metadata
        rsv/merge-user-metadata IDENTICAL
        dengue/merge-user-metadata      IDENTICAL
mpx/ndjson-to-tsv-and-fasta
        rsv/ndjson-to-tsv-and-fasta     IDENTICAL
        dengue/ndjson-to-tsv-and-fasta  IDENTICAL
mpx/notify-on-diff
        dengue/notify-on-diff   IDENTICAL
mpx/notify-on-job-fail
        rsv/notify-on-job-fail       1 lines different
        dengue/notify-on-job-fail       1 lines different
        ncov/notify-on-job-fail      10 lines different
mpx/notify-on-job-start
        rsv/notify-on-job-start       3 lines different
        dengue/notify-on-job-start       3 lines different
        ncov/notify-on-job-start      30 lines different
mpx/notify-on-record-change
        rsv/notify-on-record-change       3 lines different
        dengue/notify-on-record-change       3 lines different
        ncov/notify-on-record-change       6 lines different
mpx/notify-slack
        rsv/notify-slack      15 lines different
        dengue/notify-slack     IDENTICAL
        ncov/notify-slack      16 lines different
mpx/reverse_reversed_sequences.py
        dengue/reverse_reversed_sequences.py    IDENTICAL
mpx/s3-object-exists
        rsv/s3-object-exists    IDENTICAL
        dengue/s3-object-exists IDENTICAL
        ncov/s3-object-exists       1 lines different
mpx/sha256sum
        rsv/sha256sum   IDENTICAL
        dengue/sha256sum        IDENTICAL
        ncov/sha256sum       1 lines different
mpx/transform-authors
        rsv/transform-authors   IDENTICAL
        dengue/transform-authors        IDENTICAL
mpx/transform-date-fields
        rsv/transform-date-fields       IDENTICAL
        dengue/transform-date-fields    IDENTICAL
mpx/transform-field-names
        rsv/transform-field-names       IDENTICAL
        dengue/transform-field-names    IDENTICAL
mpx/transform-genbank-location
        rsv/transform-genbank-location  IDENTICAL
        dengue/transform-genbank-location       IDENTICAL
mpx/transform-strain-names
        rsv/transform-strain-names       1 lines different
        dengue/transform-strain-names   IDENTICAL
mpx/transform-string-fields
        rsv/transform-string-fields     IDENTICAL
        dengue/transform-string-fields  IDENTICAL
mpx/trigger
        dengue/trigger  IDENTICAL
        ncov/trigger    IDENTICAL
mpx/trigger-on-new-data
        dengue/trigger-on-new-data       1 lines different
        ncov/trigger-on-new-data       6 lines different
mpx/upload-to-s3
        rsv/upload-to-s3       3 lines different
        dengue/upload-to-s3       3 lines different
        ncov/upload-to-s3       1 lines different

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions