Use centralized apply-geolocation-rules #166

Merged
victorlin merged 2 commits into master from victorlin/update-ingest on Aug 8, 2023

@victorlin victorlin commented Aug 4, 2023

Description of proposed changes

There have been a few updates to nextstrain/ingest since #164 was merged. I pulled those in with git subrepo and switched the workflow to use the one newly added script, apply-geolocation-rules.

Related issue(s)

Follow-up to #164.

Testing

  • Ingest workflow runs locally without errors

@victorlin victorlin requested a review from a team August 4, 2023 23:38
@victorlin victorlin self-assigned this Aug 4, 2023

@victorlin victorlin Aug 8, 2023

Ingest failed locally with the following error:

/bin/bash: ./vendored/apply-geolocation-rules: /usr/bin/env: bad interpreter: Permission denied
Full error details

Snakemake log:

Error in rule transform:
    jobid: 1
    input: data/sequences.ndjson, data/all-geolocation-rules.tsv, source-data/annotations.tsv
    output: data/metadata_raw.tsv, data/sequences.fasta
    log: logs/transform.txt (check log file(s) for error details)
    shell:
        
        (cat data/sequences.ndjson \
            | ./vendored/transform-field-names \
                --field-map collected=date submitted=date_submitted genbank_accession=accession submitting_organization=institution \
            | augur curate normalize-strings \
            | ./bin/transform-strain-names \
                --strain-regex ^.+$ \
                --backup-fields accession \
            | ./bin/transform-date-fields \
                --date-fields date date_submitted \
                --expected-date-formats %Y %Y-%m %Y-%m-%d %Y-%m-%dT%H:%M:%SZ \
            | ./vendored/transform-genbank-location \
            | ./bin/transform-string-fields \
                --titlecase-fields region country division location \
                --articles and d de del des di do en l la las le los nad of op sur the y \
                --abbreviations USA \
            | ./vendored/transform-authors \
                --authors-field authors \
                --default-value ? \
                --abbr-authors-field abbr_authors \
            | ./vendored/apply-geolocation-rules \
                --geolocation-rules data/all-geolocation-rules.tsv \
            | ./vendored/merge-user-metadata \
                --annotations source-data/annotations.tsv \
                --id-field accession \
            | ./bin/ndjson-to-tsv-and-fasta \
                --metadata-columns accession genbank_accession_rev strain date region country division location host date_submitted sra_accession abbr_authors reverse authors institution \
                --metadata data/metadata_raw.tsv \
                --fasta data/sequences.fasta \
                --id-field accession \
                --sequence-field sequence ) 2>> logs/transform.txt
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job transform since they might be corrupted:
data/metadata_raw.tsv, data/sequences.fasta
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-08-08T152252.303349.snakemake.log

Contents of logs/transform.txt:

/bin/bash: ./vendored/apply-geolocation-rules: /usr/bin/env: bad interpreter: Permission denied
Traceback (most recent call last):
  File "/nextstrain/build/./vendored/transform-authors", line 65, in <module>
    json.dump(record, stdout, allow_nan=False, indent=None, separators=',:')
  File "/usr/local/lib/python3.10/json/__init__.py", line 180, in dump
    fp.write(chunk)
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/nextstrain/build/./bin/transform-string-fields", line 83, in <module>
    json.dump(record, stdout, allow_nan=False, indent=None, separators=',:')
  File "/usr/local/lib/python3.10/json/__init__.py", line 180, in dump
    fp.write(chunk)
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/nextstrain/build/./vendored/transform-genbank-location", line 42, in <module>
    json.dump(record, stdout, allow_nan=False, indent=None, separators=',:')
  File "/usr/local/lib/python3.10/json/__init__.py", line 180, in dump
    fp.write(chunk)
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/nextstrain/build/./bin/transform-date-fields", line 153, in <module>
    json.dump(record, stdout, allow_nan=False, indent=None, separators=',:')
  File "/usr/local/lib/python3.10/json/__init__.py", line 180, in dump
    fp.write(chunk)
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/nextstrain/build/./bin/transform-strain-names", line 49, in <module>
    json.dump(record, stdout, allow_nan=False, indent=None, separators=',:')
  File "/usr/local/lib/python3.10/json/__init__.py", line 180, in dump
    fp.write(chunk)
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/nextstrain/augur/augur/__init__.py", line 66, in run
    return args.__command__.run(args)
  File "/nextstrain/augur/augur/curate/__init__.py", line 192, in run
    dump_ndjson(modified_records)
  File "/nextstrain/augur/augur/io/json.py", line 64, in dump_ndjson
    print(as_json(item))
BrokenPipeError: [Errno 32] Broken pipe


An error occurred (see above) that has not been properly handled by Augur.
To report this, please open a new issue including the original command and the error above:
    <https://github.com/nextstrain/augur/issues/new/choose>

Traceback (most recent call last):
  File "/nextstrain/build/./vendored/transform-field-names", line 47, in <module>
    json.dump(record, stdout, allow_nan=False, indent=None, separators=',:')
  File "/usr/local/lib/python3.10/json/__init__.py", line 180, in dump
    fp.write(chunk)
BrokenPipeError: [Errno 32] Broken pipe

This failed because I didn't set the execute bit on the script in nextstrain/shared#4. I'll fix it in that repo and then update this PR.
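
A missing execute bit like this could be caught before the workflow runs. The following is only a minimal sketch, not part of this PR or of nextstrain/ingest: it lists files in a directory that begin with a `#!` shebang but have no execute permission bit set. The function name `scripts_missing_execute_bit` and the demo files are hypothetical, and POSIX file modes are assumed.

```python
import os
import stat
import tempfile
from pathlib import Path


def scripts_missing_execute_bit(directory):
    """Return the sorted names of files in `directory` that start with a
    shebang (#!) but have no execute bit set for user, group, or other.

    Hypothetical helper for illustration; assumes POSIX file modes.
    """
    any_execute = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    missing = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # Only the first two bytes are needed to detect a shebang.
        with open(path, "rb") as handle:
            has_shebang = handle.read(2) == b"#!"
        is_executable = bool(path.stat().st_mode & any_execute)
        if has_shebang and not is_executable:
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    # Demo in a throwaway directory that stands in for ingest/vendored.
    with tempfile.TemporaryDirectory() as vendored:
        broken = Path(vendored) / "apply-geolocation-rules"
        broken.write_text("#!/usr/bin/env python3\n")
        os.chmod(broken, 0o644)  # shebang present, execute bit missing

        fine = Path(vendored) / "transform-authors"
        fine.write_text("#!/usr/bin/env python3\n")
        os.chmod(fine, 0o755)  # shebang present, execute bit set

        print(scripts_missing_execute_bit(vendored))
```

Git records the executable mode as part of each tracked file, so a missing bit has to be fixed and committed in the upstream repository before the vendored copy is updated, which matches the plan above.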

subrepo:
  subdir:   "ingest/vendored"
  merged:   "5d90818"
upstream:
  origin:   "https://github.com/nextstrain/ingest"
  branch:   "main"
  commit:   "5d90818"
git-subrepo:
  version:  "0.4.6"
  origin:   "https://github.com/ingydotnet/git-subrepo"
  commit:   "110b9eb"
The centralized version of the script is a copy of the existing one with
some backwards-compatible changes¹.

¹ nextstrain/shared@3b69a10...0ac9a4f
@victorlin victorlin force-pushed the victorlin/update-ingest branch from c2f0184 to 81ab6b1 Compare August 8, 2023 15:44
@victorlin victorlin merged commit a7ccb51 into master Aug 8, 2023
@victorlin victorlin deleted the victorlin/update-ingest branch August 8, 2023 16:05