Summary
prepare_transcripts writes annotation/transcripts_sequence.fa but doesn't build its pyfasta index. Instead, downstream RiboCode steps (RiboCode, RiboCode_onestep) trigger pyfasta's lazy index build on first read of the FASTA - which writes transcripts_sequence.fa.gdx and transcripts_sequence.fa.flat next to the FASTA.
That breaks any deployment where the annotation directory is shared, read-only, or otherwise not the consumer's own writable working directory:
- read-only /mnt or NFS mounts on shared HPC infrastructure
- container bind mounts published :ro
- workflow engines that stage the annotation as a symlink into each consumer task (writes follow the symlink back to the producer's directory, and parallel consumers then race to create the same index files)
Suggested fix
Add one line at the end of processTranscripts(...) in prepare_transcripts.py so the indexes are built once in the producing call, before the annotation is published:
# Eagerly build pyfasta indexes so downstream readers don't write to the
# (possibly read-only) staged annotation directory.
GenomeSeq(os.path.join(out_dir, "transcripts_sequence.fa"))
GenomeSeq.__init__ already calls Fasta(filename, key_fn=get_chrom) with the same key function downstream code uses, so the .gdx/.flat produced are byte-identical to what would otherwise be written lazily at first downstream read.
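To confirm the indexes really are in place before the annotation is published (or before a consumer starts on a read-only mount), a small stdlib-only check along these lines could be used. This is a sketch, not part of RiboCode: the helper name missing_index_files is hypothetical, and it assumes the index files are named by appending .gdx and .flat to the FASTA path, as described above.

```python
import os

# Suffixes appended to the FASTA path for the pyfasta index files,
# as described in this issue (assumption: no other files are needed).
INDEX_SUFFIXES = (".gdx", ".flat")


def missing_index_files(annotation_dir, fasta_name="transcripts_sequence.fa"):
    """Return the expected index file paths that do not exist yet.

    Hypothetical helper: an empty list means a downstream reader should
    not need to write anything next to the FASTA.
    """
    fasta_path = os.path.join(annotation_dir, fasta_name)
    return [
        fasta_path + suffix
        for suffix in INDEX_SUFFIXES
        if not os.path.isfile(fasta_path + suffix)
    ]
```

A producer could call missing_index_files("annotation") after prepare_transcripts finishes and refuse to publish the directory if the returned list is non-empty.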
Optionally gate behind --prebuild-indexes if you'd rather keep current behaviour by default.
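If the opt-in route is preferred, the gating could look roughly like the sketch below. Everything here is illustrative: build_parser and maybe_prebuild are hypothetical names, the real prepare_transcripts argument parser may be organised differently, and build_index stands in for whatever callable builds the index (in the proposal above, GenomeSeq).

```python
import argparse


def build_parser():
    """Hypothetical parser showing only the proposed opt-in flag."""
    parser = argparse.ArgumentParser(
        description="sketch of an opt-in index prebuild flag"
    )
    parser.add_argument(
        "--prebuild-indexes",
        action="store_true",
        help="build the FASTA indexes in the producing call",
    )
    return parser


def maybe_prebuild(args, fasta_path, build_index):
    """Call build_index(fasta_path) only when the flag was given.

    build_index is an injected callable (assumption: GenomeSeq in the
    real code). Returns True if the build was triggered.
    """
    if args.prebuild_indexes:
        build_index(fasta_path)
        return True
    return False
```

With action="store_true" the flag defaults to False, so the current behaviour stays unchanged unless --prebuild-indexes is passed.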
Happy to send a PR.