prepare_transcripts: build pyfasta .gdx/.flat sidecars eagerly so downstream tasks don't write to staged inputs

### Summary

`prepare_transcripts` writes `annotation/transcripts_sequence.fa` but doesn't build its pyfasta index. Instead, the downstream RiboCode steps (`RiboCode`, `RiboCode_onestep`) trigger pyfasta's lazy index build the first time they read the FASTA, which writes `transcripts_sequence.fa.gdx` and `transcripts_sequence.fa.flat` *next to the FASTA*.

That breaks any deployment where the annotation directory is shared, read-only, or otherwise not the consumer's own writable working directory:

- read-only `/mnt` or NFS mounts on shared HPC infrastructure
- container bind mounts published `:ro`
- workflow engines that stage the annotation as a symlink into each consumer task (writes follow the symlink back to the producer's directory, so parallel consumers race to create the same files)
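
To make the failure mode concrete, here is a minimal stdlib-only sketch of a pre-flight check a consumer could run before opening the FASTA. The helper names (`missing_sidecars`, `would_write_to_readonly_dir`) are hypothetical and not part of RiboCode or pyfasta; the only thing taken from this issue is the pair of sidecar suffixes.

```python
import os

# Sidecar suffixes as named in this issue; everything else here is illustrative.
SIDECAR_SUFFIXES = (".gdx", ".flat")


def missing_sidecars(fasta_path):
    """Return the sidecar paths that do not yet exist next to fasta_path."""
    return [
        fasta_path + suffix
        for suffix in SIDECAR_SUFFIXES
        if not os.path.exists(fasta_path + suffix)
    ]


def would_write_to_readonly_dir(fasta_path):
    """True if a lazy index build would be attempted in a non-writable directory.

    The directory is taken from the path as given, because that is where the
    sidecars would be created; os.access follows a symlinked directory to its
    target.
    """
    directory = os.path.dirname(os.path.abspath(fasta_path))
    return bool(missing_sidecars(fasta_path)) and not os.access(directory, os.W_OK)
```

If `would_write_to_readonly_dir(...)` returns `True`, the first downstream read would try to create the sidecars in a location it can't write to. Note that `os.access` reports what the current user may do, so it always reports writable when running as root.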

### Suggested fix

Add one line at the end of `processTranscripts(...)` in `prepare_transcripts.py` so the indexes are built once in the producing call, before the annotation is published:

```python
# Eagerly build pyfasta indexes so downstream readers don't write to the
# (possibly read-only) staged annotation directory.
GenomeSeq(os.path.join(out_dir, "transcripts_sequence.fa"))
```

`GenomeSeq.__init__` already calls `Fasta(filename, key_fn=get_chrom)` with the same key function downstream code uses, so the `.gdx`/`.flat` produced are byte-identical to what would otherwise be written lazily at first downstream read.
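
One way to check the byte-identical claim locally is to hash the sidecars from an eager build and from a lazy build of the same FASTA and compare the results. This is a stdlib-only sketch; `sidecar_digests` is a hypothetical helper, not something RiboCode or pyfasta provides.

```python
import hashlib


def sidecar_digests(fasta_path, suffixes=(".gdx", ".flat")):
    """Map each sidecar suffix to the SHA-256 hex digest of that file's bytes."""
    digests = {}
    for suffix in suffixes:
        with open(fasta_path + suffix, "rb") as handle:
            digests[suffix] = hashlib.sha256(handle.read()).hexdigest()
    return digests
```

Running it once on an annotation directory produced with the eager build and once on a copy where a downstream step built the indexes lazily should give equal dictionaries if the claim holds in your environment.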

Optionally gate behind `--prebuild-indexes` if you'd rather keep current behaviour by default.
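
A rough sketch of what the opt-in gate could look like, assuming an `argparse`-style CLI. The real `prepare_transcripts` option parsing may be structured differently, and `build_parser`, `maybe_prebuild`, and the `build_index` callable are placeholders for illustration; in practice `build_index` would be the `GenomeSeq(...)` call shown above.

```python
import argparse


def build_parser():
    """Hypothetical parser containing only the proposed flag."""
    parser = argparse.ArgumentParser(prog="prepare_transcripts")
    parser.add_argument(
        "--prebuild-indexes",
        action="store_true",
        help="build the pyfasta .gdx/.flat sidecars right after writing the FASTA",
    )
    return parser


def maybe_prebuild(args, fasta_path, build_index):
    """Call build_index(fasta_path) only when the flag was given.

    Returns True if the index build was triggered, False otherwise.
    """
    if args.prebuild_indexes:
        build_index(fasta_path)
        return True
    return False
```

With `action="store_true"` the flag defaults to off, so current behaviour is unchanged unless the user opts in.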

Happy to send a PR.
