
tiny-count: shift parameters for overlap selectors #280

Merged
taimontgomery merged 13 commits into master from issue-278 on Feb 10, 2023

@AlexTate AlexTate commented Feb 6, 2023

Overlap selector definitions now optionally accept two numerical parameters that shift the 5' and 3' ends of matching feature intervals. In Stage 2 selection, alignments are then evaluated against the shifted interval. The syntax is:

selector, M, N
  M = shift value for 5' end
  N = shift value for 3' end
  • Positive values shift the specified end in the 3' direction
  • Negative values shift the specified end in the 5' direction
  • If either parameter is provided, the other must also be provided
  • Zero is also an accepted shift value
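
The syntax rules above can be sketched as a small parser. This is only an illustration and not the code from this PR: the function name `parse_overlap_selector` is hypothetical, shift values are assumed to be integers, and an omitted shift is represented as `None`.

```python
from typing import Optional, Tuple

Shift = Tuple[int, int]  # (M, N): shift for the 5' end, shift for the 3' end


def parse_overlap_selector(defn: str) -> Tuple[str, Optional[Shift]]:
    """Split 'selector, M, N' into (selector, (M, N)).

    Hypothetical helper: returns (selector, None) when no shift is given.
    """
    parts = [p.strip() for p in defn.split(",")]
    name, params = parts[0], parts[1:]

    if not name:
        raise ValueError("Missing selector name")
    if not params:
        return name, None  # shift parameters are optional

    # If either parameter is provided, the other must also be provided
    if len(params) != 2:
        raise ValueError(f"Expected 'selector, M, N' but got {defn!r}")

    try:
        # Assumption: shifts are whole numbers; zero and negatives are accepted
        shift_5, shift_3 = int(params[0]), int(params[1])
    except ValueError:
        raise ValueError(f"Shift values must be integers: {defn!r}") from None

    return name, (shift_5, shift_3)
```

For example, `parse_overlap_selector("selector, -5, 5")` yields the selector name with a shift of `(-5, 5)`, while a definition with only one shift value is rejected.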

Shifting might produce an illegal interval. When this happens, the offending match (the feature-rule pair) is omitted from Stage 2 and 3 selection, but the feature's other matches remain. Illegal intervals include:

  • Zero-length intervals (start == end)
  • Inverted intervals (start > end)
  • Negative intervals (start < 0)
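
A minimal sketch of the shift and the legality checks listed above, written without HTSeq so it runs on its own. The `Interval` type and `shift_interval` function are hypothetical stand-ins, coordinates are assumed to be 0-based and half-open (as in `HTSeq.GenomicInterval`), and only the `+` and `-` strands are handled here.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """Hypothetical stand-in for a stranded, half-open genomic interval."""
    start: int
    end: int
    strand: str


def shift_interval(iv: Interval, shift_5: int, shift_3: int) -> Optional[Interval]:
    """Shift the 5' and 3' ends of iv; return None if the result is illegal."""
    if iv.strand == "+":
        # 5' end is the start coordinate; the 3' direction is increasing
        start, end = iv.start + shift_5, iv.end + shift_3
    elif iv.strand == "-":
        # 5' end is the end coordinate; the 3' direction is decreasing
        start, end = iv.start - shift_3, iv.end - shift_5
    else:
        # Assumption: unstranded features are out of scope for this sketch
        raise ValueError(f"Unsupported strand: {iv.strand!r}")

    # Illegal: negative start, zero-length (start == end), inverted (start > end)
    if start < 0 or start >= end:
        return None
    return Interval(start, end, iv.strand)
```

With this sketch, a caller would skip any feature-rule pair for which `shift_interval` returns `None` and keep evaluating the feature's other matches.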

Closes #278

…eter order isn't switched for 3' anchored, and honestly I'm having second thoughts on whether it makes sense to break the pattern for that selector
…ted overlap selectors. Rather than handling the shift param entirely within the IntervalSelector base class, I instead provide a static method for returning a shifted HTSeq.GenomicInterval. This has two advantages:

- Maintains IntervalSelectors' small memory footprint since this approach doesn't require any new attributes for the class. These objects can be very numerous depending on the GFF and ruleset
- Keeps the code clean for adding matches to the correct intervals in ReferenceTables. Match tuples need to be inserted in the GenomicArray under the same interval that the overlap selector is expecting.

Due to the above, cached selectors in build_interval_selectors() have to be added under an updated key that includes the shifted interval as well.

Additionally, illegal shift operations (interval goes to zero length or inverted) are handled. Match tuples holding these overlap selectors are simply dropped.
…ased its complexity. It's a little abstract but hopefully this keeps the code approachable. Also improved explanatory docstrings for these methods.
…ive values in the 5' direction and positive values in the 3' direction regardless of strand
# Conflicts:
#	tiny/rna/counter/features.py
…rval_selectors, and to presort match tuples as ReferenceFeatures does for faster Stage 2 selection
…rvals with negative start positions aren't supported by ChromVectors in HTSeq's GenomicArrays. Also correcting the definition of a null interval (oops)
…skips the countdown for each test case since we don't really need to wait while running tests
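
The commit notes above mention that cached selectors are stored under a key that includes the shifted interval, and that matches with illegal shifts are dropped. The sketch below illustrates that idea only; it is not the body of `build_interval_selectors()`. The `Selector` class, `shifted` helper, and `build_selectors_sketch` function are hypothetical, intervals are plain `(start, end, strand)` tuples, and strands are assumed to be `+` or `-`.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int, str]          # (start, end, strand), half-open
Match = Tuple[str, str, Tuple[int, int], Interval]  # (feature, selector, shift, interval)


class Selector:
    """Hypothetical stand-in for an overlap selector bound to one interval."""
    def __init__(self, kind: str, iv: Interval):
        self.kind, self.iv = kind, iv


def shifted(iv: Interval, shift: Tuple[int, int]) -> Optional[Interval]:
    """Apply a (5' shift, 3' shift) pair; return None for illegal results."""
    start, end, strand = iv
    shift_5, shift_3 = shift
    if strand == "-":
        start, end = start - shift_3, end - shift_5
    else:  # assumption: anything else is treated as the + strand
        start, end = start + shift_5, end + shift_3
    return (start, end, strand) if 0 <= start < end else None


def build_selectors_sketch(matches: List[Match]):
    cache: Dict[Tuple[str, Interval], Selector] = {}
    kept = []
    for feature_id, kind, shift, iv in matches:
        target = shifted(iv, shift)
        if target is None:
            continue  # drop only this match; the feature's other matches remain
        key = (kind, target)  # the key includes the shifted interval
        if key not in cache:
            cache[key] = Selector(kind, target)
        # The match is recorded under the same interval the selector expects
        kept.append((feature_id, target, cache[key]))
    return kept
```

Two matches that resolve to the same selector type and the same shifted interval share one cached selector object, which keeps the number of selector instances down.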
@AlexTate AlexTate marked this pull request as ready for review February 10, 2023 02:41
@taimontgomery taimontgomery commented

Tested with zswim8 and ram1 data. Needs more extensive testing once the next round of updates to feature selection is implemented.

@taimontgomery taimontgomery merged commit 47b5def into master Feb 10, 2023