Pipeline: new location and scope for GFF Source Filters and Column Filters #246

Merged: taimontgomery merged 14 commits into master from issue-244 on Nov 3, 2022

AlexTate (Member) commented on Oct 27, 2022

Config File Changes:
The counter_source_filter and counter_type_filter parameters have been moved out of the Run Config and into the Features Sheet. Source/Type Filters are therefore specific to their rule (row) rather than global. They have been absorbed into Stage 1 selection, so the targets of Stage 1 selection are now GFF columns 2 (source), 3 (type), and 9 (attributes).
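To illustrate what per-rule filters mean for Stage 1 selection, here is a minimal sketch. It is not the tinyRNA implementation: the `Rule` class, its field names, and `stage1_matches()` are hypothetical, and treating an empty filter as "match anything" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for the Stage 1 fields of one Features Sheet row."""
    attr_key: str                            # attribute key to look up in GFF column 9
    attr_value: str                          # value that the attribute must contain
    source_filter: frozenset = frozenset()   # allowed column 2 values; empty = any (assumed)
    type_filter: frozenset = frozenset()     # allowed column 3 values; empty = any (assumed)


def stage1_matches(rule, source, feature_type, attributes):
    """Return True if a GFF feature passes this rule's Stage 1 selection.

    `attributes` maps column 9 keys to tuples of values.
    """
    if rule.source_filter and source not in rule.source_filter:
        return False
    if rule.type_filter and feature_type not in rule.type_filter:
        return False
    return rule.attr_value in attributes.get(rule.attr_key, ())


strict = Rule("Class", "miRNA", frozenset({"miRBase"}), frozenset({"miRNA"}))
loose = Rule("Class", "miRNA")
feature_attrs = {"Class": ("miRNA",)}

print(stage1_matches(strict, "Ensembl", "miRNA", feature_attrs))  # filtered by source
print(stage1_matches(loose, "Ensembl", "miRNA", feature_attrs))   # no filters on this rule
```

Because each rule carries its own filters, the same feature can be rejected by one row and accepted by another, which was not possible when the filters were global Run Config settings.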

Command line argument changes:
The --source-filter and --type-filter arguments have been removed from tiny-count.

Codebase improvements:
If the Features Sheet contains column headers from a previous version of tinyRNA, a helpful error message is now produced that tells the user what changed, where to read about it, and how to resolve the error. This replaces the generic missing/unknown column header error that would otherwise be produced. It applies to both pipeline and standalone runs.
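A check of this kind could look roughly like the sketch below. The column names in `REQUIRED_COLUMNS`, the exception class, and the function name are all hypothetical and only illustrate the idea of failing early with an actionable message; the wording of the real message is not reproduced here.

```python
class OutdatedFeaturesSheetError(ValueError):
    """Hypothetical error type for Features Sheets from an older version."""


# Assumed header names for the new per-rule filter columns
REQUIRED_COLUMNS = ("Source Filter", "Type Filter")


def check_features_sheet_header(header):
    """Raise a descriptive error if the per-rule filter columns are absent."""
    present = {name.strip().lower() for name in header}
    missing = [col for col in REQUIRED_COLUMNS if col.lower() not in present]
    if missing:
        raise OutdatedFeaturesSheetError(
            "This Features Sheet appears to be from an earlier version. "
            "Source and Type Filters are now defined per rule in the Features "
            "Sheet rather than in the Run Config. Missing column(s): "
            + ", ".join(missing)
            + ". See the configuration documentation, then add these columns "
            "(they may be left blank) to continue."
        )


try:
    check_features_sheet_header(["Select for...", "with value..."])
except OutdatedFeaturesSheetError as err:
    print(err)
```

Running the check before generic header validation is what lets the more specific message win over a plain "missing column" error.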

Misc. changes:
GFFValidator applies the Source and Type Filters before validation, as it did before. However, rather than using a fully built rule table for evaluation, it now uses a minimal ruleset containing only the filters. This was a performance-oriented change. The ruleset is a single rule containing all specified terms for both filters, regardless of the number of rules in the input.
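The collapsing step described above can be sketched as follows. The "Filter_s" and "Filter_t" keys come from the commit notes in this PR; representing rules as dicts of lists and the function name `build_filter_only_ruleset()` are assumptions for illustration.

```python
def build_filter_only_ruleset(rules):
    """Collapse all rules into one rule holding every Source/Type filter term.

    Sketch only: empty filters are simply skipped, and the PR text does not
    say how the real code treats them.
    """
    sources, types = set(), set()
    for rule in rules:
        sources.update(rule.get("Filter_s", ()))
        types.update(rule.get("Filter_t", ()))
    # Sort for a deterministic result regardless of input order
    return [{"Filter_s": sorted(sources), "Filter_t": sorted(types)}]


features_sheet_rules = [
    {"Filter_s": ["miRBase"], "Filter_t": ["miRNA"]},
    {"Filter_s": ["Ensembl", "miRBase"], "Filter_t": []},
]
print(build_filter_only_ruleset(features_sheet_rules))
```

The result always has exactly one rule, so the cost of screening a feature before validation does not grow with the number of rows in the Features Sheet.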

Closes #244

…uild_selectors() has been updated to build them, and has been modified to allow for building partial rules tables (or building only the selectors contained in the table)
…er. Also added a routine to catch Features Sheets lacking these columns, which produces a helpful error message rather than a generic "missing column" error that validate_csv_header would otherwise provide
…e basis. These selectors are now part of Stage 1 selection.

Updated GFFValidator to use the new Source/Type filters. It now accepts a rules table (with at least "Filter_s", "Filter_t" defined), then uses FeatureSelector to build them.
…n class. Only Source/Type filters are retained. The result is a "rules table" with a single rule containing all filters defined in the Features Sheet. This is then passed to GFFValidator so that it can use these filters to screen features before validation.
…t can filter features by source/type before evaluating them. Also simplified load_config() a little bit.
# Conflicts:
#	START_HERE/features.csv
#	doc/Configuration.md
#	doc/Parameters.md
#	doc/tiny-count.md
#	tiny/rna/configuration.py
#	tiny/rna/counter/counter.py
#	tiny/rna/counter/hts_parsing.py
#	tiny/templates/features.csv
…he most likely case in the conditional chain
…ny-count-related module tests so that identities are wildcard. I think this is an improvement but there are still a lot of opportunities for cleaning up these tests.
@AlexTate AlexTate marked this pull request as ready for review October 28, 2022 20:56

AlexTate (Member, Author) commented:

Merge conflicts have been resolved and unit tests have been updated.

issue-244 was derived from issue-234, so this PR contains all of the changes waiting to be merged under the other PR. Once #245 is merged, I'll need to toggle the base branch on this PR in order to update the Files Changed view. Otherwise it will permanently show the other PR's changes under this one, which would make for misleading documentation.

…ncompatible Features Sheets containing an "Alias by..." or "Feature Source" column. A helpful error message is produced regarding the changes.
@AlexTate AlexTate changed the base branch from master to issue-3 November 2, 2022 18:49
@AlexTate AlexTate changed the base branch from issue-3 to master November 2, 2022 18:50

taimontgomery (Collaborator) commented:

Tested successfully with ram1 and Lib303-314 data. Caught a validation.py issue with fasta.gz files, but it was related to a previous pull request. Emailed the issue to Alex.

@taimontgomery taimontgomery merged commit 49ddf86 into master Nov 3, 2022

Linked issue (may be closed by merging this pull request): Pipeline: new location and scope of Source Filter and Type Filter