Pipeline: new location and scope for GFF Source Filters and Column Filters #246
Merged
taimontgomery merged 14 commits into master on Nov 3, 2022
…uild_selectors() has been updated to build them, and has been modified to allow for building partial rules tables (or building only the selectors contained in the table)
…er. Also added a routine to catch Features Sheets lacking these columns, which produces a helpful error message rather than a generic "missing column" error that validate_csv_header would otherwise provide
…e basis. These selectors are now part of Stage 1 selection. Updated GFFValidator to use the new Source/Type filters. It now accepts a rules table (with at least "Filter_s", "Filter_t" defined), then uses FeatureSelector to build them.
…n class. Only Source/Type filters are retained. The result is a "rules table" with a single rule containing all filters defined in the Features Sheet. This is then passed to GFFValidator so that it can use these filters to screen features before validation.
…t can filter features by source/type before evaluating them. Also simplified load_config() a little bit.
…r issues and proofread)
# Conflicts: # START_HERE/features.csv # doc/Configuration.md # doc/Parameters.md # doc/tiny-count.md # tiny/rna/configuration.py # tiny/rna/counter/counter.py # tiny/rna/counter/hts_parsing.py # tiny/templates/features.csv
…he most likely case in the conditional chain
…ny-count-related module tests so that identities are wildcard. I think this is an improvement but there are still a lot of opportunities for cleaning up these tests.
Merge conflicts have been resolved and unit tests have been updated. issue-244 was derived from issue-234, so this PR contains all of the changes waiting to be merged under the other PR. Once #245 is merged, I'll need to toggle the base branch on this PR in order to update the Files Changed. Otherwise it will permanently show the other PR's changes under this one, which is bad documentation.
…ncompatible Features Sheets containing an "Alias by..." or "Feature Source" column. A helpful error message is produced regarding the changes.
Tested successfully with ram1 and Lib303-314 data. Caught a validation.py issue with fasta.gz files, but the issue was related to a previous pull request. Emailed the issue to Alex.
Config File Changes:
The counter_source_filter and counter_type_filter have been moved out of the Run Config and into the Features Sheet. This means that Source/Type Filters are specific to their rule/row. They have been absorbed into Stage 1 selection, so we now say the targets of Stage 1 selection are GFF columns 2, 3, and 9.
Command line argument changes:
The --source-filter and --type-filter arguments have been removed from tiny-count.
Codebase improvements:
If the Features Sheet contains column headers from a previous version of tinyRNA, a helpful error message is produced that lets the user know what changed, where to read about it, and what to do to make the error go away. This is better than the generic missing/unknown column header that would otherwise be produced. This is true in both pipeline and standalone runs.
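As a rough illustration of the header check described above, here is a minimal sketch. It is not tinyRNA's actual code: the function names, the error wording, and the prefix-matching approach are all assumptions made for this example. The only detail taken from this PR is that sheets with an "Alias by..." or "Feature Source" column are treated as incompatible.

```python
import csv
import io

# Hypothetical names throughout; this is NOT tinyRNA's actual implementation.
# Headers are matched by prefix because the full "Alias by..." name is elided here.
LEGACY_PREFIXES = ("alias by", "feature source")


def find_legacy_headers(header_row):
    """Return the headers that look like columns from an older Features Sheet."""
    return [h for h in header_row
            if h.strip().lower().startswith(LEGACY_PREFIXES)]


def check_features_sheet_header(csv_text):
    """Raise a descriptive error if the header row contains legacy columns."""
    header_row = next(csv.reader(io.StringIO(csv_text)), [])
    legacy = find_legacy_headers(header_row)
    if legacy:
        # Assumed wording; the real message also points the user to the docs.
        raise ValueError(
            "This Features Sheet appears to be from an earlier version: "
            f"found column(s) {legacy}. Source and Type Filters are now "
            "defined per rule in the Features Sheet; please update its columns."
        )
    return header_row
```

Running a check like this before generic header validation is what lets the user see a targeted message instead of a plain "missing column" error.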
Misc. changes:
GFFValidator applies the Source and Type Filters before validation, as it did before. However, instead of a fully built rules table, it now uses a minimal ruleset containing only the filters. This was a performance-oriented change. The ruleset is a single rule containing all specified terms for both filters, regardless of the number of rules in the input.
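The difference between per-rule filters and the single collapsed rule can be sketched as follows. This is an illustration under stated assumptions, not the actual FeatureSelector or GFFValidator code: the dict layout, the function names, and the convention that an empty set means "no restriction" are all hypothetical. Only the "Filter_s" and "Filter_t" keys come from the commit notes.

```python
# Hypothetical layout: each rule is a dict whose "Filter_s" / "Filter_t" values
# are sets of accepted GFF sources / types. An empty set is ASSUMED to mean
# "no restriction" (wildcard); tinyRNA's real representation may differ.

def gff_source_and_type(line):
    """Extract GFF column 2 (source) and column 3 (type) from a tab-delimited line."""
    fields = line.rstrip("\n").split("\t")
    return fields[1], fields[2]


def passes_filters(rule, source, feature_type):
    """True if the feature's source and type satisfy this rule's filters."""
    src_ok = not rule["Filter_s"] or source in rule["Filter_s"]
    type_ok = not rule["Filter_t"] or feature_type in rule["Filter_t"]
    return src_ok and type_ok


def collapse_filters(rules):
    """Merge every rule's filter terms into one rule for pre-validation screening."""
    collapsed = {}
    for key in ("Filter_s", "Filter_t"):
        per_rule = [set(rule[key]) for rule in rules]
        if any(not terms for terms in per_rule):
            # One rule leaves this filter open, so the screen must stay open too
            collapsed[key] = set()
        else:
            collapsed[key] = set().union(*per_rule)
    return collapsed


# Example values below are illustrative only
rules = [
    {"Filter_s": {"WormBase"}, "Filter_t": {"miRNA"}},
    {"Filter_s": {"WormBase"}, "Filter_t": {"piRNA"}},
]
screen = collapse_filters(rules)
line = "I\tWormBase\tmiRNA\t100\t120\t.\t+\t.\tID=example"
keep = passes_filters(screen, *gff_source_and_type(line))
```

Note that a collapsed rule is a coarse screen: it can accept a source from one rule paired with a type from another, which no single rule would accept. That is acceptable for deciding which features are worth validating, while the per-rule filters still decide the actual Stage 1 matches.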
Closes #244