Pipeline: tagged counting repurposed as classifier by AlexTate · Pull Request #241 · MontgomeryLab/tinyRNA

AlexTate · 2022-10-18T23:26:46Z

The Tag column has been renamed to "Classify as..." and is now used to apply a user-defined class to features that match the rule. The Class= attribute is no longer used to determine a feature's class. Tagged counting semantics still apply.
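A minimal sketch of how classified counting could key counts by (Feature ID, Classifier). It assumes that each distinct (feature, classifier) pair matched by an alignment is treated as a separate count target and that the alignment's count is split evenly between those targets. The rule table, `count_alignment()`, and the use of an empty string for rules without a Classify as... value are hypothetical and are not taken from tiny-count.

```python
from collections import defaultdict

# Hypothetical rule table: rule ID -> "Classify as..." value.
# An empty string stands for a rule with no Classify as... value.
RULES = {
    "rule1": "miRNA",
    "rule2": "piRNA",
    "rule3": "",
}


def count_alignment(counts, matches, read_count=1.0):
    """Split one alignment's count across its (feature ID, classifier) targets.

    `matches` lists the (feature_id, rule_id) pairs that matched the alignment.
    Each distinct (feature_id, classifier) pair is a separate count target
    (an assumption about tagged counting semantics, not tiny-count's code).
    """
    targets = {(feature_id, RULES[rule_id]) for feature_id, rule_id in matches}
    if not targets:
        return  # no rule matched; the alignment stays unassigned
    share = read_count / len(targets)
    for target in targets:
        counts[target] += share


counts = defaultdict(float)
count_alignment(counts, [("geneA", "rule1"), ("geneA", "rule2")])
count_alignment(counts, [("geneB", "rule3")])
# ("geneA", "miRNA") -> 0.5, ("geneA", "piRNA") -> 0.5, ("geneB", "") -> 1.0
```

Here geneA is matched by two rules with different classifiers, so it gets two rows, (geneA, miRNA) and (geneA, piRNA), with half the count each. geneB's row carries the empty classifier, which is the case the unknown-class label covers in the plots.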

The counts table produced by tiny-count therefore now has a multiindex of (Feature ID, Classifier). Backward compatibility is not offered for counts tables produced by an earlier version of tinyRNA. The Features Sheet is checked for the presence of a Tag column at pipeline/tiny-count startup and, if present, an error is produced along with steps to fix it.
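The startup check for an outdated Features Sheet could look roughly like the following. This is a sketch, assuming the sheet's first row holds the column headers; `check_features_sheet_header()` and the exact error text are hypothetical, and only the "Tag" and "Classify as..." column names come from this PR.

```python
import csv
import io


def check_features_sheet_header(csv_text):
    """Return the Features Sheet's column names, rejecting the old Tag column.

    Hypothetical helper: assumes the first CSV row is the header row.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    columns = [name.strip() for name in header]
    if "Tag" in columns:
        raise ValueError(
            'This Features Sheet has a "Tag" column, which is no longer supported.\n'
            "To fix it:\n"
            '  1. Rename the "Tag" column to "Classify as..."\n'
            "  2. Check that its values are the classes you want, since the\n"
            "     Class= attribute no longer determines a feature's class."
        )
    return columns
```

Failing fast on the old column name gives a more useful message than reporting that the "Classify as..." column is missing.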

These changes opened the door for some very satisfying improvements to the code quality in plotter.py. Two additional parameters have been added to the pipeline/tiny-plot:

--unassigned-class: the label to use for unassigned counts in class_charts
--unknown-class: the label to use for counts assigned by rules lacking a Classify as... value. This is used in class_charts and scatter_dge_class.
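A sketch of how the two options might be declared with Python's argparse. The flag names follow the description above, and the _UNASSIGNED_ and _UNKNOWN_ defaults come from commit 138ad24 (whose message spells the flags --unassigned-class-label and --unknown-class-label). The parser itself is illustrative and is not copied from plotter.py.

```python
import argparse


def build_parser():
    """Illustrative parser for the two class-label options (not plotter.py's)."""
    parser = argparse.ArgumentParser(prog="tiny-plot")
    parser.add_argument(
        "--unassigned-class", metavar="LABEL", default="_UNASSIGNED_",
        help="label for unassigned counts in class_charts",
    )
    parser.add_argument(
        "--unknown-class", metavar="LABEL", default="_UNKNOWN_",
        help="label for counts assigned by rules lacking a Classify as... "
             "value (class_charts and scatter_dge_class)",
    )
    return parser


args = build_parser().parse_args(["--unknown-class", "unclassified"])
# argparse exposes the flags as args.unassigned_class and args.unknown_class
```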

Closes #240

Removing accounting of the Class= attribute from ReferenceTables, along with usages of this output in the Features data class and the FeatureCounts output file class. Also correcting some erroneous uses of the StepVector type hint (this should have been GenomicArray). The Feature Class column has been removed from the output feature_counts.csv table, and the Tag column has been renamed to "Classifier".

…table format. The Feature Class column has been dropped and the Tag column has been renamed to Classifier. Also further improved the flexibility of tiny-deseq.r by not hardcoding character columns when calling write.csv.

…Tag multiindex column as a classifier. Wow, this opened the door for some really satisfying simplifications to the class-related codebase. Backward compatibility is not offered with this commit. I've considered how to reconcile this, but ultimately I think it would be a bad idea. For one, the counting semantics of the old Feature Class and Tag columns are completely different, so these inputs wouldn't be interchangeable. I've also removed the show_unknown option from scatter_dges() because we have yet to use this feature and have never added the option to any user-facing config files.

AlexTate · 2022-10-19T19:31:43Z

Since this PR introduces changes that are backward incompatible, I would like to make a release for the project in its current state before this one is merged.

taimontgomery · 2022-10-27T19:10:54Z

With this new, much improved approach to classification, won't the class and rule plots always be the same? And thus can we get rid of the rules plots? Perhaps also change counts_by_rule.csv to counts_by_classification.csv, changing the Rule String column to Classification?

AlexTate · 2022-10-27T19:53:35Z

No, class and rule plots will differ if any rules share a Classify as... value. In that case, rule plots show how much each rule contributed to the pooled classes. For this reason, I think the proposed changes to the output files would be incorrect.
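A small worked example of this point, using made-up per-rule totals and assuming each count is attributed to exactly one rule: when two rules share a Classify as... value, pooling by class merges their totals, so the class view has fewer categories than the rule view. `pool_by_class()` is a hypothetical helper.

```python
from collections import defaultdict


def pool_by_class(rule_counts, rule_class):
    """Sum per-rule counts into per-class counts (hypothetical helper)."""
    class_counts = defaultdict(float)
    for rule, count in rule_counts.items():
        class_counts[rule_class[rule]] += count
    return dict(class_counts)


# Made-up totals; rule1 and rule2 share the "miRNA" classifier.
rule_counts = {"rule1": 120.0, "rule2": 30.0, "rule3": 50.0}
rule_class = {"rule1": "miRNA", "rule2": "miRNA", "rule3": "piRNA"}

class_counts = pool_by_class(rule_counts, rule_class)
# class_counts == {"miRNA": 150.0, "piRNA": 50.0}
```

The class view reports 150 for miRNA, and only the rule view shows that 120 of those came from rule1 and 30 from rule2.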

taimontgomery · 2022-10-27T20:04:02Z

I see. In that case, perhaps we can add a counts_by_classification.csv table at some point.

taimontgomery · 2022-10-27T20:04:18Z

Tested successfully with ram1 data.

AlexTate added 15 commits October 15, 2022 17:32

Changing the internal and external names of the Tag column

e77c243

Renaming the "tags" attribute in the Features class to "classes" for consistency.

eef300a

Adding two new command line options to tiny-plot:

138ad24

--unassigned-class-label and --unknown-class-label. These options retain their previous default values of _UNASSIGNED_ and _UNKNOWN_. Class labels in class_charts are now sorted.
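One way the class_charts input might be assembled with the configurable labels and sorted class labels, as a sketch only. It assumes that unassigned counts are a library's mapped total minus everything assigned to a class, and it uses a hypothetical empty-string encoding for rules without a Classify as... value; `class_chart_values()` is not a tiny-plot function.

```python
def class_chart_values(class_counts, mapped_total,
                       unknown_label="_UNKNOWN_",
                       unassigned_label="_UNASSIGNED_"):
    """Return (label, count) pairs for one library's class chart, sorted by label.

    class_counts maps a classifier to its count; "" marks rules that lack a
    Classify as... value (hypothetical encoding).
    """
    labeled = {}
    for cls, count in class_counts.items():
        label = cls if cls else unknown_label
        labeled[label] = labeled.get(label, 0.0) + count

    # Assumption: whatever was mapped but not assigned to a class is "unassigned".
    unassigned = mapped_total - sum(class_counts.values())
    if unassigned > 0:
        labeled[unassigned_label] = unassigned

    return sorted(labeled.items())


values = class_chart_values({"piRNA": 50.0, "miRNA": 150.0, "": 10.0}, 250.0)
# [("_UNASSIGNED_", 40.0), ("_UNKNOWN_", 10.0), ("miRNA", 150.0), ("piRNA", 50.0)]
```

Sorting by label keeps the category order stable across libraries regardless of the order in which classes were counted.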

Including new parameters for labelling unknown/unassigned classes

e0748a0

Updated the Features Sheet example for the new column configuration

2f4da09

Description of class counting via the Class= attribute has been removed from the file requirements table. Description of the feature_counts.csv output has also been updated.

1c142dc

The Tagged Counting section has been rewritten to describe the new "classified" counting approach.

34e51fd

Removed references to Class= counting in the class_charts description. The count normalization section has been removed because it no longer applies with the new counting method. A link to the corresponding parameters has been added to the _UNKNOWN_ and _UNASSIGNED_ sections.

ee7ca26

Adding new parameters for unassigned/unknown class labels to the CWL and Run Configs

d240b02

Updating column config in the features.csv template

436b875

Added a brief check in the CSV reader to catch usage of Features Sheets with a Tag column. Previously, the user would have been notified that the "Classify as..." column was missing from their Features Sheet, which isn't quite as helpful.

1d95e19

Merge branch 'master' into issue-240

79cdfe5

AlexTate marked this pull request as draft October 19, 2022 19:29

AlexTate mentioned this pull request Oct 21, 2022

Configuration: Samples Sheet validation #243

Merged

AlexTate added 3 commits October 21, 2022 17:32

Unit tests have been updated for the new class/tagged counting approach

fe49acb

Slight refactor to move ReferenceTables.finalize_tables() back into ReferenceTables.get(). The get() function is significantly shorter after the recent changes for GFF validation, so it can accommodate the finalization routine without becoming too crowded.

a2fd955

Merge branch 'master' into issue-240

6894cc1

# Conflicts:
#	README.md
#	tests/unit_tests_counter.py

AlexTate marked this pull request as ready for review October 22, 2022 21:20

taimontgomery merged commit 139ebc1 into master Oct 27, 2022

taimontgomery merged 18 commits into master from issue-240
