Tutorial Preface
Chapter 4 of this Wiki is a tutorial for using DeepSVR. It details the commands needed to build a model with a large data set, use the model to classify a set of input variants, assess the accuracy of the model, and retrain the model. The tutorial is designed to be followed interactively. Each subchapter explains one critical command for building and using DeepSVR: it gives background on the command and its function, followed by detailed explanations of the input/output files, command line syntax, and command flag options.
The original 41,000 variants used to train the model in this tutorial are referred to as the Original Data, while the variants to be classified from Chr22 of tst1 are referred to as the Inference Data.
We strongly recommend creating and training the classifier with the Original Data. Given its sample size and tissue diversity, this set of 41,000 somatic variants is an excellent starting point for training a model: it is large and diverse enough to equip the model to handle many cancer types across solid and liquid tumors. We also recommend adding 5% (or >250) of your own manually reviewed variants to the training data to help mitigate batch effects. When reviewing the accuracy of the called variants via ROC curves, keep in mind that accuracy can be improved by increasing the amount of training data that you contribute.
For subchapters 4.3-4.6, the main table has purple text boxes illustrating the current step.
To build the classifier using the Original Data and classify your own data, replace the tst1 Inference Data with your somatic variants. This tutorial is designed to help a novice programmer use a deep learning model to classify their own putative somatic variants.
Chapter 1 - Background Information:
Authors | Citation | About | Repository Installation
Chapter 2 - Identification of Somatic Variants in Sequencing Data:
Automated Somatic Variant Calling | Somatic Variant Refinement (SVR)
Chapter 3 - Methods and Analysis for Machine Learning Models:
Data Assembly | Logistic Regression Model | Random Forest Model | Deep Learning Model | Model Evaluation | Inter-reviewer Variability | Orthogonal Validation | Manual Review Validation | Re-review Analysis
Chapter 4 - DeepSVR Tutorial:
Tutorial Preface | DeepSVR Installation | Create the Classifier | Prepare Data | Classify Data | Re-Train Model
Chapter 5 - Usage Documents:
DeepSVR Installation | Usage Documents