POC: Devon
Story
MSstatsBig is used to process datasets that are too large to fit in memory. It currently provides converters for FragPipe and Spectronaut, two peptide identification/quantification tools. We started with these two tools because they support data-independent acquisition (DIA), a mass spectrometry proteomics technique that can capture thousands of proteins and hundreds of fragment ions per protein (hence big data).
DIANN is another tool that performs peptide identification/quantification for DIA data. While we have a converter that performs ETL for DIANN reports in MSstatsConvert here, we do not have a corresponding big converter for DIANN in MSstatsBig. We need to create a big dataset converter for DIANN.
Subtasks
1. Review code and set up a meeting with Devon to summarize the ETL workflow and ask questions
   - Review examples of the function DIANNtoMSstatsFormat here. Use this example dataset too. Use this other example dataset too.
   - Review bigFragPipetoMSstatsFormat vs FragPipetoMSstatsFormat code to understand how the processing differs between the two functions
   - Review bigSpectronauttoMSstatsFormat vs SpectronauttoMSstatsFormat code to understand how the processing differs between the two functions
2. Implement a basic bigDIANNtoMSstatsFormat converter. For MVP, use the same parameters as bigFragPipetoMSstatsFormat. It should have two main steps:
   - Cleaning the DIANN data - see here for what columns are important in DIANN.
   - Reusing the MSstatsPreprocessBig function
3. Write unit tests
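As a rough illustration of the cleaning step in the converter subtask, the sketch below streams a DIANN-style report row by row, so the full file never has to sit in memory. It is written in Python purely for illustration; the real converter would live in R inside MSstatsBig. The input column names (`Run`, `Protein.Group`, `Modified.Sequence`, `Precursor.Charge`, `Q.Value`, `Precursor.Quantity`), the 0.01 q-value cutoff, and the helper name `clean_diann_stream` are assumptions for this example, not the agreed design, and the output is a trimmed subset of the MSstats columns.

```python
import csv
import io

# Trimmed subset of MSstats-style output columns (illustration only).
OUT_COLUMNS = ["ProteinName", "PeptideSequence", "PrecursorCharge", "Run", "Intensity"]


def clean_diann_stream(src, dst, qvalue_cutoff=0.01):
    """Stream a tab-separated DIANN-style report from src to dst.

    Hypothetical helper: the input column names and the cutoff are
    assumptions and should be checked against a real DIANN report.
    Returns the number of rows written.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=OUT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:  # one row at a time, never the whole file
        if float(row["Q.Value"]) > qvalue_cutoff:
            continue  # drop low-confidence identifications
        writer.writerow({
            "ProteinName": row["Protein.Group"],
            "PeptideSequence": row["Modified.Sequence"],
            "PrecursorCharge": row["Precursor.Charge"],
            "Run": row["Run"],
            "Intensity": row["Precursor.Quantity"],
        })
        kept += 1
    return kept


# Tiny in-memory report standing in for a large file on disk.
report = io.StringIO(
    "Run\tProtein.Group\tModified.Sequence\tPrecursor.Charge\tQ.Value\tPrecursor.Quantity\n"
    "run1\tP12345\tPEPTIDEK\t2\t0.001\t1500.5\n"
    "run1\tP12345\tPEPTIDER\t3\t0.2\t800.0\n"
    "run2\tQ67890\tANOTHERK\t2\t0.005\t420.0\n"
)
cleaned = io.StringIO()
kept = clean_diann_stream(report, cleaned)
print(kept, "rows kept")
```

In the actual converter, the cleaned intermediate file would then be handed to MSstatsPreprocessBig, as the second step of the subtask describes.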
Acceptance Criteria
A PR for the MVP of the bigDIANNtoMSstatsFormat converter and its unit tests has been pushed to the devel branch