Create MSstatsBig DIANN Converter - Old Format #7

@tonywu1999

Description

POC: Devon

Story

MSstatsBig is used to process large datasets out of memory. It currently supports converters for FragPipe and Spectronaut, two peptide identification/quantification tools. We started with these two tools because they support data independent acquisition (DIA), a mass spectrometry proteomics technique that can capture thousands of proteins and hundreds of fragment ions per protein (hence big data).

DIANN is another tool that performs peptide identification/quantification for data independent acquisition. While MSstatsConvert has a converter that performs ETL for DIANN reports here, MSstatsBig has no corresponding big converter for DIANN. We need to create a big dataset converter for DIANN.

Subtasks

  1. Review code and set up a meeting with Devon to summarize the ETL workflow and ask questions
    1. Review examples of the function DIANNtoMSstatsFormat here. Use this example dataset and this other example dataset too.
    2. Review bigFragPipetoMSstatsFormat vs FragPipetoMSstatsFormat code to understand how the processing differs between the two functions
    3. Review bigSpectronauttoMSstatsFormat vs SpectronauttoMSstatsFormat code to understand how the processing differs between the two functions
  2. Implement a basic bigDIANNtoMSstatsFormat converter. For MVP, use the same parameters as bigFragPipetoMSstatsFormat. It should have two main steps:
    1. Cleaning the DIANN data: see here for which columns are important in DIANN.
    2. Reusing the MSstatsPreprocessBig function
  3. Write unit tests
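To make the "cleaning" step concrete, here is a minimal Python sketch of the streaming idea only. It is not the package implementation and not MSstatsConvert's logic. The DIANN column names (Protein.Group, Modified.Sequence, Precursor.Charge, Run, Q.Value, Fragment.Quant.Corrected, Fragment.Info), the assumption that the old format stores per-fragment values as parallel semicolon-separated lists, the assumed "ion^charge/mz" fragment annotation, the 0.01 Q.Value cutoff, the output column subset, and the function names clean_diann_rows and write_msstats_csv are all assumptions or hypothetical, to be checked against the DIANN column documentation and DIANNtoMSstatsFormat.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

# Hypothetical output columns, loosely modeled on the MSstats long format.
MSSTATS_COLUMNS = [
    "ProteinName", "PeptideSequence", "PrecursorCharge",
    "FragmentIon", "ProductCharge", "Run", "Intensity",
]


def clean_diann_rows(report: TextIO, qvalue_cutoff: float = 0.01) -> Iterator[Dict[str, str]]:
    """Stream a tab-separated DIANN report and yield one row per fragment.

    Only one input row is held in memory at a time, which is the property
    an out-of-memory converter relies on. Column names are assumptions.
    """
    reader = csv.DictReader(report, delimiter="\t")
    for row in reader:
        # Assumed filter: drop precursors whose Q.Value exceeds the cutoff.
        if float(row["Q.Value"]) > qvalue_cutoff:
            continue
        # Assumed old-format layout: parallel semicolon-separated lists,
        # possibly with a trailing ";" (empty pieces are discarded).
        quants = [q for q in row["Fragment.Quant.Corrected"].split(";") if q]
        infos = [f for f in row["Fragment.Info"].split(";") if f]
        for quant, info in zip(quants, infos):
            # Assumed annotation shape "y3^1/375.2": ion, then ^charge, then /mz.
            annotation = info.split("/")[0]
            ion, _, charge = annotation.partition("^")
            yield {
                "ProteinName": row["Protein.Group"],
                "PeptideSequence": row["Modified.Sequence"],
                "PrecursorCharge": row["Precursor.Charge"],
                "FragmentIon": ion,
                "ProductCharge": charge,
                "Run": row["Run"],
                "Intensity": quant,
            }


def write_msstats_csv(report: TextIO, out: TextIO, qvalue_cutoff: float = 0.01) -> int:
    """Write cleaned rows to `out` as CSV and return how many rows were written."""
    writer = csv.DictWriter(out, fieldnames=MSSTATS_COLUMNS)
    writer.writeheader()
    n = 0
    for cleaned in clean_diann_rows(report, qvalue_cutoff):
        writer.writerow(cleaned)
        n += 1
    return n


if __name__ == "__main__":
    # Tiny made-up report: the second precursor fails the Q.Value cutoff.
    demo = io.StringIO(
        "Protein.Group\tModified.Sequence\tPrecursor.Charge\tRun\tQ.Value\t"
        "Fragment.Quant.Corrected\tFragment.Info\n"
        "P1\tPEPTIDEK\t2\trun1\t0.001\t100.0;50.0;\ty3^1/375.2;b2^1/227.1;\n"
        "P2\tSEQR\t2\trun1\t0.5\t10.0;\ty1^1/175.1;\n"
    )
    print(write_msstats_csv(demo, io.StringIO()))  # prints 2
```

Processing row by row keeps memory use flat regardless of report size. A real big converter would presumably write the cleaned output to disk and hand it to MSstatsPreprocessBig rather than keeping it in memory.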

Acceptance Criteria

A PR containing the MVP bigDIANNtoMSstatsFormat converter and its unit tests has been pushed to the devel branch.

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests