diff --git a/docs/images/introduction/introduction_LC.png b/docs/images/introduction/introduction_LC.png
new file mode 100644
index 00000000..49151fad
Binary files /dev/null and b/docs/images/introduction/introduction_LC.png differ
diff --git a/docs/images/introduction/introduction_MS.png b/docs/images/introduction/introduction_MS.png
new file mode 100644
index 00000000..e5fd7f4e
Binary files /dev/null and b/docs/images/introduction/introduction_MS.png differ
diff --git a/docs/images/introduction/spectrum_peakmap.png b/docs/images/introduction/spectrum_peakmap.png
new file mode 100644
index 00000000..6c89ee7f
Binary files /dev/null and b/docs/images/introduction/spectrum_peakmap.png differ
diff --git a/docs/introduction.md b/docs/introduction.md
index 729ad0d5..d22c856c 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -7,21 +7,83 @@ analyses. It offers an infrastructure for rapid development of mass spectrometry related software. OpenMS is free software available under the three clause BSD license and runs under Windows, macOS, and Linux.
-OpenMS has a vast variety of pre-built and ready-to-use tools for proteomics
-and metabolomics data analysis (TOPPTools) as well as powerful 1D, 2D and 3D
-visualization (TOPPView).
-
-OpenMS offers analyses for various quantitation protocols, including label-free
-quantitation, SILAC, iTRAQ, TMT, SRM, SWATH, etc.
-
-OpenMS provides built-in algorithms for de-novo identification and database search,
-as well as adapters to other state-of-the art tools like X!Tandem, Mascot,
-OMSSA, etc. It supports easy integration of OpenMS built tools into workflow
-engines like KNIME, Galaxy, WS-Pgrade, and TOPPAS via the TOPPtools concept and
-a unified parameter handling via a 'common tool description' (CTD) scheme.
-
-With pyOpenMS, OpenMS offers Python bindings to a large part of the OpenMS API
-to enable rapid algorithm development. OpenMS supports the Proteomics Standard
-Initiative (PSI) formats for MS data. The main contributors of OpenMS are
-currently the Eberhard-Karls-Universität in Tübingen, the Freie Universität
-Berlin, and the ETH Zürich.
+```{note}
+This introduction is aimed at users new to the field of LC-MS data analysis and will introduce some basic terms
+and concepts. How to handle the data analysis, available data structures, algorithms and more are covered in the various
+subsections of this documentation.
+```
+
+# Background
+
+Proteomics and metabolomics are interdisciplinary research fields that study structure, function, and interaction of
+proteins and metabolites. They employ large-scale experimental techniques that allow acquiring data at the level of
+cellular systems to whole organisms. Mass spectrometry combined with chromatographic separation is commonly used to
+identify, characterize or quantify the amount of proteins and metabolites.
+
+In mass spectrometry-based proteomics and metabolomics, biological samples are extracted, prepared, and separated to
+reduce sample complexity. The separated analytes are ionized and measured in the mass spectrometer. Mass and abundance
+of ions are stored in mass spectra and used to identify and quantify the analytes in the sample using computational
+methods. The quantity and identity of analytes can then be used, for instance, in biomarker discovery, medical diagnostics,
+or basic research.
+
+# Liquid Chromatography (LC)
+
+LC aims to reduce the complexity of the measured sample by separating analytes based on their physicochemical properties.
+Separating analytes in time ensures that a manageable number of analytes elute at the same time. In mass
+spectrometry-based proteomics, (high-pressure) liquid chromatographic separation techniques (HPLC) are the methods of choice
+to achieve a high degree of separation. In HPLC, peptides are separated on a column. Dissolved in a pressurized liquid
+(mobile phase), they are pumped through a solid adsorbent material (stationary phase) packed into a capillary column.
+Physicochemical properties of each peptide determine how strongly it interacts with the stationary phase. The most
+common HPLC technique in proteomics and metabolomics uses reversed-phase chromatography (RPC) columns. RPC employs a
+hydrophobic stationary phase like octadecyl (C18), a nonpolar carbon chain bonded to a silica base, and a polar mobile
+phase. Polar molecules interact weakly with the stationary phase and elute earlier, while nonpolar molecules are retained longer.
+Interaction can be further modulated by changing the gradient of solvent concentration in the mobile phase over time.
+Elution times in LC are inherently prone to variation, for example, due to fluctuations in the flow rate of the mobile
+phase or a change of column. Retention time shifts between runs may be compensated for using computational chromatographic
+retention time alignment methods. In the LC-MS setup, the column is directly coupled to the ion source of the mass
+spectrometer.
+
+![](images/introduction/introduction_LC.png)
+
+# Mass Spectrometry
+
+MS is an analytical technique used to determine the mass of molecules. In order to achieve highly accurate and sensitive
+mass measurements at the atomic scale, mass spectrometers manipulate charged particles using magnetic and electrostatic
+fields.
+
+![](images/introduction/introduction_MS.png)
+
+In a typical mass spectrometer, three principal components can be identified:
+
+- **Ion Source**: A mass spectrometer only handles ions. Thus, charge must first be transferred to uncharged particles.
+  The component responsible for the ionization is the ion source. Different types of ion sources and ionization
+  techniques exist, with electrospray ionization (ESI) currently being the most widely used ionization technique for mass
+  spectrometry-based proteomics.
+
+- **Mass Analyzer**: The most commonly used mass analyzers in proteomics are time-of-flight (TOF) mass analyzers, quadrupole mass
+  filters, and orbitrap analyzers. In TOF mass analyzers, the ions are accelerated in an electric field. The flight time
+  of an ion allows calculating the velocity, which in turn is used to calculate the mass-to-charge ratio (m/z). Varying
+  the electric field allows filtering certain mass-to-charge ratios before they enter the detector. In quadrupole mass
+  filters, ions pass through an oscillating electric field created by four parallel rods. For a particular voltage, only
+  ions in a certain mass-to-charge range will reach the detector. The orbitrap is an ion trap mass analyzer (and detector)
+  that traps ions in orbital motion between a barrel-like outer electrode and a spindle-like central electrode allowing
+  for prolonged mass measurement. As a result of the prolonged mass measurements, a high mass resolution can be achieved.
+
+- **Detector**: The last component of the mass spectrometer is the detector. It determines the abundance of ions that
+  passed through the mass analyzer. Ion intensity (a value that relates to the ion's abundance) and the mass-to-charge ratio
+  are recorded in a mass spectrum.
+
+A sample is measured over the retention time of the chromatography typically resulting in tens of thousands of spectra.
+The measurement of one sample is called an MS run and the set of spectra is called an MS(1) map or peak map.
+
+![](images/introduction/spectrum_peakmap.png)
+
+The left image displays a spectrum with peaks (m/z and intensity values) and the right image shows spectra stacked in
+retention time yielding a peak map.
+
+In proteomics and metabolomics, the MS1 intensity is often used for the quantification of an analyte. Identification
+based on the MS1 mass-to-charge and the isotope pattern is highly ambiguous. To improve identification, tandem mass
+spectrometry (MS/MS) can be applied to assess the analyte substructure. To this end, the precursor ion is isolated and
+fragmented by collision with an inert gas (e.g., argon). Fragments produced by collision-induced dissociation (CID) are
+stored in an MS2 (or MS/MS) spectrum and provide information that helps to resolve the ambiguities in identification.
+Alternatively, MS/MS spectra can be used for quantification.
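
The Mass Analyzer item in the added text says that a TOF analyzer derives the velocity from the flight time and the m/z from the velocity. A minimal Python sketch of that relationship follows. It is not part of this patch and does not use the OpenMS or pyOpenMS API. It assumes an idealized linear TOF in which an ion of charge z·e gains kinetic energy z·e·U from an acceleration voltage U and then drifts a field-free length L, so that m/z = 2·e·U·t²/L². The function names, the drift length, and the voltage are hypothetical, and real instruments (reflectrons, delayed extraction, calibration) deviate from this model.

```python
import math

# Physical constants (SI).
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per elementary charge
DALTON_KG = 1.66053906660e-27          # kilograms per dalton


def tof_flight_time(mz, drift_length_m, accel_voltage_v):
    """Flight time (s) of an ion with the given m/z in an idealized linear TOF.

    Hypothetical helper: ignores time spent in the acceleration region
    and any initial energy spread of the ions.
    """
    if mz <= 0 or drift_length_m <= 0 or accel_voltage_v <= 0:
        raise ValueError("all arguments must be positive")
    mass_per_charge_kg = mz * DALTON_KG
    # Energy balance per elementary charge: e * U = 1/2 * (m/z) * v^2
    velocity = math.sqrt(2.0 * ELEMENTARY_CHARGE_C * accel_voltage_v / mass_per_charge_kg)
    return drift_length_m / velocity


def tof_mz(flight_time_s, drift_length_m, accel_voltage_v):
    """m/z (Da per elementary charge) recovered from a measured flight time."""
    if flight_time_s <= 0 or drift_length_m <= 0 or accel_voltage_v <= 0:
        raise ValueError("all arguments must be positive")
    # Flight time gives the velocity ...
    velocity = drift_length_m / flight_time_s
    # ... and the velocity gives the mass-to-charge ratio.
    mass_per_charge_kg = 2.0 * ELEMENTARY_CHARGE_C * accel_voltage_v / velocity ** 2
    return mass_per_charge_kg / DALTON_KG


if __name__ == "__main__":
    # Assumed example values: 1 m drift tube, 20 kV acceleration voltage.
    t = tof_flight_time(500.0, 1.0, 20_000.0)
    print(f"flight time: {t * 1e6:.2f} microseconds")
    print(f"recovered m/z: {tof_mz(t, 1.0, 20_000.0):.4f}")
```

In this model the flight time grows with the square root of m/z, so an ion with four times the m/z takes twice as long to reach the detector.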