Tmt converters#127
Conversation
📝 WalkthroughWalkthroughAdds five new MSstatsTMT converter functions (MaxQuant, OpenMS, Proteome Discoverer, Philosopher, SpectroMine), their documentation and tests, updates Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Converter as "Converter Function"
participant Logs as "MSstatsLogsSettings"
participant Import as "MSstatsImport"
participant Clean as "MSstatsClean"
participant Annot as "MSstatsMakeAnnotation"
participant Pre as "MSstatsPreprocess"
participant Balance as "MSstatsBalancedDesign"
participant Output
Client->>Converter: call converter(input, annotation, options)
Converter->>Logs: initialize logging
Converter->>Import: import source-format data
Import-->>Converter: raw imported data
Converter->>Clean: clean/filter imported data
Clean-->>Converter: cleaned data
Converter->>Annot: build/resolve annotation
Annot-->>Converter: resolved annotation
Converter->>Pre: apply preprocessing & score filters
Pre-->>Converter: preprocessed features
Converter->>Balance: balance experimental design
Balance-->>Converter: balanced dataset
Converter->>Converter: rename PrecursorCharge -> Charge
Converter->>Output: return MSstatsTMT data.frame
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 13
🧹 Nitpick comments (4)
R/converters_PhilosophertoMSstatsTMTFormat.R (2)
107-121: Dead code:.getPhilosopherInputhelper is defined but never used.The helper function
.getPhilosopherInputis defined and documented but the mainPhilosophertoMSstatsTMTFormatfunction at line 47 directly createslist(Mixture1=data.table(input))instead of calling this helper. Either:
- Remove the unused helper, or
- Refactor the main function to use it if multi-mixture support was intended.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/converters_PhilosophertoMSstatsTMTFormat.R` around lines 107 - 121, The helper .getPhilosopherInput is defined but never used; update PhilosopherToMSstatsTMTFormat to call .getPhilosopherInput(input, path, folder) instead of hardcoding list(Mixture1 = data.table(input)) so multi‑mixture inputs from files are supported, then iterate over the returned mixture_files list (named Mixture1..N) to read/convert each element into data.tables as the original code intended; ensure variable names and expected structure (names(mixture_files)) are preserved and remove the hardcoded list construction.
80-80: Minor inconsistency: missingMSstatsConvert::namespace prefix.Line 80 calls
MSstatsPreprocess(...)without theMSstatsConvert::prefix, while other core function calls in this file (lines 41, 49, 54, 91) use the fully qualified namespace. This won't cause issues within the package but is inconsistent.♻️ Suggested fix for consistency
- input = MSstatsPreprocess( + input = MSstatsConvert::MSstatsPreprocess(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/converters_PhilosophertoMSstatsTMTFormat.R` at line 80, The call to MSstatsPreprocess is inconsistent with other fully-qualified calls; update the invocation of MSstatsPreprocess in the converters_PhilosophertoMSstatsTMTFormat.R file to use the MSstatsConvert:: namespace (i.e., replace MSstatsPreprocess(...) with MSstatsConvert::MSstatsPreprocess(...)) so it matches other calls like MSstatsConvert::functionName used elsewhere in the file.R/converters_SpectroMinetoMSstatsTMTFormat.R (1)
28-28: Minor API inconsistency: parameter namedsummaryforMultipleRowsvssummaryforMultiplePSMs.This converter uses
summaryforMultipleRowswhileOpenMStoMSstatsTMTFormatusessummaryforMultiplePSMs. Consider aligning parameter names across converters for a consistent user-facing API.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/converters_SpectroMinetoMSstatsTMTFormat.R` at line 28, The parameter name in the SpectroMinetoMSstatsTMTFormat converter is inconsistent: rename the function argument summaryforMultipleRows to summaryforMultiplePSMs (or vice versa to match the other converter) and update all internal references in the function body, default value (rmProtein_with1Feature = FALSE, summaryforMultiplePSMs = sum), any roxygen/documentation, examples, and tests to use the new name so the public API matches OpenMStoMSstatsTMTFormat; ensure any calls inside this file that referenced summaryforMultipleRows are updated to the new identifier to avoid undefined symbol errors.inst/tinytest/test_converters_MaxQtoMSstatsTMTFormat.R (1)
29-33: Intensity comparison logic may produce false positives.Converting both input and output intensities to character for comparison could mask precision issues or type mismatches. More importantly, the assertion
all(output_intensities %in% input_intensities)only checks that output values exist in input, but doesn't verify the values are correctly mapped to the right rows/features.Consider also testing a specific known intensity value mapping if the fixture data is deterministic, or at minimum add a comment explaining the test's intent and limitations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@inst/tinytest/test_converters_MaxQtoMSstatsTMTFormat.R` around lines 29 - 33, The test currently coerces intensities to character and only checks set membership (input_intensities, output_intensities), which can hide precision/type issues and doesn't validate correct row-wise mapping; change the comparison to numeric (use as.numeric on mq_ev[, intensity_cols, ...] and on output$Intensity), compare values with a numeric tolerance (e.g., all.equal or abs diff < tol) and, if fixture data is deterministic, assert at least one specific known mapping (e.g., that output$Intensity for a given feature/sample equals the corresponding mq_ev value) instead of only using %in%; update the variables intensity_cols, input_intensities, output_intensities and replace expect_true(all(... %in% ...)) with numeric equality checks and an explicit mapping assertion or add a comment documenting test limitations if you cannot assert row-wise equality.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@inst/tinytest/test_converters_PDtoMSstatsTMTFormat.R`:
- Around line 26-27: The failing tests call expect_error for
PDtoMSstatsTMTFormat but don’t disable logging side effects; update the
expect_error invocations that call PDtoMSstatsTMTFormat (the ones removing
"Protein.Accessions" and the other similar case) to pass use_log_file = FALSE so
the call becomes expect_error(PDtoMSstatsTMTFormat(..., use_log_file = FALSE),
...) — this ensures errors are asserted purely on schema/validation and not on
filesystem/logging side effects.
In `@man/dot-getPhilosopherInput.Rd`:
- Around line 9-11: The Rd docs are missing parameter entries for path and
folder even though the function (in R/converters_PhilosophertoMSstatsTMTFormat.R
— the function that produces the Philosopher input handling, e.g.,
getPhilosopherInput or dot-getPhilosopherInput wrapper) accepts three args
(input, path, folder); add roxygen `@param` tags for path and folder in that
function's roxygen block in R/converters_PhilosophertoMSstatsTMTFormat.R
describing their expected types and roles, then regenerate the Rd files (run
devtools::document() or roxygen2) so man/dot-getPhilosopherInput.Rd includes
`@param` path and `@param` folder entries.
In `@man/MaxQtoMSstatsTMTFormat.Rd`:
- Line 40: Fix the typos in the Rd argument descriptions: change "Defaut" to
"Default" in the rmProtein_with1Feature description and change "wil" to "will"
where it appears (the second argument description). Locate and edit the rd
documentation entries for rmProtein_with1Feature so the user-facing text reads
correctly (e.g., "Default is FALSE" and "will remove the proteins...").
- Line 16: The roxygen documentation for the parameter summaryforMultipleRows is
inconsistent (one place shows sum, another shows max); update the roxygen
comments so both occurrences document the same default (preferably change the
earlier mention to "max (default)" to match the other instance) — locate the
`@param` or inline roxygen entries referencing summaryforMultipleRows and make
their default text identical.
In `@man/OpenMStoMSstatsTMTFormat.Rd`:
- Line 28: Fix the typos in the roxygen parameter documentation for the
OpenMStoMSstatsTMTFormat function: update the param description for
rmProtein_with1Feature to read "TRUE will remove the proteins which have only 1
peptide and charge. Default is FALSE." (replace "Defaut" → "Default" and "wil" →
"will"), and make the same correction for the second occurrence of the parameter
documentation referenced in the review; after editing the roxygen comments in
the function source (OpenMStoMSstatsTMTFormat roxygen block), regenerate the Rd
files with roxygen2 (e.g., roxygenise()) so man/OpenMStoMSstatsTMTFormat.Rd is
updated.
In `@man/PDtoMSstatsTMTFormat.Rd`:
- Line 31: Fix the typos in the roxygen/doc comments for the
PDtoMSstatsTMTFormat documentation: correct "TURE" to "TRUE", "Defaut" to
"Default", and "wil" to "will" (and any similar misspellings) for the arguments
including useNumProteinsColumn and the other argument descriptions referenced
around the same block (lines showing "useNumProteinsColumn" and the entries at
the nearby positions). Regenerate the man page after updating the roxygen
comments so the Rd output reflects the corrected, user-facing text.
- Line 15: The roxygen docs for the parameter summaryforMultipleRows are
inconsistent (one occurrence lists default as sum, another as max); open the
roxygen source that documents summaryforMultipleRows and align the default
description with the actual function default (inspect the function signature
where summaryforMultipleRows is defined), then update both occurrences in the
roxygen block so they match the real default (e.g., change "sum" to "max" or
vice versa) and rebuild the Rd; ensure the parameter name summaryforMultipleRows
is used exactly as in the function declaration.
In `@man/PhilosophertoMSstatsTMTFormat.Rd`:
- Line 18: The documented default for the parameter summaryforMultipleRows is
inconsistent between the roxygen source and the generated man page; update the
roxygen comment where summaryforMultipleRows is documented (the roxygen block
that generates PhilosopherToMSstatsTMTFormat.Rd) so the `@param/default`
description matches the actual default used in the function (i.e., change either
the code default symbol or the roxygen text so both state the same behavior —
e.g., if the function default is sum, update the roxygen line that currently
says "max(default)" to "sum(default)", or vice versa), and then regenerate the
.Rd so PhilosopherToMSstatsTMTFormat.Rd shows the corrected default
consistently.
- Line 51: Fix typos in the roxygen/Rd argument descriptions: change "Defaut" to
"Default" in the description for rmProtein_with1Feature and change "wil" to
"will" in the other argument description that contains the "wil" typo; update
the roxygen comment lines so regenerated docs show "Default is FALSE" and "will"
correctly in the argument texts.
In `@man/SpectroMinetoMSstatsTMTFormat.Rd`:
- Line 39: Update the roxygen `@param` for summaryforMultipleRows in
R/converters_SpectroMinetoMSstatsTMTFormat.R to state "sum(default) or max"
(instead of "max(default) or sum"), i.e. change the `@param`
summaryforMultipleRows description to reflect that the function default is sum,
then regenerate the Rd files so man/SpectroMinetoMSstatsTMTFormat.Rd is updated
accordingly.
In `@R/converters_MaxQtoMSstatsTMTFormat.R`:
- Around line 1-11: Fix the typo in the roxygen parameter documentation for
rmProtein_with1Feature: change "Defaut is FALSE." to "Default is FALSE." in the
comment block of converters_MaxQtoMSstatsTMTFormat.R where the parameter
rmProtein_with1Feature is described so the generated help text is correct.
In `@R/converters_PDtoMSstatsTMTFormat.R`:
- Around line 7-9: Fix the typos in the Roxygen parameter documentation for the
function in converters_PDtoMSstatsTMTFormat.R: change "TURE" to "TRUE" in the
description for useNumProteinsColumn, and change "Defaut" to "Default" in the
description for rmProtein_with1Feature; ensure the parameter lines for
useNumProteinsColumn, rmPSM_withfewMea_withinRun, and rmProtein_with1Feature
read clearly and use correct spelling/capitalization for TRUE/Default.
- Around line 47-56: The code defines feature_columns but then calls
MSstatsPreprocess with a hardcoded vector; update the MSstatsPreprocess call to
pass the feature_columns variable instead of c("PeptideSequence",
"PrecursorCharge") so the function uses the defined variable (look for the
MSstatsPreprocess(...) invocation and replace the literal feature_columns
argument with the feature_columns symbol).
---
Nitpick comments:
In `@inst/tinytest/test_converters_MaxQtoMSstatsTMTFormat.R`:
- Around line 29-33: The test currently coerces intensities to character and
only checks set membership (input_intensities, output_intensities), which can
hide precision/type issues and doesn't validate correct row-wise mapping; change
the comparison to numeric (use as.numeric on mq_ev[, intensity_cols, ...] and on
output$Intensity), compare values with a numeric tolerance (e.g., all.equal or
abs diff < tol) and, if fixture data is deterministic, assert at least one
specific known mapping (e.g., that output$Intensity for a given feature/sample
equals the corresponding mq_ev value) instead of only using %in%; update the
variables intensity_cols, input_intensities, output_intensities and replace
expect_true(all(... %in% ...)) with numeric equality checks and an explicit
mapping assertion or add a comment documenting test limitations if you cannot
assert row-wise equality.
In `@R/converters_PhilosophertoMSstatsTMTFormat.R`:
- Around line 107-121: The helper .getPhilosopherInput is defined but never
used; update PhilosopherToMSstatsTMTFormat to call .getPhilosopherInput(input,
path, folder) instead of hardcoding list(Mixture1 = data.table(input)) so
multi‑mixture inputs from files are supported, then iterate over the returned
mixture_files list (named Mixture1..N) to read/convert each element into
data.tables as the original code intended; ensure variable names and expected
structure (names(mixture_files)) are preserved and remove the hardcoded list
construction.
- Line 80: The call to MSstatsPreprocess is inconsistent with other
fully-qualified calls; update the invocation of MSstatsPreprocess in the
converters_PhilosophertoMSstatsTMTFormat.R file to use the MSstatsConvert::
namespace (i.e., replace MSstatsPreprocess(...) with
MSstatsConvert::MSstatsPreprocess(...)) so it matches other calls like
MSstatsConvert::functionName used elsewhere in the file.
In `@R/converters_SpectroMinetoMSstatsTMTFormat.R`:
- Line 28: The parameter name in the SpectroMinetoMSstatsTMTFormat converter is
inconsistent: rename the function argument summaryforMultipleRows to
summaryforMultiplePSMs (or vice versa to match the other converter) and update
all internal references in the function body, default value
(rmProtein_with1Feature = FALSE, summaryforMultiplePSMs = sum), any
roxygen/documentation, examples, and tests to use the new name so the public API
matches OpenMStoMSstatsTMTFormat; ensure any calls inside this file that
referenced summaryforMultipleRows are updated to the new identifier to avoid
undefined symbol errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f7c828fa-6f7e-4fae-b40a-900bf48036e1
⛔ Files ignored due to path filters (5)
inst/tinytest/raw_data/MaxQuantTMT/mq_annotation.csvis excluded by!**/*.csvinst/tinytest/raw_data/PDTMT/pd_annotation.csvis excluded by!**/*.csvinst/tinytest/raw_data/Philosopher/MSstatsTMT_annotation.csvis excluded by!**/*.csvinst/tinytest/raw_data/Philosopher/msstats.csvis excluded by!**/*.csvinst/tinytest/raw_data/SpectroMine/spectromine_annotation.csvis excluded by!**/*.csv
📒 Files selected for processing (19)
DESCRIPTIONNAMESPACER/converters_MaxQtoMSstatsTMTFormat.RR/converters_OpenMStoMSstatsTMTFormat.RR/converters_PDtoMSstatsTMTFormat.RR/converters_PhilosophertoMSstatsTMTFormat.RR/converters_SpectroMinetoMSstatsTMTFormat.Rinst/tinytest/test_converters_MaxQtoMSstatsTMTFormat.Rinst/tinytest/test_converters_OpenMStoMSstatsTMTFormat.Rinst/tinytest/test_converters_PDtoMSstatsTMTFormat.Rinst/tinytest/test_converters_PhilosophertoMSstatsTMTFormat.Rinst/tinytest/test_converters_ProteinProspectortoMSstatsTMTFormat.Rinst/tinytest/test_converters_SpectroMinetoMSstatsTMTFormat.Rman/MaxQtoMSstatsTMTFormat.Rdman/OpenMStoMSstatsTMTFormat.Rdman/PDtoMSstatsTMTFormat.Rdman/PhilosophertoMSstatsTMTFormat.Rdman/SpectroMinetoMSstatsTMTFormat.Rdman/dot-getPhilosopherInput.Rd
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/converters_PhilosophertoMSstatsTMTFormat.R`:
- Around line 74-77: The oxidation_filter currently hardcodes col_name =
"PeptideSequence", so when peptide_id_col is "Peptide.Sequence" or oxidation
info is stored in the Modifications column the rmPeptide_OxidationM filter
silently does nothing; change oxidation_filter to reference the actual peptide
id/modification column (use the peptide_id_col parameter when present) or
default to "Modifications" instead of the literal "PeptideSequence", and ensure
downstream code that applies rmPeptide_OxidationM reads oxidation annotations
from that resolved column name (keep drop_column behavior unchanged).
- Around line 49-56: annotation may be a file path string, but the code extracts
channels with annotation[["Channel"]] before converting file-path annotations to
a table; fix by detecting when annotation is a scalar file path (e.g.,
is.character(annotation) && length(annotation)==1 && file.exists(annotation))
and read/parse it into a data.frame (so annotation becomes a table) before
calling channels = unique(annotation[["Channel"]]),
MSstatsConvert::MSstatsImport(... list(annotation = annotation) ...),
MSstatsConvert::MSstatsClean, and MSstatsMakeAnnotation; ensure the same parsed
annotation object is passed to these functions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 86402da0-682e-4d5e-9ac1-0cf7153edbe5
📒 Files selected for processing (24)
R/converters_MaxQtoMSstatsTMTFormat.RR/converters_OpenMStoMSstatsTMTFormat.RR/converters_PhilosophertoMSstatsTMTFormat.RR/utils_documentation.Rinst/tinytest/test_converters_PDtoMSstatsTMTFormat.Rman/DIANNtoMSstatsFormat.Rdman/DIAUmpiretoMSstatsFormat.Rdman/FragPipetoMSstatsFormat.Rdman/MaxQtoMSstatsFormat.Rdman/MaxQtoMSstatsTMTFormat.Rdman/MetamorpheusToMSstatsFormat.Rdman/OpenMStoMSstatsFormat.Rdman/OpenMStoMSstatsTMTFormat.Rdman/OpenSWATHtoMSstatsFormat.Rdman/PDtoMSstatsFormat.Rdman/PDtoMSstatsTMTFormat.Rdman/PhilosophertoMSstatsTMTFormat.Rdman/ProgenesistoMSstatsFormat.Rdman/ProteinProspectortoMSstatsTMTFormat.Rdman/SkylinetoMSstatsFormat.Rdman/SpectroMinetoMSstatsTMTFormat.Rdman/SpectronauttoMSstatsFormat.Rdman/dot-getPhilosopherInput.Rdman/dot-sharedParametersAmongConverters.Rd
✅ Files skipped from review due to trivial changes (20)
- man/SkylinetoMSstatsFormat.Rd
- man/OpenMStoMSstatsFormat.Rd
- man/SpectronauttoMSstatsFormat.Rd
- R/utils_documentation.R
- man/dot-sharedParametersAmongConverters.Rd
- man/DIAUmpiretoMSstatsFormat.Rd
- man/MaxQtoMSstatsFormat.Rd
- man/ProgenesistoMSstatsFormat.Rd
- man/FragPipetoMSstatsFormat.Rd
- man/MetamorpheusToMSstatsFormat.Rd
- man/ProteinProspectortoMSstatsTMTFormat.Rd
- man/PDtoMSstatsFormat.Rd
- man/dot-getPhilosopherInput.Rd
- man/MaxQtoMSstatsTMTFormat.Rd
- man/DIANNtoMSstatsFormat.Rd
- man/OpenSWATHtoMSstatsFormat.Rd
- man/PDtoMSstatsTMTFormat.Rd
- man/OpenMStoMSstatsTMTFormat.Rd
- man/SpectroMinetoMSstatsTMTFormat.Rd
- man/PhilosophertoMSstatsTMTFormat.Rd
🚧 Files skipped from review as they are similar to previous changes (3)
- inst/tinytest/test_converters_PDtoMSstatsTMTFormat.R
- R/converters_OpenMStoMSstatsTMTFormat.R
- R/converters_MaxQtoMSstatsTMTFormat.R
Motivation and Context
Looking to migrate TMT converters to MSstatsConvert. This will help with unit testing in general to verify that none of our TMT converters break from any changes in MSstatsConvert.
Changes
Testing
Added unit tests.
Checklist Before Requesting a Review