Fix turnover #124
📝 Walkthrough
The pull request modifies isotope label handling in the Spectronaut to MSstats format conversion.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
🧹 Nitpick comments (1)
inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R (1)
439-443: Use exact peptide matching to reduce false positives in this test.
`grepl(..., fixed = TRUE)` can accidentally include longer sequences containing the same motif. Consider normalizing labels and matching exactly.
♻️ Suggested test hardening
    +strip_labels = function(x) gsub("\\[[^]]+\\]", "", x)
    +
     angyt_input_intensities = sort(
       boxcar_raw[PEP.StrippedSequence == "ANGYTTEYSASVK", FG.MS1Quantity])
     angyt_output_intensities = sort(
    -  output_heavy[grepl("ANGYTTEYSASVK", PeptideSequence, fixed = TRUE), Intensity])
    +  output_heavy[strip_labels(PeptideSequence) == "ANGYTTEYSASVK", Intensity])
     expect_equivalent(angyt_input_intensities, angyt_output_intensities)
    @@
    -angyt_rows = subset(output_heavy, grepl("ANGYTTEYSASVK", PeptideSequence, fixed = TRUE))
    +angyt_rows = output_heavy[strip_labels(PeptideSequence) == "ANGYTTEYSASVK"]
     expect_equal(nrow(angyt_rows), 2L * n_bioreplicates)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R` around lines 439 - 443, The test currently uses grepl("ANGYTTEYSASVK", PeptideSequence, fixed = TRUE) which can match longer peptides; change to exact matching after normalizing any label/modification suffixes on PeptideSequence: create angyt_rows by selecting rows where the normalized PeptideSequence equals "ANGYTTEYSASVK" (e.g. strip labels/mod chars then use == or %in% rather than grepl), keep references to boxcar_raw$R.Replicate, output_heavy, angyt_rows, PeptideSequence and then assert expect_equal(nrow(angyt_rows), 2L * n_bioreplicates).
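The difference between substring and exact matching raised in this nitpick can be seen on a toy vector. This is only a sketch: `strip_labels` is the helper proposed in the suggestion above, while the `[Lys6]` label and the longer sequence are made-up values for illustration and are not taken from the test data.

```r
# Helper from the review suggestion: remove bracketed label annotations.
strip_labels = function(x) gsub("\\[[^]]+\\]", "", x)

# Illustrative sequences (assumed, not from boxcar_raw): the target peptide,
# a labelled variant of it, and a longer peptide containing the same motif.
peptides = c("ANGYTTEYSASVK",
             "ANGYTTEYSASVK[Lys6]",
             "AAANGYTTEYSASVKR")
target = "ANGYTTEYSASVK"

# Substring matching also selects the longer, unrelated peptide.
substring_hits = grepl(target, peptides, fixed = TRUE)

# Exact matching on the stripped sequence keeps only the target and its
# labelled variant.
exact_hits = strip_labels(peptides) == target
```

Here `substring_hits` is `TRUE` for all three entries, whereas `exact_hits` is `FALSE` for the longer peptide, which is the false positive the nitpick warns about.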
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5d93dcf1-12fc-4e55-86ed-413ee825e12f
📒 Files selected for processing (2)
R/converters_SpectronauttoMSstatsFormat.R
inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R
Motivation and Context
This PR fixes a critical issue in how the Spectronaut converter handles isotope label types in protein turnover experiments. The problem occurs when the `IsotopeLabelType` column is already present in the cleaned input data, which happens after the cleaning step processes heavy isotope labels. Previously, the code unconditionally filled missing `IsotopeLabelType` values with "L" based on the `heavyLabels` parameter, without accounting for cases where the cleaning function had already populated `IsotopeLabelType`. As a result, features were balanced incorrectly in the preprocessing step, and row counts were wrong whenever light and heavy isotope variants had to be distinguished.
The fix ensures that when `IsotopeLabelType` is already present, it is treated as a feature column for preprocessing and is not overwritten with default values.
Changes
R/converters_SpectronauttoMSstatsFormat.R
- Added conditional logic for feature column handling: Introduced a `preprocess_feature_columns` variable that checks whether `IsotopeLabelType` already exists in the input. If it does, `feature_columns` is extended to include `IsotopeLabelType`; otherwise, the original `feature_columns` is used unchanged.
- Updated isotope label filling logic: `fill_isotope_label_type` is now empty (no default fill) when `IsotopeLabelType` is already present in the input, and keeps the original behavior (filling with "L") when the column is missing.
- Changed the `MSstatsPreprocess` call: The call now passes `preprocess_feature_columns` instead of `feature_columns`, so that preprocessing treats the isotope label as a feature dimension when appropriate.
Unit Tests
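The branch that these tests exercise can be sketched in isolation. The following is a minimal illustration under stated assumptions, not the converter's actual code: `choose_preprocess_settings` is a hypothetical helper, the column names in `feature_columns` are illustrative, and the representation of "no fill" as an empty list is assumed from the summary rather than confirmed by the source.

```r
# Hypothetical helper (not part of the package): mirrors the conditional
# described in the PR summary using plain character vectors and lists.
choose_preprocess_settings = function(input_columns, feature_columns) {
  has_label = "IsotopeLabelType" %in% input_columns
  if (has_label) {
    # Label already set by cleaning: treat it as a feature column, fill nothing.
    preprocess_feature_columns = c(feature_columns, "IsotopeLabelType")
    fill_isotope_label_type = list()
  } else {
    # Label missing: keep the feature columns and default the label to "L".
    preprocess_feature_columns = feature_columns
    fill_isotope_label_type = list(IsotopeLabelType = "L")
  }
  list(preprocess_feature_columns = preprocess_feature_columns,
       fill_isotope_label_type = fill_isotope_label_type)
}

# Illustrative feature columns (an assumption for this sketch).
feature_columns = c("PeptideSequence", "PrecursorCharge",
                    "FragmentIon", "ProductCharge")

with_label = choose_preprocess_settings(
  c(feature_columns, "IsotopeLabelType", "Intensity"), feature_columns)
without_label = choose_preprocess_settings(
  c(feature_columns, "Intensity"), feature_columns)
```

In this sketch, `with_label` carries `IsotopeLabelType` as an extra feature column with nothing to fill, while `without_label` leaves the feature columns untouched and requests the default "L".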
inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R
Added two assertions to the "Heavy Label Testing" section:
- Intensity preservation test: Extracts `FG.MS1Quantity` values from the input (`boxcar_raw`) for the peptide "ANGYTTEYSASVK", retrieves the corresponding `Intensity` values from the output (`output_heavy`) for the same peptide sequence, and verifies with `expect_equivalent` that they are equivalent.
- Row count validation test: Computes the number of unique bioreplicates from the input, filters the output for rows matching "ANGYTTEYSASVK", and verifies with `expect_equal` that the row count equals `2L * n_bioreplicates` (one row each for the heavy and light isotope labels per bioreplicate).
Both tests wrap `output_heavy` as a data.table to enable row/column subsetting operations.
Coding Guidelines
No violations of the project's coding guidelines are evident. The changes follow the existing code style and patterns: