Skip to content

Fix turnover#124

Merged
tonywu1999 merged 2 commits intodevelfrom
fix-turnover
Apr 7, 2026
Merged

Fix turnover#124
tonywu1999 merged 2 commits intodevelfrom
fix-turnover

Conversation

@tonywu1999
Copy link
Copy Markdown
Contributor

@tonywu1999 tonywu1999 commented Apr 7, 2026

Motivation and Context

This PR fixes a critical issue in the Spectronaut converter's handling of isotope label types in protein turnover experiments. The problem occurs when the IsotopeLabelType column is already present in the cleaned input data (which happens after the cleaning step processes heavy isotope labels). Previously, the code was unconditionally filling missing IsotopeLabelType values with "L" based on the heavyLabels parameter, but this logic failed to account for scenarios where IsotopeLabelType was already populated by the cleaning function. This caused incorrect handling of feature balancing in the preprocessing step and improper row counts when light and heavy isotope variants needed to be distinguished.

The fix ensures that when IsotopeLabelType is already present, it is treated as a feature column for preprocessing and not overwritten with default values.

Changes

R/converters_SpectronauttoMSstatsFormat.R

  • Added conditional logic for feature column handling: Introduced preprocess_feature_columns variable that checks if IsotopeLabelType already exists in the input. If present, it extends feature_columns to include IsotopeLabelType; otherwise, it uses the original feature_columns unchanged.

  • Updated isotope label filling logic: Modified fill_isotope_label_type to be conditionally empty when IsotopeLabelType is already present in the input (no default fill), but retains the original fill behavior (setting to "L") when the column is missing.

  • Changed MSstatsPreprocess call: Updated the call from passing feature_columns to passing preprocess_feature_columns, ensuring the preprocessing step correctly handles isotope labels as a feature dimension when appropriate.

Unit Tests

inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R

Added two assertions to the "Heavy Label Testing" section:

  1. Intensity preservation test: Extracts FG.MS1Quantity values from the input (boxcar_raw) for the peptide "ANGYTTEYSASVK", retrieves corresponding Intensity values from the output (output_heavy) matching the same peptide sequence, and verifies they are equivalent using expect_equivalent.

  2. Row count validation test: Computes the number of unique bioreplicates from the input, filters the output for rows matching "ANGYTTEYSASVK", and verifies the resulting row count equals 2L * n_bioreplicates (accounting for one row each for heavy and light isotope labels per bioreplicate) using expect_equal.

Both tests wrap output_heavy as a data.table to enable row/column subsetting operations.

Coding Guidelines

No violations of the project's coding guidelines are evident. The changes follow the existing code style and patterns:

  • Uses conditional statements appropriately for logical branching
  • Variable naming follows the underscore-separated lowercase convention used elsewhere in the codebase
  • The modifications maintain consistency with existing feature column handling patterns
  • Test assertions follow the project's tinytest conventions

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

The pull request modifies isotope label handling in Spectronaut to MSstats format conversion. When IsotopeLabelType exists in input data, it is now included in preprocessing; otherwise, it defaults to "L". The MSstatsPreprocess call now uses the conditionally-extended feature columns list instead of the original one.

Changes

Cohort / File(s) Summary
Converter Logic
R/converters_SpectronauttoMSstatsFormat.R
Conditional handling of IsotopeLabelType: extends preprocess_feature_columns to include it when present, and adjusts fill behavior to avoid default fill when column exists while filling with "L" when absent.
Test Assertions
inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R
Added two new assertions in "Heavy Label Testing" section to validate FG.MS1Quantity values match Intensity for specific peptides and verify row counts equal 2L * unique_bioreplicates.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • devonjkohler

Poem

🐰 Whiskers twitching with delight,
Isotopes labeled, heavy and light,
When labels dance in conversion's embrace,
The data finds its rightful place,
MSstats now knows the peptide's true way!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description is entirely empty with no motivation, context, detailed changes, or testing information provided. Add a complete description following the template: include motivation/context and solution summary, detailed bullet points of changes, description of unit tests added, and check the pre-review checklist items.
Title check ❓ Inconclusive The title 'Fix turnover' is vague and generic, using non-descriptive language that doesn't clearly convey what aspect of turnover was fixed or the specific change made. Provide a more specific title that describes the actual fix, such as 'Fix isotope label handling in Spectronaut conversion' or 'Fix turnover calculation for heavy isotope labeling'.
✅ Passed checks (1 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-turnover

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R (1)

439-443: Use exact peptide matching to reduce false positives in this test.

grepl(..., fixed = TRUE) can accidentally include longer sequences containing the same motif. Consider normalizing labels and matching exactly.

♻️ Suggested test hardening
+strip_labels = function(x) gsub("\\[[^]]+\\]", "", x)
+
 angyt_input_intensities = sort(
     boxcar_raw[PEP.StrippedSequence == "ANGYTTEYSASVK", FG.MS1Quantity])
 angyt_output_intensities = sort(
-    output_heavy[grepl("ANGYTTEYSASVK", PeptideSequence, fixed = TRUE), Intensity])
+    output_heavy[strip_labels(PeptideSequence) == "ANGYTTEYSASVK", Intensity])
 expect_equivalent(angyt_input_intensities, angyt_output_intensities)
@@
 angyt_rows = subset(output_heavy, grepl("ANGYTTEYSASVK", PeptideSequence, fixed = TRUE))
+angyt_rows = output_heavy[strip_labels(PeptideSequence) == "ANGYTTEYSASVK"]
 expect_equal(nrow(angyt_rows), 2L * n_bioreplicates)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R` around lines 439
- 443, The test currently uses grepl("ANGYTTEYSASVK", PeptideSequence, fixed =
TRUE) which can match longer peptides; change to exact matching after
normalizing any label/modification suffixes on PeptideSequence: create
angyt_rows by selecting rows where the normalized PeptideSequence equals
"ANGYTTEYSASVK" (e.g. strip labels/mod chars then use == or %in% rather than
grepl), keep references to boxcar_raw$R.Replicate, output_heavy, angyt_rows,
PeptideSequence and then assert expect_equal(nrow(angyt_rows), 2L *
n_bioreplicates).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R`:
- Around line 439-443: The test currently uses grepl("ANGYTTEYSASVK",
PeptideSequence, fixed = TRUE) which can match longer peptides; change to exact
matching after normalizing any label/modification suffixes on PeptideSequence:
create angyt_rows by selecting rows where the normalized PeptideSequence equals
"ANGYTTEYSASVK" (e.g. strip labels/mod chars then use == or %in% rather than
grepl), keep references to boxcar_raw$R.Replicate, output_heavy, angyt_rows,
PeptideSequence and then assert expect_equal(nrow(angyt_rows), 2L *
n_bioreplicates).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5d93dcf1-12fc-4e55-86ed-413ee825e12f

📥 Commits

Reviewing files that changed from the base of the PR and between e4b78fe and bacc483.

📒 Files selected for processing (2)
  • R/converters_SpectronauttoMSstatsFormat.R
  • inst/tinytest/test_converters_SpectronauttoMSstatsFormat.R

@tonywu1999 tonywu1999 merged commit a780511 into devel Apr 7, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the fix-turnover branch April 7, 2026 19:39
@coderabbitai coderabbitai Bot mentioned this pull request Apr 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant