fix(checks): support 'heavy'/'light' IsotopeLabelType explicitly #189

Merged
tonywu1999 merged 6 commits into devel from feat-turnover-1 on Apr 12, 2026

@tonywu1999 tonywu1999 commented Apr 12, 2026

PR Type

Bug fix, Tests


Description

  • Normalize heavy/light isotope labels

  • Preserve consistent H/L factor levels

  • Add preprocessing edge-case coverage

  • Verify SRM light/protein outputs


Diagram Walkthrough

flowchart LR
  a["Raw ISOTOPELABELTYPE values"]
  b["Map heavy/light to H/L"]
  c["Create normalized factor levels"]
  d["Validate preprocessing and SRM outputs"]
  a -- "normalize" --> b
  b -- "build" --> c
  c -- "covered by" --> d

File Walkthrough

Relevant files
Bug fix
utils_checks.R
Normalize isotope label aliases during preprocessing         

R/utils_checks.R

  • Add explicit heavy/light to H/L mapping
  • Rebuild ISOTOPELABELTYPE from normalized values
  • Restrict factor levels to present H/L states
+10/-6   
Tests
test_dataProcess.R
Strengthen SRM `dataProcess` output assertions                     

inst/tinytest/test_dataProcess.R

  • Assert FeatureLevelData contains L labels
  • Assert ProteinLevelData is non-empty
+8/-0     
test_utils_checks.R
Cover isotope label normalization edge cases                         

inst/tinytest/test_utils_checks.R

  • Add rename coverage for PEPTIDEMODIFIEDSEQUENCE
  • Test heavy/light normalization to H/L
  • Verify single and mixed label factor levels
  • Confirm unknown and NA labels become NA
+92/-0   

Motivation and Context

The MSstats package needed to support explicit "heavy"/"light" IsotopeLabelType encodings in addition to the canonical "H"/"L" values. Previously, the code did not normalize these alternative labels, causing inconsistent factor level handling. This fix introduces a normalization layer that maps these common labels to standard values while ensuring factor levels are consistently set based on the actual data present.

Solution Summary

The PR adds an isotope label mapping mechanism that:

  • Converts explicit "heavy" and "light" strings to canonical "H" and "L" values
  • Normalizes factor levels to reflect only the labels actually present in the data (instead of forcing levels unconditionally)
  • Handles edge cases including unknown labels (mapped to NA) and NA inputs (preserved as NA with appropriate factor levels)
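
The three points above can be sketched in a few lines of base R. This is a minimal illustration, not the package's code: the function name map_isotope_label is hypothetical, and only the label_map contents and the factor()/intersect() pattern are taken from this PR.

```r
# Hypothetical helper name; the mapping mirrors the label_map described in this PR.
map_isotope_label <- function(x) {
    label_map <- c("H" = "H", "L" = "L", "heavy" = "H", "light" = "L")
    # Names with no entry in the map (and NA) index to NA; unname() drops the lookup keys.
    mapped <- unname(label_map[as.character(x)])
    # Keep only the canonical levels that actually occur, in H-then-L order.
    factor(mapped, levels = intersect(c("H", "L"), mapped))
}

labels <- map_isotope_label(c("heavy", "light", "L", "H"))
print(as.character(labels))
print(levels(labels))
```

Building the levels with intersect() rather than a fixed c("H", "L") is what keeps a light-only data set from carrying an unused "H" level.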

Detailed Changes

R/utils_checks.R

  • Replaced label handling logic in .prepareForDataProcess():
    • Removed the previous conditional factor level assignment (if (uniqueN == 2) then c("H","L") else "L")
    • Introduced an explicit label_map named vector mapping:
      • "H" → "H"
      • "L" → "L"
      • "heavy" → "H"
      • "light" → "L"
    • Applied mapping via factor(label_map[as.character(input$ISOTOPELABELTYPE)], levels = intersect(c("H", "L"), label_map[as.character(input$ISOTOPELABELTYPE)]))
  • Impact: Non-H/L encodings are now normalized; factor levels are restricted to mapped values intersected with canonical levels, eliminating the prior behavior of forcing all non-2-level cases to level "L"
  • Lines changed: +12/-10 (net +2)

inst/tinytest/test_utils_checks.R

  • New comprehensive test file covering .prepareForDataProcess() with multiple scenarios:
    • Column renaming: Verifies PEPTIDEMODIFIEDSEQUENCE is renamed to PEPTIDESEQUENCE and the original column is removed
    • Heavy/light mapping: Confirms "heavy" maps to factor level "H" and "light" maps to "L" with exact factor levels c("H","L")
    • Single-label cases: Tests all-"light" and all-"L" inputs produce single factor level "L"
    • Mixed labels: Verifies H/L string inputs preserve per-row labels with factor levels c("H","L")
    • Unknown labels: Confirms unmapped labels (e.g., "test") produce NA values
    • NA handling: Validates NA inputs map to NA, with factor levels c("H","L") preserved in mixed NA/L/H scenarios
  • Lines changed: +92/-0
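
The edge cases listed above can be checked with a short stand-alone script. The PR's tests use tinytest against .prepareForDataProcess(); the sketch below instead uses base stopifnot() and a hypothetical helper, map_labels, that assumes the same label_map and intersect() logic, so it runs without MSstats installed.

```r
# Base-R stand-in for the tinytest assertions; map_labels is a hypothetical helper.
map_labels <- function(x) {
    label_map <- c("H" = "H", "L" = "L", "heavy" = "H", "light" = "L")
    mapped <- unname(label_map[as.character(x)])
    factor(mapped, levels = intersect(c("H", "L"), mapped))
}

# Single-label input: only the level that is present survives.
all_light <- map_labels(c("light", "light"))
stopifnot(identical(levels(all_light), "L"))

# Unknown labels have no entry in the map and become NA.
unknown <- map_labels(c("test", "L"))
stopifnot(is.na(unknown[1]), identical(as.character(unknown[2]), "L"))

# NA inputs stay NA; the levels still reflect the mapped H/L values.
mixed <- map_labels(c(NA, "L", "H"))
stopifnot(is.na(mixed[1]), identical(levels(mixed), c("H", "L")))
```

NA never appears as a level because intersect() only keeps values found in c("H", "L").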

inst/tinytest/test_dataProcess.R

  • Strengthened SRM output validation: Added assertions for QuantDataDefault (result of dataProcess(SRMRawData, ...))
    • Confirms FeatureLevelData$LABEL contains light-label ("L") rows
    • Verifies ProteinLevelData is non-empty (nrow(...) > 0)
  • Impact: Expands coverage to confirm that light ("L") labels are preserved in the SRM feature-level output and that protein-level summarization returns rows
  • Lines changed: +8/-0

Unit Tests

Test Coverage Summary

  • test_utils_checks.R (new file, 92 lines):

    • 1 test for PEPTIDEMODIFIEDSEQUENCE renaming
    • 7 tests for IsotopeLabelType normalization covering: heavy/light mapping, single labels, mixed H/L, unknown labels, NA propagation, and mixed NA/L/H scenarios
    • All tests validate both value-level behavior (including NA propagation) and factor level outcomes
  • test_dataProcess.R (2 additional assertions):

    • Assertion 1: FeatureLevelData$LABEL contains light-label rows
    • Assertion 2: ProteinLevelData non-empty validation

Coding Guidelines

No coding guideline violations detected. The implementation follows R conventions for factor creation and mapping, uses consistent naming conventions (. prefix for internal functions), and maintains consistency with existing codebase patterns.

@tonywu1999 tonywu1999 changed the title from "Feat turnover 1" to "fix(checks): support 'heavy'/'light' IsotopeLabelType explicitly" Apr 12, 2026

coderabbitai Bot commented Apr 12, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 36b0dc31-6892-4365-a7ca-c31109cc48d1

📥 Commits

Reviewing files that changed from the base of the PR and between 13fadf0 and 7030c00.

📒 Files selected for processing (1)
  • R/utils_checks.R
Walkthrough

Introduced an internal helper .mapIsotopeLabelType() that maps various IsotopeLabelType encodings (e.g., "heavy"/"light") to canonical "H"/"L" and uses those mapped values as factor levels in .checkUnProcessedDataValidity() and .prepareForDataProcess(). Added tests covering many IsotopeLabelType scenarios.

Changes

Cohort / File(s) | Summary
Label mapping helper: R/utils_checks.R | Added .mapIsotopeLabelType() and replaced prior factor()+binary-level fallback logic with an explicit mapping to canonical "H"/"L" and restricted factor levels.
Updated tests (existing): inst/tinytest/test_dataProcess.R | Expanded assertions for dataProcess(...) outputs: check presence of "L" in FeatureLevelData$LABEL and that ProteinLevelData is non-empty.
New tests (coverage): inst/tinytest/test_utils_checks.R | New test file exercising .prepareForDataProcess() with varied IsotopeLabelType inputs: "heavy" → "H", "light" → "L", mixed "H"/"L", unknown labels → NA, and NA propagation; asserts values and factor levels.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰
Hop-hop, I mapped the tags just right,
"heavy" to H and "light" to L in sight,
Tests nibble through cases, bold and spry,
Missing labels hide, mixed levels fly,
A little rabbit cheers — code clarified!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Description check | ⚠️ Warning | The PR description deviates from the template structure, using a custom format with diagrams and file walkthrough tables instead of the required sections. | Restructure the description to follow the template: add 'Motivation and Context' section, provide detailed bullet-point 'Changes' list, describe 'Testing' additions, and complete the checklist.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding explicit support for 'heavy'/'light' IsotopeLabelType values, which aligns with the core fix implemented in utils_checks.R.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Suggestion: Normalize label values first

Normalize ISOTOPELABELTYPE before the lookup. As written, common inputs like Heavy,
LIGHT, or values with surrounding spaces are converted to NA, which will silently
misclassify valid rows.

R/utils_checks.R [234-243]

+normalized_labels <- tolower(trimws(as.character(input$ISOTOPELABELTYPE)))
 label_map <- c(
-    "H"     = "H",
-    "L"     = "L",
+    "h"     = "H",
+    "l"     = "L",
     "heavy" = "H",
     "light" = "L"
 )
+mapped_labels <- unname(label_map[normalized_labels])
 input$ISOTOPELABELTYPE <- factor(
-    label_map[as.character(input$ISOTOPELABELTYPE)],
-    levels = intersect(c("H", "L"), label_map[as.character(input$ISOTOPELABELTYPE)])
+    mapped_labels,
+    levels = intersect(c("H", "L"), mapped_labels)
 )
Suggestion importance[1-10]: 6

Why: This is a valid robustness improvement because the current label_map only handles exact H/L/heavy/light values, so case or whitespace variants would become NA. It could prevent mislabeling in .prepareForDataProcess, but the PR itself does not show those variants are currently expected.

Impact: Low
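
To see what the suggestion changes in practice, the sketch below compares an exact-match lookup with a lookup on lower-cased, trimmed keys. The variable names (exact_map, tolerant_map, raw) are illustrative only; the two maps follow the before and after versions shown in the suggested diff.

```r
# Exact-match map as used in the PR, and the lower-case variant from the suggestion.
exact_map <- c("H" = "H", "L" = "L", "heavy" = "H", "light" = "L")
tolerant_map <- c("h" = "H", "l" = "L", "heavy" = "H", "light" = "L")

raw <- c("Heavy", " LIGHT ", "h", "L")

# Exact lookup: only "L" matches a key, the rest become NA.
exact <- unname(exact_map[raw])

# Trim whitespace and lower-case first, then look up: every variant finds its key.
normalized <- unname(tolerant_map[tolower(trimws(raw))])

print(exact)
print(normalized)
```

Whether such variants should be accepted silently or rejected is a design choice; as the reviewer notes, the PR does not show that they occur in real inputs.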


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/utils_checks.R`:
- Around line 234-243: The label normalization change must be applied to the
raw-input validation path too: update .checkUnProcessedDataValidity to use the
same label_map and factor creation as in .prepareForDataProcess (use
label_map[as.character(input$ISOTOPELABELTYPE)] and set ISOTOPELABELTYPE <-
factor(..., levels = intersect(c("H","L"),
label_map[as.character(input$ISOTOPELABELTYPE)]))) instead of the old
single-level fallback to "L", so single-level values like "heavy" map to "H"
consistently across both preprocessing paths.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b799e890-dfc9-41e8-ab17-13ff6a9c5ecd

📥 Commits

Reviewing files that changed from the base of the PR and between 0cd7d27 and f4c4189.

📒 Files selected for processing (3)
  • R/utils_checks.R
  • inst/tinytest/test_dataProcess.R
  • inst/tinytest/test_utils_checks.R

Comment thread R/utils_checks.R Outdated

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
R/utils_checks.R (1)

219-227: ⚠️ Potential issue | 🟠 Major

Normalize ISOTOPELABELTYPE before enforcing the “max 2 levels” check.

The validation currently runs on raw values, so alias mixes like "H", "heavy", and "L" can be incorrectly rejected as “more than two levels” before canonicalization.

Proposed fix
-    if (data.table::uniqueN(input$ISOTOPELABELTYPE) > 2) {
+    input$ISOTOPELABELTYPE <- .mapIsotopeLabelType(input$ISOTOPELABELTYPE)
+    normalized_labels <- stats::na.omit(as.character(input$ISOTOPELABELTYPE))
+    if (data.table::uniqueN(normalized_labels) > 2) {
         getOption("MSstatsLog")(
           "ERROR",  paste(
             "There are more than two levels of labeling.",
             "So far, only label-free or reference-labeled experiment are supported. - stop"))
         stop("Statistical tools in MSstats are only proper for label-free or with reference peptide experiments.")
     }
-    
-    input$ISOTOPELABELTYPE <- .mapIsotopeLabelType(input$ISOTOPELABELTYPE)
     input
 }
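
A small illustration of why the order matters, assuming the same label_map as above. Base length(unique(...)) stands in for data.table::uniqueN() so the snippet has no package dependency; the variable names are hypothetical.

```r
label_map <- c("H" = "H", "L" = "L", "heavy" = "H", "light" = "L")
raw <- c("H", "heavy", "L", "light")

# Counting raw values sees four distinct strings and would trip a "> 2" check.
n_raw <- length(unique(raw))

# Counting after mapping (with NAs dropped) sees only the two canonical labels.
mapped <- unname(label_map[raw])
n_mapped <- length(unique(mapped[!is.na(mapped)]))

print(c(raw = n_raw, mapped = n_mapped))
```

Dropping NA before counting means unknown labels do not count toward the two-level limit in this sketch; a stricter check could report them separately.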
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils_checks.R` around lines 219 - 227, Normalize ISOTOPELABELTYPE before
counting unique levels: call .mapIsotopeLabelType on input$ISOTOPELABELTYPE
first (so aliases like "H"/"heavy"/"L" map to canonical values), then perform
data.table::uniqueN(...) and the error/stop logic; update the check that
currently uses data.table::uniqueN(input$ISOTOPELABELTYPE) to use the mapped
value and keep the existing processLogger/error message and stop call intact.
🧹 Nitpick comments (1)
R/utils_checks.R (1)

150-157: Consider case-insensitive, trimmed label mapping for better alias handling.

Current mapping is exact-match only; variants like "Heavy", " light ", or "h" become NA. Making this normalization tolerant would reduce avoidable missing labels.

Proposed refactor
 .mapIsotopeLabelType = function(x) {
-    label_map <- c(
-        "H"     = "H",
-        "L"     = "L",
-        "heavy" = "H",
-        "light" = "L"
-    )
-    mapped <- label_map[as.character(x)]
+    label_map <- c(
+        "h"     = "H",
+        "l"     = "L",
+        "heavy" = "H",
+        "light" = "L"
+    )
+    key <- tolower(trimws(as.character(x)))
+    mapped <- unname(label_map[key])
     factor(mapped, levels = intersect(c("H", "L"), mapped))
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/utils_checks.R` around lines 150 - 157, Normalize and trim input before
lookup: create a normalized version of x (e.g., lowercased and
whitespace-trimmed) and map common aliases like "h", "l", "heavy", "light" to
canonical "H"/"L" via the existing label_map; replace the current lookup (mapped
<- label_map[as.character(x)]) with one that uses the normalized values, then
produce the factor with levels = intersect(c("H","L"), mapped) as before
(referencing label_map, mapped, x, and the factor call).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 99a28fb7-f0a4-4f41-8392-d57a313d28f6

📥 Commits

Reviewing files that changed from the base of the PR and between f4c4189 and 13fadf0.

📒 Files selected for processing (1)
  • R/utils_checks.R

@tonywu1999 tonywu1999 merged commit d9418ed into devel Apr 12, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the feat-turnover-1 branch April 12, 2026 21:21