Skip to content

feat(plotting): Refactor QC plot headings to be compatible for protein turnover#194

Merged
tonywu1999 merged 6 commits intodevelfrom
feat-turnover-final
Apr 14, 2026
Merged

feat(plotting): Refactor QC plot headings to be compatible for protein turnover#194
tonywu1999 merged 6 commits intodevelfrom
feat-turnover-final

Conversation

@tonywu1999
Copy link
Copy Markdown
Contributor

@tonywu1999 tonywu1999 commented Apr 14, 2026

PR Type

Bug fix, Enhancement


Description

  • Preserve original turnover groups

  • Use is_labeled_ref in plots

  • Support Heavy/Light label fallback

  • Fix QC group axis handling


Diagram Walkthrough

flowchart LR
  A["Validated turnover input"]
  B["Original GROUP and SUBJECT kept"]
  C["Plot labels derived from metadata"]
  D["QC and turnover plots corrected"]
  A -- "preserve metadata" --> B
  B -- "drives plot grouping" --> D
  C -- "uses `is_labeled_ref` or fallback labels" --> D
Loading

File Walkthrough

Relevant files
Bug fix
dataProcessPlots.R
Improve turnover label and group plotting                               

R/dataProcessPlots.R

  • Prefer is_labeled_ref for LABEL mapping
  • Add fallback Heavy/Light label handling
  • Update single-label naming and colors
  • Keep all GROUP values in axes
+26/-23 
utils_checks.R
Retain original grouping in validation                                     

R/utils_checks.R

  • Preserve GROUP_ORIGINAL as GROUP
  • Preserve SUBJECT_ORIGINAL as SUBJECT
  • Remove hard-coded "0" turnover assignments
+2/-2     

Motivation and Context

This PR adjusts plotting functions to properly handle protein turnover experiments, which differ from SRM (Selected Reaction Monitoring) experiments in their use of isotope labels. In protein turnover studies, both heavy (H) and light (L) labeled channels represent independent biological measurements and are processed independently. The PR refactors data preparation and plotting logic to support this distinction through the introduction of an is_labeled_ref column that marks whether a row represents a labeled reference (used in SRM) versus endogenous labeled channels (used in turnover studies).

Detailed Changes

R/utils_checks.R - Data Column Preparation

  • Removed conditional GROUP/SUBJECT assignment: Previously, the function conditionally assigned GROUP and SUBJECT based on LABEL == "L", using a hardcoded sentinel value "0" for non-"L" labels
  • Simplified logic: Now directly assigns GROUP and SUBJECT from their original values (GROUP_ORIGINAL and SUBJECT_ORIGINAL respectively)
  • This eliminates the "0" sentinel value that was preventing proper group numbering in downstream processing

R/dataProcessPlots.R - Profile and QC Plotting Functions

  • Added is_labeled_ref column detection: Both .plotProfile and .plotQC functions now check for the presence of the is_labeled_ref column
    • If present: LABEL is remapped to "Reference" (for TRUE values) and "Endogenous" (for FALSE values) with appropriate color coding
    • If absent: Falls back to legacy label detection logic that maps labels based on their presence and count
  • Simplified tempGroupName construction: Removed conditional logic that filtered based on length(unique(processed$LABEL)) == 2 and GROUP == '0'; now always orders by RUN before deriving group levels and factoring GROUP
  • Applied consistent LABEL handling: Both plot functions use the same LABEL remapping logic for consistency

Unit Tests

Existing tests in inst/tinytest/test_dataProcess.R validate the is_labeled_ref column behavior:

  • Tests verify that when is_labeled_ref=TRUE is present, protein summarization uses the labeled reference path (via .runTukey with is_labeled_reference=TRUE)
  • Tests confirm that labeled reference rows (H channel in SRM) are not imputed despite being censored
  • Tests validate that labeled endogenous rows (L channel) receive proper imputation when censored

Coding Guidelines

No coding guideline violations detected. The changes maintain consistency with the existing codebase patterns for data manipulation using data.table, conditional column operations, and factor level assignments.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

The PR refactors label and group handling logic across two utility files. Changes simplify tempGroupName construction, replace conditional label remapping with a column-aware approach (checking for is_labeled_ref column), and remove conditional derivation of GROUP and SUBJECT from LABEL values in the column processing function.

Changes

Cohort / File(s) Summary
Label Remapping Logic
R/dataProcessPlots.R
Replaced conditional LABEL remapping with column-aware approach: if is_labeled_ref column exists, map to "Reference"/"Endogenous"; otherwise detect levels and map to "Heavy"/"Light" or "Endogenous"/"Heavy" based on detected labels. Simplified tempGroupName construction by removing conditional filtering. Applied changes to both .plotProfile() and .plotQC() functions.
Column Processing Updates
R/utils_checks.R
Modified .updateColumnsForProcessing() to directly assign GROUP and SUBJECT from their _ORIGINAL counterparts instead of deriving them conditionally from LABEL == "L", removing the sentinel "0" value injection for non-"L" labels.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Vitek-Lab/MSstats#189: Modifies isotope label normalization and heavy/light label mapping downstream of label assignment logic.
  • Vitek-Lab/MSstats#192: Directly aligned refactor introducing is_labeled_reference column handling and replacing LABEL-based conditionals throughout the codebase.

Suggested labels

Review effort 3/5

Poem

🐰 Labels were tangled, conditions were deep,
Now columns speak clearly, no secrets to keep,
A "Reference" here, "Endogenous" there,
The logic flows cleaner through processing air! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description uses a custom format instead of the required template, lacks detailed motivation/context, and leaves the 'Changes' section empty despite significant code modifications. Restructure the description following the template: add comprehensive 'Motivation and Context' section, replace the diagram with a detailed bullet-point 'Changes' list, add 'Testing' subsection describing unit tests, and complete the pre-review checklist.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check ✅ Passed The title 'Refactor QC plot headings to be compatible for protein turnover' is partially related to the changeset. It specifically references QC plot headings and protein turnover compatibility, which align with some aspects of the changes (particularly in .plotQC modifications). However, the actual changes are broader—they refactor GROUP/SUBJECT derivation logic, LABEL remapping based on is_labeled_ref detection, and affect both profile and QC plotting functions. The title narrows focus to 'QC plot headings' when the changes span multiple plotting functions and core label handling logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat-turnover-final

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tonywu1999 tonywu1999 changed the title Feat turnover final feat(plotting): Rename QC plot and profile plot headings for protein turnover Apr 14, 2026
@github-actions github-actions Bot changed the title feat(plotting): Rename QC plot and profile plot headings for protein turnover Feat turnover final Apr 14, 2026
@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Missing labels

When is_labeled_ref is present but some rows are NA, this branch overwrites LABEL with NA instead of falling back to the existing label values. In partially annotated turnover inputs, those rows can drop out of the plot or lose their legend/color assignment.

if ("is_labeled_ref" %in% colnames(processed)) {
  processed[, LABEL := factor(
    ifelse(is_labeled_ref, "Reference", "Endogenous"),
    levels = c("Reference", "Endogenous")
  )]
Fallback relabel

The two-label fallback renames any pair of label values to Heavy/Light purely from factor level order. If an input already uses semantic labels such as Reference and Endogenous but does not include is_labeled_ref, the plot legend will be wrong and the series can even be swapped by alphabetical ordering.

label_levels = levels(factor(processed$LABEL))
if (length(label_levels) == 2) {
  processed[, LABEL := factor(LABEL, labels = c("Heavy", "Light"))]
} else if ("L" %in% label_levels) {

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard label remapping

length(label_levels) == 2 will relabel any two-valued input as Heavy/Light, which
silently mislabels older Reference/Endogenous datasets when is_labeled_ref is
absent. Branch on the actual raw LABEL values instead, and keep unknown schemes
unchanged rather than forcing them into turnover labels. Mirror the same fix in the
later LABEL normalization block as well.

R/dataProcessPlots.R [295-302]

-label_levels = levels(factor(processed$LABEL))
-if (length(label_levels) == 2) {
-  processed[, LABEL := factor(LABEL, labels = c("Heavy", "Light"))]
-} else if ("L" %in% label_levels) {
-  processed[, LABEL := factor(LABEL, labels = c("Endogenous"))]
+label_levels = sort(unique(stats::na.omit(as.character(processed$LABEL))))
+if (identical(label_levels, c("H", "L"))) {
+  processed[, LABEL := factor(as.character(LABEL),
+                              levels = c("H", "L"),
+                              labels = c("Heavy", "Light"))]
+} else if (identical(label_levels, "L")) {
+  processed[, LABEL := factor("Endogenous", levels = "Endogenous")]
+} else if (identical(label_levels, "H")) {
+  processed[, LABEL := factor("Heavy", levels = "Heavy")]
 } else {
-  processed[, LABEL := factor(LABEL, labels = c("Heavy"))]
+  processed[, LABEL := factor(as.character(LABEL), levels = label_levels)]
 }
Suggestion importance[1-10]: 7

__

Why: This correctly points out that length(label_levels) == 2 is too broad and can mislabel any two-valued LABEL scheme as Heavy/Light. It addresses a real plotting/legend correctness issue in dataProcessPlots, though its impact is limited to label interpretation rather than core computation.

Medium
Skip placeholder axis group

tempGroupName now includes every GROUP, so legacy labeled inputs that still carry
placeholder group 0 will produce a fake x-axis group and shift
groupAxis/lineNameAxis. Exclude that placeholder only when multiple labels are
present so the plot remains correct for both old and new data layouts.

R/dataProcessPlots.R [264-265]

 tempGroupName = unique(processed[, c("GROUP", "RUN")])
+if (data.table::uniqueN(stats::na.omit(processed$LABEL)) > 1 &&
+    "0" %in% as.character(tempGroupName$GROUP)) {
+  tempGroupName = tempGroupName[as.character(GROUP) != "0"]
+}
 tempGroupName = tempGroupName[order(RUN), ]
Suggestion importance[1-10]: 3

__

Why: The suggestion is plausible as a backward-compatibility safeguard because a placeholder GROUP value of 0 would distort groupAxis and lineNameAxis. However, this PR also changes R/utils_checks.R so GROUP now comes directly from GROUP_ORIGINAL, making the issue mostly defensive rather than clearly necessary in the new flow.

Low

@tonywu1999 tonywu1999 changed the title Feat turnover final feat(plotting): Rename QC plot headings for protein turnover Apr 14, 2026
@tonywu1999 tonywu1999 changed the title feat(plotting): Rename QC plot headings for protein turnover feat(plotting): Refactor QC plot headings to be compatible for protein turnover Apr 14, 2026
@tonywu1999 tonywu1999 merged commit f07eeeb into devel Apr 14, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the feat-turnover-final branch April 14, 2026 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant