feat: Introduce normalization with unlabeled peptides#192
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 25 minutes and 52 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
📝 WalkthroughWalkthroughThis PR replaces several LABEL-level cardinality checks with explicit checks for a Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
inst/tinytest/test_pr3_ref_flag.R (1)
71-121: Add one labeled-without-refregression case.These tests exercise the new
refpath, but not the labeled paths whererefis intentionally absent (QUANTILE,GLOBALSTANDARDS,FALSE). A single case asserting that H rows are still excluded from censoring and that labeled TMP/linear summarization still works there would catch the current regression.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@inst/tinytest/test_pr3_ref_flag.R` around lines 71 - 121, The test suite only exercises code paths where the input contains a ref column; add a regression test that uses a labeled dataset without a ref column (e.g., for normalization methods "QUANTILE", "GLOBALSTANDARDS" or when ref is FALSE) to ensure MSstatsHandleMissing still treats LABEL=="H" rows as not censored and that summary_method="TMP"/linear summarization on labeled data runs without error; create a small data.table similar to make_cens_input but omit the ref column, call MSstatsHandleMissing(...) with the same parameters used in the existing test, and add assertions mirroring the existing expect_false(any(out_cens[LABEL=="H", censored])) and the labeled TMP summarization expectations to catch the regression.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/dataProcess.R`:
- Around line 560-562: The current is_labeled_reference boolean only checks for
a "ref" column and flips to FALSE when ref is absent, causing labeled
QUANTILE/GLOBALSTANDARDS/FALSE data to be misrouted to the unlabeled branch;
update the logic that sets is_labeled_reference (used when calling
.runTukey(single_protein, ...)) so it also detects labeled experiments even if
"ref" is missing — e.g. additionally check for signature labeled columns or
metadata (presence of H/L channel columns, a "label" or "channel" factor, or any
other project-specific labeled indicator in single_protein) and set
is_labeled_reference to TRUE when those are present, ensuring .runTukey follows
the labeled path and triggers .adjustLRuns() for H/L adjustment.
In `@R/utils_censored.R`:
- Around line 132-138: The nonmissing_filter used by .prepareTMP() /
.prepareLinear() currently treats any non-NA H-row as observed, which allows
peptides with only reference-channel signal to increment n_obs / n_obs_run;
update the nonmissing_filter expression (the block that sets nonmissing_filter
based on censored_symbol and input$newABUNDANCE) to also exclude reference rows
by adding a predicate that filters out reference channels/rows (e.g., the
dataset's reference indicator such as IS_REF / IS_REFERENCE / REF_CHANNEL flag)
so only endogenous (non-reference) signals count toward n_obs and n_obs_run;
ensure the same change is applied in all branches where nonmissing_filter is
assigned.
- Around line 34-35: Change the construction of use_for_analysis so that when
the ref column is present it stays as !input$ref, but when ref is missing it
falls back to checking the light-channel via the LABEL column (i.e. use
input$LABEL == "L") instead of defaulting to rep(TRUE, nrow(input)); update the
expression that sets use_for_analysis (currently using colnames(input) and
input$ref) to branch: if ("ref" %in% colnames(input)) use !input$ref else if
("LABEL" %in% colnames(input)) use input$LABEL == "L" (and only default to
rep(TRUE, nrow(input)) as a last resort).
In `@R/utils_summarization_prepare.R`:
- Around line 38-41: The current guard skips creating ref_covariate when the
input lacks a ref column, causing labeled inputs (as determined by LABEL) to
miss the covariate and break MSstatsSummarizeSingleLinear; change the logic so
ref_covariate is always created for labeled runs (use LABEL to detect
labeledness) and, when a ref column exists, set ref_covariate by keying off !ref
rather than hard-coding LABEL == "L" (i.e., create input[, ref_covariate :=
factor(ifelse(if ( "ref" %in% colnames(input)) !ref else LABEL == "L", RUN, 0))]
or equivalent using MSstatsSummarizeSingleLinear, add_ref_covariate,
ref_covariate, LABEL, ref, and RUN to locate the code).
---
Nitpick comments:
In `@inst/tinytest/test_pr3_ref_flag.R`:
- Around line 71-121: The test suite only exercises code paths where the input
contains a ref column; add a regression test that uses a labeled dataset without
a ref column (e.g., for normalization methods "QUANTILE", "GLOBALSTANDARDS" or
when ref is FALSE) to ensure MSstatsHandleMissing still treats LABEL=="H" rows
as not censored and that summary_method="TMP"/linear summarization on labeled
data runs without error; create a small data.table similar to make_cens_input
but omit the ref column, call MSstatsHandleMissing(...) with the same parameters
used in the existing test, and add assertions mirroring the existing
expect_false(any(out_cens[LABEL=="H", censored])) and the labeled TMP
summarization expectations to catch the regression.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e42e8110-6030-450c-b094-709336b766dc
📒 Files selected for processing (6)
R/dataProcess.RR/utils_censored.RR/utils_feature_selection.RR/utils_normalize.RR/utils_summarization_prepare.Rinst/tinytest/test_pr3_ref_flag.R
ae47ab6 to
1815288
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/utils_normalize.R`:
- Around line 28-37: When normalization_mode is not EQUALIZEMEDIANS (including
the early returns for "NONE" or "FALSE" and other modes like "QUANTILE" or
"GLOBALSTANDARDS"), ensure any existing ref column is removed so downstream
logic doesn't treat stale rows as reference; likewise, in the EQUALIZEMEDIANS
branch where you call .normalizeMedian(input), only add/set the ref column when
"H" exists in input$LABEL and explicitly remove/refuse to set ref when LABEL
lacks "H". Update the logic around normalization_method, the early return path,
and the EQUALIZEMEDIANS path to drop the ref column (if present) unless you are
intentionally creating it in the EQUALIZEMEDIANS + "H" case.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 573b0df6-0bac-4778-af01-52426ade316f
📒 Files selected for processing (6)
R/dataProcess.RR/utils_censored.RR/utils_normalize.RR/utils_summarization_prepare.Rinst/tinytest/test_utils_censored.Rinst/tinytest/test_utils_normalize.R
✅ Files skipped from review due to trivial changes (1)
- R/utils_summarization_prepare.R
🚧 Files skipped from review as they are similar to previous changes (1)
- R/dataProcess.R
Motivation and Solution
This PR introduces explicit handling of unlabeled peptides in normalization workflows and replaces label-cardinality-based logic with an explicit boolean reference flag (
ref/is_labeled_reference) derived during median (EQUALIZEMEDIANS) normalization. Therefflag is propagated through normalization, censoring, and summarization flows so that unlabeled standards and labeled reference rows can be treated consistently (e.g., excluded from endogenous summarization and censoring calculations). The change enables GLOBALSTANDARDS normalization using unlabeled peptides, makes censoring/summarization operate only on endogenous (non-reference) rows, and simplifies label-guarding by depending onrefpresence/content rather than LABEL counting.Detailed Changes
R/dataProcess.R
refcolumn with any non-missing TRUE values instead of using LABEL cardinality (nlevels > 1).is_labeled_reference) into .runTukey() and remove prior is_labeled computation.R/utils_normalize.R
refcolumn (TRUE for heavy-label rows).R/utils_censored.R
R/utils_summarization_prepare.R
refcolumn exists and has at least one non-NA TRUE value ("ref" %in% colnames(input) && any(input$ref, na.rm = TRUE)); when created, keep existing factor computation (ifelse(LABEL == "L", RUN, 0)).R/utils_feature_selection.R
Robustness and small fixes
Unit Tests Added or Modified
inst/tinytest/test_pr3_ref_flag.R (new)
refcreation during median normalization (EQUALIZEMEDIANS).ref.inst/tinytest/test_utils_censored.R (extended)
inst/tinytest/test_utils_normalize.R (extended)
refcolumn for labeled data (TRUE for LABEL == "H", FALSE for LABEL == "L"); ensuresrefnot added for label-free input or for QUANTILE normalization.inst/tinytest/test_utils_summarization_prepare.R (modified input)
refcolumn with alternating TRUE/FALSE to exercise ref_covariate creation; label-free test remains unchanged.Coding Guidelines (violations or noteworthy deviations)
ref) and corresponding gating replaces ad-hoc LABEL cardinality checks, improving clarity.If desired, reviewers may call out style/consistency concerns (naming conventions for
refvsis_labeled_reference, or the placement of EQUALIZEMEDIANS as a standalone if) during code review, but no clear guideline breaches are present in the provided summaries.PR Type
Enhancement, Bug fix, Tests
Description
Add
ref-based reference channel detectionNormalize unlabeled standards across all labels
Restrict censoring and summarization by
refAdd regression tests for
refbehaviorDiagram Walkthrough
File Walkthrough
dataProcess.R
Use `ref` to drive Tukey flowR/dataProcess.R
refis_labeled_referenceinto.runTukeyutils_normalize.R
Extend normalization with `ref` and unlabeled standardsR/utils_normalize.R
refafterEQUALIZEMEDIANSforHstandards = "unlabeled"discoveryutils_censored.R
Make censoring respect reference flagsR/utils_censored.R
use_for_analysisfromrefNAcensoring similarlyutils_summarization_prepare.R
Prepare summarization using `ref` semanticsR/utils_summarization_prepare.R
ref_covariateonly whenrefexistsrefdetectiongetProcessedrobust toNAremovesutils_feature_selection.R
Minor formatting cleanup in feature selectionR/utils_feature_selection.R
test_pr3_ref_flag.R
Add regression tests for `ref` handlinginst/tinytest/test_pr3_ref_flag.R
refcreation during median normalizationrefref=TRUErowsgetProcessedhandling ofNA