
feat: Introduce normalization with unlabeled peptides #192

Merged
tonywu1999 merged 8 commits into devel from feat-turnover-3
Apr 13, 2026


@tonywu1999
Contributor

@tonywu1999 tonywu1999 commented Apr 13, 2026

Motivation and Solution

This PR introduces explicit handling of unlabeled peptides in normalization workflows and replaces label-cardinality-based logic with an explicit boolean reference flag (ref / is_labeled_reference) derived during median (EQUALIZEMEDIANS) normalization. The ref flag is propagated through normalization, censoring, and summarization flows so that unlabeled standards and labeled reference rows can be treated consistently (e.g., excluded from endogenous summarization and censoring calculations). The change enables GLOBALSTANDARDS normalization using unlabeled peptides, makes censoring/summarization operate only on endogenous (non-reference) rows, and simplifies label-guarding by depending on ref presence/content rather than LABEL counting.
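A minimal base R sketch of the ref mechanism described above, assuming plain data frames in place of MSstats' data.table inputs. The helper names add_reference_flag and has_labeled_reference are hypothetical: the first mirrors the EQUALIZEMEDIANS branch quoted later in this thread (the median equalization itself is omitted), and the second mirrors the new `"ref" %in% colnames(input) && any(input$ref, na.rm = TRUE)` check that replaces LABEL counting.

```r
# Hypothetical helpers sketching the ref flag described in this PR.
# Base R data frames stand in for MSstats' data.table inputs.

# EQUALIZEMEDIANS branch (median equalization itself omitted):
# when heavy rows exist, mark them as the reference channel.
add_reference_flag <- function(input) {
  if ("H" %in% input$LABEL) {
    input$ref <- input$LABEL == "H"
  }
  input
}

# Downstream check that replaces nlevels(LABEL) > 1:
# labeled reference data = a ref column with at least one TRUE.
has_labeled_reference <- function(input) {
  "ref" %in% colnames(input) && any(input$ref, na.rm = TRUE)
}

labeled <- add_reference_flag(data.frame(
  LABEL = c("L", "H", "L", "H"),
  RUN = c(1, 1, 2, 2),
  ABUNDANCE = c(10, 12, 11, 12)
))
label_free <- add_reference_flag(data.frame(
  LABEL = c("L", "L"),
  RUN = c(1, 2),
  ABUNDANCE = c(10, 11)
))

has_labeled_reference(labeled)     # TRUE
has_labeled_reference(label_free)  # FALSE: no ref column was added
```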

Detailed Changes

  • R/dataProcess.R

    • MSstatsSummarizeSingleTMP: determine labeled/reference presence by checking for a ref column with any non-missing TRUE values instead of using LABEL cardinality (nlevels > 1).
    • Pass renamed boolean (is_labeled_reference) into .runTukey() and remove prior is_labeled computation.
    • Lines changed: +3/-2.
  • R/utils_normalize.R

    • MSstatsNormalize: run EQUALIZEMEDIANS via a standalone conditional after early returns; when heavy labels exist, set a logical ref column (TRUE for heavy-label rows).
    • .normalizeGlobalStandards:
      • Support standards = "unlabeled": expand to PeptideSequence values from rows whose merged LABEL is NA; if none found, stop with an error.
      • Remove GROUP != "0" constraint when selecting standards_data.
      • Use mean_by_run (temporary) and apply ABUNDANCE := ABUNDANCE - mean_by_run + median_by_fraction across all labels (previous label-conditional adjustment removed).
      • Simplify logging to a single INFO message.
    • Lines changed: +26/-17.
  • R/utils_censored.R

    • MSstatsHandleMissing:
      • Introduce use_for_analysis boolean mask derived from ref when present (otherwise all rows).
      • Constrain quantile cutoff computation, censored assignment, cutoff_lower derivation, zero/<=0 handling (missing_symbol == "0"), and INTENSITY-missing handling (missing_symbol == "NA") to use use_for_analysis.
      • Remove LABEL == "L" gates from these branches and delete post-processing that forced censored = FALSE for LABEL == "H".
    • .getNonMissingFilter: remove LABEL == "L" requirement; depend on use_for_analysis and non-NA newABUNDANCE (with optional non-zero constraint when censored_symbol == "0"); apply use_for_analysis in both impute and non-impute paths.
    • Lines changed: +23/-24.
  • R/utils_summarization_prepare.R

    • MSstatsPrepareForSummarization: generate ref_covariate only if ref column exists and has at least one non-NA TRUE value ("ref" %in% colnames(input) && any(input$ref, na.rm = TRUE)); when created, keep existing factor computation (ifelse(LABEL == "L", RUN, 0)).
    • Lines changed: +3/-3.
  • R/utils_feature_selection.R

    • Minor whitespace/formatting cleanup; no functional changes.
  • Robustness and small fixes

    • Add handling for NA/removal cases in getProcessed paths to avoid failures when ref-driven rows are omitted.
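The GLOBALSTANDARDS change can be sketched as follows. This is a simplified, hypothetical re-implementation in base R (aggregate, ave, merge), not the package code. It assumes the standards are every PeptideSequence that appears in at least one row with LABEL NA, takes the mean standard abundance per RUN and FRACTION as mean_by_run, takes the median of those means per FRACTION as median_by_fraction, and shifts every row regardless of label, as the merged PR does.

```r
# Hypothetical sketch of GLOBALSTANDARDS with standards = "unlabeled".
normalize_unlabeled_standards <- function(input) {
  # Standards: peptides that occur in at least one row with LABEL NA.
  standards <- unique(input$PeptideSequence[is.na(input$LABEL)])
  if (length(standards) == 0) {
    stop("no unlabeled peptides found")
  }
  standards_data <- input[input$PeptideSequence %in% standards, ]
  # Mean standard abundance in each RUN x FRACTION (NA abundances dropped).
  means <- aggregate(ABUNDANCE ~ RUN + FRACTION, data = standards_data, FUN = mean)
  names(means)[names(means) == "ABUNDANCE"] <- "mean_by_run"
  # Median of those run means within each FRACTION.
  means$median_by_fraction <- ave(means$mean_by_run, means$FRACTION, FUN = median)
  out <- merge(input, means, all.x = TRUE, by = c("RUN", "FRACTION"))
  # As merged: shift every row, whatever its label.
  out$ABUNDANCE <- out$ABUNDANCE - out$mean_by_run + out$median_by_fraction
  out$mean_by_run <- NULL
  out$median_by_fraction <- NULL
  out
}

toy <- data.frame(
  RUN = c(1, 2, 1, 2),
  FRACTION = 1,
  PeptideSequence = c("STD", "STD", "PEP", "PEP"),
  LABEL = c(NA, NA, "L", "L"),
  ABUNDANCE = c(20, 22, 10, 10)
)
normalized <- normalize_unlabeled_standards(toy)
# PEP becomes 11 in run 1 and 9 in run 2; STD becomes 21 in both runs.
```

The toy values follow the same idea as the new tests: a standard with identical intensity in every run leaves ABUNDANCE unchanged, while a run-to-run difference in the standard shifts the other peptides in the opposite direction.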

Unit Tests Added or Modified

  • inst/tinytest/test_pr3_ref_flag.R (new)

    • Regression/tinytests verifying:
      • ref creation during median normalization (EQUALIZEMEDIANS).
      • Label-free paths do not create ref.
      • Censoring skips rows with ref == TRUE (reference rows excluded from censoring decisions).
      • getProcessed NA/removal robustness behavior.
  • inst/tinytest/test_utils_censored.R (extended)

    • New test section with helper make_cens_input() constructing paired LABEL=="L"/LABEL=="H" rows parameterized by INTENSITY, ABUNDANCE, and ref.
    • Asserts expected censored outcomes:
      • All LABEL=="H" rows have censored == FALSE.
      • LABEL=="L" rows with INTENSITY == 1 have censored == TRUE.
      • LABEL=="L" rows with INTENSITY > 1 have censored == FALSE.
  • inst/tinytest/test_utils_normalize.R (extended)

    • Tests for .normalizeGlobalStandards(..., "unlabeled"):
      • Uniformly intense unlabeled standard leaves ABUNDANCE unchanged.
      • Labeled peptides are excluded from unlabeled reference (post-normalization labeled peptide ABUNDANCE differs).
      • Unlabeled standard detection shifts run-specific ABUNDANCE for non-standard peptides across runs.
      • Error raised when no unlabeled peptides are found ("no unlabeled peptides found").
    • Tests that EQUALIZEMEDIANS creates logical ref column for labeled data (TRUE for LABEL == "H", FALSE for LABEL == "L"); ensures ref not added for label-free input or for QUANTILE normalization.
  • inst/tinytest/test_utils_summarization_prepare.R (modified input)

    • Labeled test input now includes ref column with alternating TRUE/FALSE to exercise ref_covariate creation; label-free test remains unchanged.

Coding Guidelines (violations or noteworthy deviations)

  • No explicit coding guideline violations are documented in the provided changes. Changes follow existing codebase patterns:
    • Added and exercised tests for new behavior.
    • Propagation of a new boolean column (ref) and corresponding gating replaces ad-hoc LABEL cardinality checks, improving clarity.
    • Error handling added for missing unlabeled standards.
    • Minor formatting cleanup was applied.

If desired, reviewers may call out style/consistency concerns (naming conventions for ref vs is_labeled_reference, or the placement of EQUALIZEMEDIANS as a standalone if) during code review, but no clear guideline breaches are present in the provided summaries.


PR Type

Enhancement, Bug fix, Tests


Description

  • Add ref-based reference channel detection

  • Normalize unlabeled standards across all labels

  • Restrict censoring and summarization by ref

  • Add regression tests for ref behavior


Diagram Walkthrough

flowchart LR
  n1["EQUALIZEMEDIANS adds ref flag"]
  n2["GLOBALSTANDARDS supports unlabeled peptides"]
  n3["Censoring analyzes only non-reference rows"]
  n4["Summarization uses ref-aware covariates"]
  n5["Tinytests validate ref behavior"]
  n1 -- "enables" --> n3
  n1 -- "drives" --> n4
  n2 -- "extends normalization" --> n4
  n3 -- "verified by" --> n5
  n4 -- "verified by" --> n5

File Walkthrough

Relevant files

Enhancement
  • R/dataProcess.R (+3/-2): Use `ref` to drive Tukey flow
    • Detect reference-labeled data from ref
    • Pass is_labeled_reference into .runTukey
  • R/utils_normalize.R (+26/-17): Extend normalization with `ref` and unlabeled standards
    • Add ref after EQUALIZEMEDIANS for H
    • Support standards = "unlabeled" discovery
    • Error when unlabeled standards are missing
    • Normalize all labels with global standards

Bug fix
  • R/utils_censored.R (+22/-24): Make censoring respect reference flags
    • Derive use_for_analysis from ref
    • Apply censoring only to non-reference rows
    • Scope zero and NA censoring similarly
    • Remove label-specific nonmissing filtering
  • R/utils_summarization_prepare.R (+7/-7): Prepare summarization using `ref` semantics
    • Add ref_covariate only when ref exists
    • Replace label-count checks with ref detection
    • Make getProcessed robust to NA removals

Formatting
  • R/utils_feature_selection.R (+1/-1): Minor formatting cleanup in feature selection
    • Apply whitespace-only cleanup in feature averaging code
    • Preserve existing feature selection behavior

Tests
  • inst/tinytest/test_pr3_ref_flag.R (+152/-0): Add regression tests for `ref` handling
    • Test ref creation during median normalization
    • Verify label-free paths do not add ref
    • Ensure censoring skips ref=TRUE rows
    • Cover getProcessed handling of NA

@coderabbitai

coderabbitai Bot commented Apr 13, 2026

Warning

Rate limit exceeded

@tonywu1999 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 25 minutes and 52 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a9cada15-95a8-430a-8d1f-e143898aaf97

📥 Commits

Reviewing files that changed from the base of the PR and between 0abd342 and 62f85a1.

📒 Files selected for processing (8)
  • R/dataProcess.R
  • R/utils_censored.R
  • R/utils_normalize.R
  • R/utils_summarization_prepare.R
  • inst/tinytest/test_dataProcess.R
  • inst/tinytest/test_utils_censored.R
  • inst/tinytest/test_utils_normalize.R
  • inst/tinytest/test_utils_summarization_prepare.R
📝 Walkthrough

Walkthrough

This PR replaces several LABEL-level cardinality checks with explicit checks for a ref column (presence and non-missing values) across data processing, normalization, censoring, and summarization-prep logic, and adds/updates unit tests exercising the new ref-based behavior.

Changes

Cohort / File(s) Summary
Data Processing
R/dataProcess.R
Switches labeled/unlabeled detection in MSstatsSummarizeSingleTMP from nlevels(single_protein$LABEL) > 1 to checking for a non-missing "ref" column; renames boolean argument passed into .runTukey(...) to is_labeled_reference.
Utils — Censoring
R/utils_censored.R
Adds use_for_analysis mask (based on ref when present) and applies it to censored/missing computations and assignments; removes prior LABEL == "L" gating and the post-processing line that forced censored = FALSE for LABEL == "H".
Utils — Normalization
R/utils_normalize.R
EQUALIZEMEDIANS now may add a logical ref column (TRUE for LABEL=="H"); .normalizeGlobalStandards uses mean_by_run, expands "unlabeled" standards to NA-labeled peptides, removes GROUP != "0" filtering, and applies ABUNDANCE adjustment to all rows.
Utils — Summarization Prep
R/utils_summarization_prepare.R
ref_covariate creation is now gated on "ref" %in% colnames(input) && any(input$ref, na.rm = TRUE) instead of uniqueN(input$LABEL) == 2.
Tests — Censoring
inst/tinytest/test_utils_censored.R
Adds tests and helper make_cens_input() to validate MSstatsHandleMissing censoring behavior under impute=TRUE, missing_symbol="0", and summary_method="TMP".
Tests — Normalization
inst/tinytest/test_utils_normalize.R
Extends tests for .normalizeGlobalStandards(..., "unlabeled") and for MSstatsNormalize method="EQUALIZEMEDIANS" to assert ref column creation/typing and unlabeled-standard behaviors.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

Review effort 2/5

Poem

A rabbit peeks at rows and refs,
Swapping level counts for clearer clefs,
I hop through tests with joyful cheer,
Marking ref where labels appear—
Code burrows tidy, soft, and deft 🐇✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name | Status | Explanation | Resolution
Description check | ❓ Inconclusive | The description includes motivation through a PR Type section, detailed changes via bullet points and file-level walkthroughs, and references to testing; however, it does not follow the template structure with explicit Motivation/Context, Changes, Testing, and Checklist sections. | Reorganize the description to follow the template structure with explicit 'Motivation and Context', 'Changes', 'Testing', and 'Checklist' sections for clarity and consistency.

✅ Passed checks (2 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title directly and specifically describes the main feature addition: normalization now supports unlabeled peptides as standards.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Missing `ref`

ref is only populated in the EQUALIZEMEDIANS branch, but the downstream labeled-data checks now rely on ref instead of detecting multiple LABEL levels. A heavy/light dataset using NONE, QUANTILE, or GLOBALSTANDARDS normalization will therefore be treated as unlabeled later in the pipeline, so the reference-channel-specific summarization path is skipped.

if (normalization_method == "EQUALIZEMEDIANS") {
    input = .normalizeMedian(input)
    if ("H" %in% input$LABEL) {
        input[, ref := LABEL == "H"]
    }
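One way to address this, sketched below, is to keep ref as the primary signal and fall back to a label count when the column is absent. detect_labeled_reference is a hypothetical name, and the fallback (more than one distinct non-missing LABEL) approximates the pre-PR nlevels(LABEL) > 1 check rather than reproducing it exactly.

```r
# Hypothetical fallback for the labeled-data check in MSstatsSummarizeSingleTMP.
detect_labeled_reference <- function(single_protein) {
  if ("ref" %in% colnames(single_protein)) {
    # Merged behaviour: trust the ref flag when it exists.
    return(any(single_protein$ref, na.rm = TRUE))
  }
  # Fallback (assumption): the pre-PR idea of "more than one label".
  labels <- unique(stats::na.omit(as.character(single_protein$LABEL)))
  length(labels) > 1
}

quantile_labeled <- data.frame(LABEL = c("L", "H"), RUN = c(1, 1))  # no ref column
label_free <- data.frame(LABEL = c("L", "L"), RUN = c(1, 2))

detect_labeled_reference(quantile_labeled)  # TRUE, where the merged check gives FALSE
detect_labeled_reference(label_free)        # FALSE
```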
Heavy rows imputed

When ref is absent, MSstatsHandleMissing now falls back to analyzing every row. In labeled inputs that did not go through the EQUALIZEMEDIANS branch, heavy/reference rows can be marked as censored and imputed when INTENSITY == 1, ABUNDANCE <= 0, or values fall below the cutoff. Previously those rows were explicitly excluded from censoring.

use_for_analysis = if ("ref" %in% colnames(input)) !input$ref else rep(TRUE, nrow(input))
## if intensity = 1, but abundance > cutoff after normalization, it also should be censored.
if (!is.null(censored_cutoff)) {
    quantiles = input[use_for_analysis & !is.na(INTENSITY) & INTENSITY > 1,
                      quantile(ABUNDANCE,
                               prob = c(0.01, 0.25, 0.5, 0.75,
                                        censored_cutoff),
                               na.rm = TRUE)]
    iqr = quantiles[4] - quantiles[2]
    multiplier = (quantiles[5] - quantiles[4]) / iqr
    cutoff_lower = (quantiles[2] - multiplier * iqr)
    input$censored = use_for_analysis & !is.na(input$INTENSITY) &
        input$ABUNDANCE < cutoff_lower
    if (cutoff_lower <= 0 & !is.null(missing_symbol) & missing_symbol == "0") {
        zero_one_filter = use_for_analysis & !is.na(input$ABUNDANCE) & input$ABUNDANCE <= 0
        input$censored = ifelse(zero_one_filter, TRUE, input$censored)
    }
    if (!is.null(missing_symbol) & missing_symbol == "NA") {
        input$censored = ifelse(use_for_analysis & is.na(input$INTENSITY), TRUE,
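The regression can be reproduced with a toy version of the mask. Both mask functions are hypothetical, and flag_censored is a deliberately reduced censoring rule (INTENSITY <= 1 only) rather than the quantile-based cutoff in the excerpt above. mask_with_fallback follows the later CodeRabbit suggestion of excluding heavy rows when ref is missing; treating rows with an NA label as analysis rows is an assumption.

```r
# Merged behaviour: without a ref column, every row is analysed.
mask_merged <- function(input) {
  if ("ref" %in% colnames(input)) !input$ref else rep(TRUE, nrow(input))
}

# Hypothetical fallback: exclude heavy rows when ref is absent.
mask_with_fallback <- function(input) {
  if ("ref" %in% colnames(input)) {
    return(!ifelse(is.na(input$ref), FALSE, input$ref))
  }
  if ("LABEL" %in% colnames(input)) {
    return(is.na(input$LABEL) | input$LABEL != "H")
  }
  rep(TRUE, nrow(input))
}

# Reduced censoring rule for illustration only: INTENSITY <= 1 is censored.
flag_censored <- function(input, mask) {
  mask & !is.na(input$INTENSITY) & input$INTENSITY <= 1
}

# Labeled data normalized with QUANTILE, so no ref column exists.
labeled_no_ref <- data.frame(
  LABEL = c("L", "H", "L", "H"),
  INTENSITY = c(1, 1, 500, 800)
)

flag_censored(labeled_no_ref, mask_merged(labeled_no_ref))         # TRUE  TRUE FALSE FALSE
flag_censored(labeled_no_ref, mask_with_fallback(labeled_no_ref))  # TRUE FALSE FALSE FALSE
```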
Reference shifted

GLOBALSTANDARDS now subtracts the run offset from all rows, including the heavy/reference channel. In heavy/light workflows this changes the reference intensities that later steps use as the baseline, so the light-to-reference relationship is no longer preserved. The previous implementation only adjusted the endogenous channel.

input = merge(input, means_by_standard, all.x = TRUE, by = c("RUN", "FRACTION"))
input[, ABUNDANCE := ABUNDANCE - mean_by_run + median_by_fraction]

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Exclude reference rows again

This now treats reference-channel rows as non-missing observations, so labeled
datasets can accidentally summarize heavy/reference measurements together with
endogenous ones. Keep the new ref path, but exclude rows where ref is TRUE and fall
back to the old label-based behavior when ref is absent.

R/utils_censored.R [131-134]

-if (censored_symbol == "0") {
-    nonmissing_filter = !is.na(input$newABUNDANCE) & input$newABUNDANCE != 0
-} else if (censored_symbol == "NA") {
-    nonmissing_filter = !is.na(input$newABUNDANCE)
+analysis_filter = if ("ref" %in% colnames(input)) {
+    !ifelse(is.na(input$ref), FALSE, input$ref)
+} else {
+    input$LABEL == "L"
 }
 
+if (censored_symbol == "0") {
+    nonmissing_filter = analysis_filter & !is.na(input$newABUNDANCE) & input$newABUNDANCE != 0
+} else if (censored_symbol == "NA") {
+    nonmissing_filter = analysis_filter & !is.na(input$newABUNDANCE)
+}
+
Suggestion importance[1-10]: 8


Why: This correctly identifies that .getNonMissingFilter no longer excludes ref/heavy rows, unlike the prior LABEL == "L" behavior. Including reference measurements in the non-missing set can materially distort downstream summarization, so restoring a ref-aware fallback is important.

Medium
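To see why the filter matters for n_obs and n_obs_run, here is a hypothetical base R version of the suggested filter applied to one feature measured in two runs. In run 1 only the heavy reference is observed; without analysis_filter that run still counts as having an observation. nonmissing_filter_sketch is an illustrative name, and the NA guard on LABEL is an assumption added to the suggestion.

```r
# Hypothetical sketch of the suggested .getNonMissingFilter logic.
nonmissing_filter_sketch <- function(input, censored_symbol = c("0", "NA")) {
  censored_symbol <- match.arg(censored_symbol)
  analysis_filter <- if ("ref" %in% colnames(input)) {
    !ifelse(is.na(input$ref), FALSE, input$ref)
  } else {
    !is.na(input$LABEL) & input$LABEL == "L"
  }
  observed <- !is.na(input$newABUNDANCE)
  if (censored_symbol == "0") {
    # Zeros also count as missing when "0" is the censored symbol.
    observed <- observed & input$newABUNDANCE != 0
  }
  analysis_filter & observed
}

feature <- data.frame(
  RUN = c(1, 1, 2, 2),
  LABEL = c("L", "H", "L", "H"),
  ref = c(FALSE, TRUE, FALSE, TRUE),
  newABUNDANCE = c(NA, 14, 10, 14)
)
merged_filter <- !is.na(feature$newABUNDANCE)             # merged: any non-NA row
suggested_filter <- nonmissing_filter_sketch(feature, "NA")
tapply(merged_filter, feature$RUN, sum)     # run 1: 1, run 2: 2
tapply(suggested_filter, feature$RUN, sum)  # run 1: 0, run 2: 1
```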
Skip reference-channel normalization

This applies the global-standards shift to every row, including the reference
channel in labeled experiments, which changes values that were previously meant to
stay fixed. Normalize only analysis rows and leave ref/heavy rows untouched, while
still normalizing all rows for unlabeled datasets.

R/utils_normalize.R [245-246]

 input = merge(input, means_by_standard, all.x = TRUE, by = c("RUN", "FRACTION"))
-input[, ABUNDANCE := ABUNDANCE - mean_by_run + median_by_fraction]
+normalize_filter = if ("ref" %in% colnames(input)) {
+    !ifelse(is.na(input$ref), FALSE, input$ref)
+} else {
+    is.na(input$LABEL) | input$LABEL != "H"
+}
+input[normalize_filter, ABUNDANCE := ABUNDANCE - mean_by_run + median_by_fraction]
Suggestion importance[1-10]: 8


Why: This correctly points out that .normalizeGlobalStandards now modifies all rows, whereas the old code left heavy/reference rows unchanged. Applying the shift to ref rows can change reference-channel values and alter labeled-analysis behavior, so limiting normalization to analysis rows is a meaningful fix.

Medium
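A base R sketch contrasting the merged shift with the suggested normalize_filter, assuming mean_by_run and median_by_fraction have already been merged onto each row. shift_by_standards and its skip_reference argument are hypothetical.

```r
# Hypothetical helper: apply the global-standards shift, optionally
# leaving reference rows untouched (the suggested normalize_filter).
shift_by_standards <- function(input, skip_reference) {
  shift <- input$median_by_fraction - input$mean_by_run
  rows <- rep(TRUE, nrow(input))
  if (skip_reference) {
    rows <- if ("ref" %in% colnames(input)) {
      !ifelse(is.na(input$ref), FALSE, input$ref)
    } else {
      is.na(input$LABEL) | input$LABEL != "H"
    }
  }
  input$ABUNDANCE[rows] <- input$ABUNDANCE[rows] + shift[rows]
  input
}

merged_rows <- data.frame(
  LABEL = c("L", "H"),
  ref = c(FALSE, TRUE),
  ABUNDANCE = c(10, 14),
  mean_by_run = 20,          # already merged in by RUN and FRACTION
  median_by_fraction = 21
)
shift_by_standards(merged_rows, skip_reference = FALSE)$ABUNDANCE  # 11 15 (merged PR)
shift_by_standards(merged_rows, skip_reference = TRUE)$ABUNDANCE   # 11 14 (suggestion)
```

In this toy input the heavy row moves from 14 to 15 under the merged code and stays at 14 when reference rows are skipped; the light row moves by the same +1 in both cases.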
Restore labeled fallback behavior

Labeled experiments normalized with methods other than EQUALIZEMEDIANS will no
longer create ref_covariate, because they still have LABEL but no ref column.
Preserve the previous label-based detection as a fallback so labeled summarization
does not silently lose the reference-channel adjustment.

R/utils_summarization_prepare.R [38-41]

-add_ref_covariate = "ref" %in% colnames(input) && any(input$ref, na.rm = TRUE)
+add_ref_covariate = if ("ref" %in% colnames(input)) {
+    any(input$ref, na.rm = TRUE)
+} else {
+    data.table::uniqueN(input$LABEL) == 2
+}
 if (add_ref_covariate) {
-    input[, ref_covariate := factor(ifelse(LABEL == "L", RUN, 0))]
+    input[, ref_covariate := factor(ifelse(
+        if ("ref" %in% colnames(input)) !ifelse(is.na(ref), FALSE, ref) else LABEL == "L",
+        RUN,
+        0
+    ))]
 }
Suggestion importance[1-10]: 7


Why: This is a valid regression catch: ref_covariate is now created only when ref exists, so labeled data processed without that column can silently lose the adjustment. The suggested fallback to LABEL preserves previous behavior for labeled experiments and improves compatibility.

Medium
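The suggested fallback can be checked in isolation with the hypothetical helper below, which mirrors the suggestion's logic in base R; length(unique()) stands in for data.table::uniqueN.

```r
# Hypothetical helper mirroring the suggested add_ref_covariate fallback.
add_ref_covariate_sketch <- function(input) {
  has_ref <- "ref" %in% colnames(input)
  labeled <- if (has_ref) {
    any(input$ref, na.rm = TRUE)
  } else {
    length(unique(input$LABEL)) == 2   # pre-PR uniqueN(LABEL) == 2 check
  }
  if (labeled) {
    # Endogenous rows get their RUN as the level; reference rows share level 0.
    endogenous <- if (has_ref) {
      !ifelse(is.na(input$ref), FALSE, input$ref)
    } else {
      input$LABEL == "L"
    }
    input$ref_covariate <- factor(ifelse(endogenous, input$RUN, 0))
  }
  input
}

labeled_no_ref <- data.frame(LABEL = c("L", "H", "L", "H"), RUN = c(1, 1, 2, 2))
label_free <- data.frame(LABEL = c("L", "L"), RUN = c(1, 2))

as.character(add_ref_covariate_sketch(labeled_no_ref)$ref_covariate)  # "1" "0" "2" "0"
"ref_covariate" %in% colnames(add_ref_covariate_sketch(label_free))   # FALSE
```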


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
inst/tinytest/test_pr3_ref_flag.R (1)

71-121: Add one labeled-without-ref regression case.

These tests exercise the new ref path, but not the labeled paths where ref is intentionally absent (QUANTILE, GLOBALSTANDARDS, FALSE). A single case asserting that H rows are still excluded from censoring and that labeled TMP/linear summarization still works there would catch the current regression.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@inst/tinytest/test_pr3_ref_flag.R` around lines 71 - 121, The test suite only
exercises code paths where the input contains a ref column; add a regression
test that uses a labeled dataset without a ref column (e.g., for normalization
methods "QUANTILE", "GLOBALSTANDARDS" or when ref is FALSE) to ensure
MSstatsHandleMissing still treats LABEL=="H" rows as not censored and that
summary_method="TMP"/linear summarization on labeled data runs without error;
create a small data.table similar to make_cens_input but omit the ref column,
call MSstatsHandleMissing(...) with the same parameters used in the existing
test, and add assertions mirroring the existing
expect_false(any(out_cens[LABEL=="H", censored])) and the labeled TMP
summarization expectations to catch the regression.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/dataProcess.R`:
- Around line 560-562: The current is_labeled_reference boolean only checks for
a "ref" column and flips to FALSE when ref is absent, causing labeled
QUANTILE/GLOBALSTANDARDS/FALSE data to be misrouted to the unlabeled branch;
update the logic that sets is_labeled_reference (used when calling
.runTukey(single_protein, ...)) so it also detects labeled experiments even if
"ref" is missing — e.g. additionally check for signature labeled columns or
metadata (presence of H/L channel columns, a "label" or "channel" factor, or any
other project-specific labeled indicator in single_protein) and set
is_labeled_reference to TRUE when those are present, ensuring .runTukey follows
the labeled path and triggers .adjustLRuns() for H/L adjustment.

In `@R/utils_censored.R`:
- Around line 132-138: The nonmissing_filter used by .prepareTMP() /
.prepareLinear() currently treats any non-NA H-row as observed, which allows
peptides with only reference-channel signal to increment n_obs / n_obs_run;
update the nonmissing_filter expression (the block that sets nonmissing_filter
based on censored_symbol and input$newABUNDANCE) to also exclude reference rows
by adding a predicate that filters out reference channels/rows (e.g., the
dataset's reference indicator such as IS_REF / IS_REFERENCE / REF_CHANNEL flag)
so only endogenous (non-reference) signals count toward n_obs and n_obs_run;
ensure the same change is applied in all branches where nonmissing_filter is
assigned.
- Around line 34-35: Change the construction of use_for_analysis so that when
the ref column is present it stays as !input$ref, but when ref is missing it
falls back to checking the light-channel via the LABEL column (i.e. use
input$LABEL == "L") instead of defaulting to rep(TRUE, nrow(input)); update the
expression that sets use_for_analysis (currently using colnames(input) and
input$ref) to branch: if ("ref" %in% colnames(input)) use !input$ref else if
("LABEL" %in% colnames(input)) use input$LABEL == "L" (and only default to
rep(TRUE, nrow(input)) as a last resort).

In `@R/utils_summarization_prepare.R`:
- Around line 38-41: The current guard skips creating ref_covariate when the
input lacks a ref column, causing labeled inputs (as determined by LABEL) to
miss the covariate and break MSstatsSummarizeSingleLinear; change the logic so
ref_covariate is always created for labeled runs (use LABEL to detect
labeledness) and, when a ref column exists, set ref_covariate by keying off !ref
rather than hard-coding LABEL == "L" (i.e., create input[, ref_covariate :=
factor(ifelse(if ( "ref" %in% colnames(input)) !ref else LABEL == "L", RUN, 0))]
or equivalent using MSstatsSummarizeSingleLinear, add_ref_covariate,
ref_covariate, LABEL, ref, and RUN to locate the code).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e42e8110-6030-450c-b094-709336b766dc

📥 Commits

Reviewing files that changed from the base of the PR and between e0ee7f9 and ae47ab6.

📒 Files selected for processing (6)
  • R/dataProcess.R
  • R/utils_censored.R
  • R/utils_feature_selection.R
  • R/utils_normalize.R
  • R/utils_summarization_prepare.R
  • inst/tinytest/test_pr3_ref_flag.R


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/utils_normalize.R`:
- Around line 28-37: When normalization_mode is not EQUALIZEMEDIANS (including
the early returns for "NONE" or "FALSE" and other modes like "QUANTILE" or
"GLOBALSTANDARDS"), ensure any existing ref column is removed so downstream
logic doesn't treat stale rows as reference; likewise, in the EQUALIZEMEDIANS
branch where you call .normalizeMedian(input), only add/set the ref column when
"H" exists in input$LABEL and explicitly remove/refuse to set ref when LABEL
lacks "H". Update the logic around normalization_method, the early return path,
and the EQUALIZEMEDIANS path to drop the ref column (if present) unless you are
intentionally creating it in the EQUALIZEMEDIANS + "H" case.
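A minimal sketch of the requested behaviour, assuming the method name arrives as either a string or FALSE: ref is (re)created only for EQUALIZEMEDIANS with heavy rows and dropped otherwise. finalize_ref_column is a hypothetical helper, not a function in the package.

```r
# Hypothetical helper: keep ref only when EQUALIZEMEDIANS created it.
finalize_ref_column <- function(input, normalization_method) {
  creates_ref <- identical(normalization_method, "EQUALIZEMEDIANS") &&
    "H" %in% input$LABEL
  if (creates_ref) {
    input$ref <- input$LABEL == "H"
  } else if ("ref" %in% colnames(input)) {
    input$ref <- NULL  # drop a stale flag so downstream checks see no reference
  }
  input
}

stale <- data.frame(LABEL = c("L", "H"), ref = c(FALSE, TRUE))
colnames(finalize_ref_column(stale, "QUANTILE"))   # "LABEL"
colnames(finalize_ref_column(stale, FALSE))        # "LABEL"
finalize_ref_column(stale, "EQUALIZEMEDIANS")$ref  # FALSE TRUE
```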

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 573b0df6-0bac-4778-af01-52426ade316f

📥 Commits

Reviewing files that changed from the base of the PR and between ae47ab6 and 1815288.

📒 Files selected for processing (6)
  • R/dataProcess.R
  • R/utils_censored.R
  • R/utils_normalize.R
  • R/utils_summarization_prepare.R
  • inst/tinytest/test_utils_censored.R
  • inst/tinytest/test_utils_normalize.R
✅ Files skipped from review due to trivial changes (1)
  • R/utils_summarization_prepare.R
🚧 Files skipped from review as they are similar to previous changes (1)
  • R/dataProcess.R
