feat(protein turnover): per-label statistics and multi-label summarization by tonywu1999 · Pull Request #193 · Vitek-Lab/MSstats

tonywu1999 · 2026-04-13T17:27:12Z

PR Type

Bug fix, Enhancement, Tests, Documentation

Description

Summarize turnover data per PROTEIN/LABEL
Preserve LABEL in Tukey outputs
Fix per-label counting and missingness stats
Add regression tests and docs

Diagram Walkthrough

flowchart LR
  A["Labeled protein input"]
  B["Split by PROTEIN + LABEL"]
  C["Per-label linear/TMP summaries"]
  D["LABEL-aware output metrics"]
  E["Tests and documentation"]

  A -- "non-reference data" --> B
  B -- "drives" --> C
  C -- "propagates `LABEL` to" --> D
  D -- "validated by" --> E

File Walkthrough

Relevant files

Enhancement

2 files

dataProcess.R `Split summaries by label and keep labels`	+40/-39
utils_summarization.R `Return multi-label Tukey summaries consistently`	+20/-19

Bug fix

2 files

utils_output.R `Make output merging and stats label-aware`	+26/-29
utils_summarization_prepare.R `Group observation counts within each label`	+13/-13

Tests

1 file

test_pr4_per_label.R `Add tests for per-label summarization behavior`	+167/-0

Documentation

2 files

dot-fitTukey.Rd `Document new is_labeled_reference Tukey parameter`	+7/-1
dot-runTukey.Rd `Update Tukey docs for multi-label behavior`	+6/-2

Motivation and context / Short solution summary

Turnover (multi-label, e.g., H/L) summarization previously relied on single-label assumptions (many code paths filtered LABEL == "L"), causing incorrect aggregation, imputation, and outputs for turnover experiments. This PR implements per-label summarization and makes label-aware fixes across summarization, outputs, tests, and documentation. Behavior is now controlled by a clarified flag is_labeled_reference: TRUE = SRM-style (H treated as normalization reference, return L-normalized output), FALSE = turnover-style (summarize each LABEL independently). LABEL is preserved through Linear/TMP/Tukey pipelines, counts/missingness are computed per PROTEIN+LABEL, and output merging is made robust to mismatched columns.

Detailed changes (by file / area)

R/dataProcess.R
- Protein-splitting now uses PROTEIN+LABEL unless is_labeled_ref indicates SRM (then split by PROTEIN only).
- AFT imputation fit_data uses rows with is_labeled_ref == FALSE when present (instead of LABEL == "L").
- predicted and newABUNDANCE assignments gate on is_labeled_ref == FALSE when available (else on censored).
- survival table column selection tightened with intersect(...) and LABEL included only when present.
- Single-feature and multi-feature Linear/TMP results now include LABEL (set to "L" when is_labeled_reference TRUE; otherwise preserved from data).
- .runTukey called with is_labeled_reference where applicable.
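The splitting rule described for R/dataProcess.R can be illustrated with a small base-R sketch. This is not the package's implementation (which works on data.table objects); the helper `split_for_summary` and the toy data are hypothetical, and only the column names PROTEIN and LABEL and the flag name is_labeled_reference are taken from this PR.

```r
# Toy data: two proteins, each measured in heavy (H) and light (L) channels.
toy <- data.frame(
  PROTEIN   = rep(c("P1", "P2"), each = 4),
  LABEL     = rep(c("H", "L"), times = 4),
  ABUNDANCE = c(20.1, 18.3, 20.4, 18.9, 15.2, 14.8, 15.0, 14.1)
)

# Hypothetical helper: choose the split keys from the flag.
#   TRUE  -> SRM-style: H is a reference, so H and L stay in one group.
#   FALSE -> turnover-style: each PROTEIN + LABEL pair is its own group.
split_for_summary <- function(data, is_labeled_reference) {
  keys <- if (is_labeled_reference) "PROTEIN" else c("PROTEIN", "LABEL")
  split(data, data[keys], drop = TRUE)
}

srm_groups      <- split_for_summary(toy, is_labeled_reference = TRUE)
turnover_groups <- split_for_summary(toy, is_labeled_reference = FALSE)

names(srm_groups)       # one group per protein
names(turnover_groups)  # one group per protein/label pair
```

In the turnover-style split every group holds a single label, which is what lets downstream steps treat LABEL as a scalar per group.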
R/utils_output.R
- Use data.table::rbindlist(..., fill = TRUE) when combining list outputs (summarized, predicted_survival).
- Protein-level merges and group counts become label-aware: TotalGroupMeasurements grouped by PROTEIN, GROUP, LABEL; merge keys include LABEL.
- lab derivation no longer filters on LABEL == "L"; GROUP handling preserved appropriately.
- Added is_labeled_ref to retained feature-level columns.
- NumMeasuredFeature and NumImputedFeature aggregated by PROTEIN, RUN, LABEL.
- nonmissing_orig redefined to depend on censoring/INTENSITY rather than LABEL gating.
R/utils_summarization.R
- Renamed parameter is_labeled → is_labeled_reference in .runTukey/.fitTukey and updated semantics.
- Multi-feature path calls .fitTukey(input, is_labeled_reference).
- Single-feature path:
  - is_labeled_reference = TRUE: apply .adjustLRuns(...) and return L-normalized output (LABEL forced to "L").
  - is_labeled_reference = FALSE: return results for all labels (keep LABEL in outputs).
- .getNonMissingFilterStats simplified: nonmissing derived from !is.na(newABUNDANCE) & !censored (or !is.na(INTENSITY) when newABUNDANCE absent); removed LABEL == "L" special-casing.
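The simplified nonmissing rule can be sketched in base R as follows. The helper name `nonmissing_flag` and the toy data are hypothetical; the sketch only mirrors the rule as stated in the bullet above and is not the body of .getNonMissingFilterStats.

```r
# Hypothetical helper mirroring the rule described above: no LABEL check,
# so heavy and light rows are evaluated the same way.
nonmissing_flag <- function(data) {
  if ("newABUNDANCE" %in% names(data)) {
    !is.na(data$newABUNDANCE) & !data$censored
  } else {
    # Fallback when newABUNDANCE has not been computed yet.
    !is.na(data$INTENSITY)
  }
}

toy <- data.frame(
  LABEL        = c("H", "L", "H", "L"),
  newABUNDANCE = c(12.0, NA, 11.5, 10.2),
  censored     = c(FALSE, TRUE, TRUE, FALSE)
)
toy$nonmissing <- nonmissing_flag(toy)

# Fallback path: only INTENSITY is available.
raw_only <- data.frame(INTENSITY = c(100, NA))
raw_flag <- nonmissing_flag(raw_only)
```

A row counts as nonmissing only when it has a value and is not censored, regardless of which label it carries.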
R/utils_summarization_prepare.R / MSstatsPrepareForSummarization
- Detects add_ref_covariate from presence of is_labeled_ref and passes is_labeled_reference into .prepareSummary.
- .prepareSummary signature changed to accept is_labeled_reference; label_by = character(0) when is_labeled_reference TRUE, else "LABEL".
- Grouping keys for counts/missingness made label-aware when is_labeled_reference FALSE:
  - n_obs: by PROTEIN, FEATURE, LABEL
  - n_obs_run: by PROTEIN, RUN, LABEL
  - total_features: by PROTEIN, LABEL
  - prop_features: by PROTEIN, RUN, LABEL
- Nonmissing filtering applied per PROTEIN/FEATURE/(LABEL) depending on is_labeled_reference.
- Consolidated preparation helpers (removed separate .prepareTMP/.prepareLinear helpers in this diff).
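The effect of the label-aware grouping keys on n_obs can be seen in a toy base-R example. The helper `count_obs` is hypothetical and uses `aggregate` as a stand-in for the package's data.table grouping; the `label_by` convention (character(0) when is_labeled_reference is TRUE, otherwise "LABEL") follows the description above.

```r
# Toy input: one protein, two features, two labels, two runs.
toy <- data.frame(
  PROTEIN   = "P1",
  FEATURE   = rep(c("f1", "f2"), each = 4),
  LABEL     = rep(c("H", "L"), times = 4),
  RUN       = rep(c(1, 1, 2, 2), times = 2),
  ABUNDANCE = c(10, 11, NA, 12, 9, NA, 8, 13)
)

# Hypothetical helper: count observed values per feature,
# optionally within each label.
count_obs <- function(data, is_labeled_reference) {
  label_by <- if (is_labeled_reference) character(0) else "LABEL"
  keys <- c("PROTEIN", "FEATURE", label_by)
  observed <- data[!is.na(data$ABUNDANCE), , drop = FALSE]
  out <- aggregate(observed["ABUNDANCE"], by = observed[keys], FUN = length)
  names(out)[names(out) == "ABUNDANCE"] <- "n_obs"
  out
}

per_label <- count_obs(toy, is_labeled_reference = FALSE)  # H and L counted separately
pooled    <- count_obs(toy, is_labeled_reference = TRUE)   # H and L share one count
```

With per-label keys, feature f1 has one observed H value and two observed L values; with the SRM-style keys the same feature reports three observations shared by both labels.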
R/utils_imputation.R / R/utils_censored.R / others
- Imputation and censored handling updated to use is_labeled_ref where appropriate (e.g., survival fitting uses rows where is_labeled_ref == FALSE).
Tests
- Added/updated tinytests and unit tests across inst/tinytest and tests:
  - inst/tinytest/test_utils_summarization.R: covers .fitTukey/.runTukey behavior for both is_labeled_reference = FALSE and TRUE; asserts presence of LABEL/LogIntensities and SRM normalization behavior.
  - inst/tinytest/test_utils_summarization_prepare.R: adds make_two_label_input and verifies n_obs and total_features computed per PROTEIN+FEATURE+LABEL (when is_labeled_reference FALSE) and expected behavior when TRUE.
  - inst/tinytest/test_dataProcess.R: updated regression checks to expect LABEL and added SRM imputation assertions (censored H rows should keep predicted = NA; censored L rows can be imputed).
  - inst/tinytest/test_utils_censored.R, test_utils_imputation.R, and normalization tests updated/covered for is_labeled_ref usage.
  - New tests file tests/test_pr4_per_label.R added for per-label summarization regression/unit coverage.
Documentation (man/*.Rd)
- man/dot-fitTukey.Rd, man/dot-runTukey.Rd, man/dot-prepareSummary.Rd updated to document the new is_labeled_reference parameter and SRM vs turnover semantics.
- man/dot-prepareLinear.Rd and man/dot-prepareTMP.Rd removed (internal doc changes reflecting helper consolidation).
- Some HTML docs and other man pages updated where applicable.
Miscellaneous
- Made output merging label-aware and resilient to column mismatches (rbindlist fill).
- Removed guards/assumptions that filtered to LABEL == "L" so non-reference labels are handled correctly.
- Internal function signatures changed; exported/public API surface unchanged.

Unit tests added or modified (summary)

inst/tinytest/test_utils_summarization.R
- Tests for .fitTukey and .runTukey with is_labeled_reference = FALSE (turnover) and TRUE (SRM).
- Tests assert LABEL presence (when turnover) and L-only outputs (when SRM), plus LogIntensities/newABUNDANCE presence.
- Tests for .getNonMissingFilterStats nonmissing selection across labels.
inst/tinytest/test_utils_summarization_prepare.R
- make_two_label_input fixture and assertions that:
  - n_obs counted per PROTEIN+FEATURE+LABEL when is_labeled_reference = FALSE.
  - total_features counted per PROTEIN+LABEL.
  - is_labeled_reference = TRUE yields H rows sharing counts with L rows (no zeroed counts).
inst/tinytest/test_dataProcess.R
- Regression checks updated to allow column differences via intersection-of-columns and fsetequal checks.
- New SRM imputation assertions verifying predicted NA behavior for H (is_labeled_ref=TRUE) vs imputed L (is_labeled_ref=FALSE).
Additional tinytests
- Updates across imputation, censored, normalization tests to validate is_labeled_ref handling.

Coding guidelines / issues observed

Inconsistent naming and documentation state:
- R code and man pages use is_labeled_reference, but several C++ sources and generated Rcpp exports still use is_labeled / is_reference (src/linear_summary.cpp, src/RcppExports.cpp, R/RcppExports.R, docs HTML). This creates potential mismatch between R-level parameter naming/semantics and C++ interfaces and documentation artifacts (docs reference is_labeled in some HTML/man files). Recommend aligning names and updating C++/Rcpp interfaces and generated docs to avoid confusion.
Remaining doc inconsistencies:
- Some documentation/html files still show older parameter names (is_labeled) or usage signatures (docs/reference/dot-runTukey.html, docs/reference/dot-fitLinearModel.html). Ensure documentation rebuild to reflect new parameter names/semantics.
Removal of man pages:
- man/dot-prepareLinear.Rd and man/dot-prepareTMP.Rd were removed; ensure this was intentional and that internal helpers are sufficiently documented elsewhere if needed.
No changes to exported/public R interfaces were made in this diff, but internal signature changes and C++ naming differences may warrant explicit changelog notes.

- dataProcess.R: split protein_indices by PROTEIN+LABEL (not just PROTEIN) when not using labeled reference, so each label is summarized separately; remove LABEL == "L" filters from Linear/TMP survival imputation and result aggregation; propagate LABEL column through all result tables
- utils_summarization.R: rename is_labeled → is_labeled_reference in .runTukey/.fitTukey; return LABEL in non-reference results; remove LABEL == "L" guard from .getNonMissingFilterStats
- utils_output.R: use rbindlist(fill=TRUE) for mixed-schema result lists; add LABEL to TotalGroupMeasurements/NumMeasuredFeature/NumImputedFeature grouping keys; merge summarized+lab on LABEL; include ref in output cols; remove LABEL == "L" guards from nonmissing tracking
- man/: update .fitTukey and .runTukey Rd docs for renamed parameter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

n_obs, n_obs_run, total_features, and prop_features must all be computed within each PROTEIN+LABEL combination so that H and L features are counted independently. This is a fixup for the per-label statistics commit. Also switch .fitTukey roxygen to @inheritParams .runTukey.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Tests for PR 4 covering:
- .prepareLinear n_obs grouped by PROTEIN+FEATURE+LABEL (not pooled)
- .runTukey(is_labeled_reference=FALSE) returns LABEL column for both H and L
- .fitTukey(is_labeled_reference=FALSE) returns LABEL column
- .getNonMissingFilterStats applies to all rows (no LABEL=="L" guard)
- Regression: SRMRawData still summarizes correctly after per-label changes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-04-13T17:27:21Z

Warning

Rate limit exceeded

@tonywu1999 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 33 minutes and 47 seconds before requesting another review.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 651cd7ea-9bf0-4e26-ae47-40761752063e

📥 Commits

Reviewing files that changed from the base of the PR and between b9e28eb and eca8c92.

📒 Files selected for processing (1)

inst/tinytest/test_dataProcess.R

📝 Walkthrough

Walkthrough

Label-reference handling was added: an is_labeled_ref / is_labeled_reference flag now controls whether grouping and summarization exclude LABEL; Tukey fitting, non-missing filtering, survival/linear imputation, and output aggregation were made conditional on that flag. Tests and docs updated to reflect the new behavior.

Changes

Cohort / File(s)	Summary
Summarization control & preparation `R/utils_summarization_prepare.R`, `R/dataProcess.R`	Introduce and propagate `is_labeled_reference` / `is_labeled_ref`; `.prepareSummary` signature changed to accept it; grouping keys and nonmissing logic become label-aware or label-agnostic depending on the flag; TMP/Linear preparation consolidated into `.prepareSummary`.
Tukey & non-missing logic `R/utils_summarization.R`, `man/dot-fitTukey.Rd`, `man/dot-runTukey.Rd`	Rename parameter to `is_labeled_reference`; `.fitTukey()` signature updated and branching added to return only L when reference-mode is TRUE and return all labels otherwise; `.runTukey()` updated accordingly; `.getNonMissingFilterStats()` simplified to rely on `newABUNDANCE`/`censored` or `INTENSITY`.
Model fitting / imputation `R/dataProcess.R`	Survival/linear/TMP imputation now selects fitting rows based on `is_labeled_ref` when present (falling back to prior LABEL-based logic); `predicted`/`newABUNDANCE` assignment gates on `is_labeled_ref` where available; survival columns chosen via `intersect(...)` of actual cols.
Output aggregation & binding `R/utils_output.R`	Use `data.table::rbindlist(..., fill=TRUE)` for heterogeneous result lists; include `is_labeled_ref` in feature output columns; make protein/run-level metrics and merges label-aware (grouping/joins include `LABEL`).
Tests `inst/tinytest/test_utils_summarization.R`, `inst/tinytest/test_utils_summarization_prepare.R`, `inst/tinytest/test_dataProcess.R`	Add/extend tests for `.fitTukey(..., is_labeled_reference=FALSE/TRUE)`, `.runTukey`, `.getNonMissingFilterStats`, label-aware `.prepareSummary` behavior, and SRM/TMP imputation expectations; adjust dataProcess regression comparisons to compare intersecting columns.
Documentation `man/dot-fitTukey.Rd`, `man/dot-runTukey.Rd`, `man/dot-prepareSummary.Rd`, `man/dot-prepareLinear.Rd` (removed), `man/dot-prepareTMP.Rd` (removed)	Document new `is_labeled_reference` parameter and semantics; update signatures for `.fitTukey` and `.runTukey`; remove outdated `.prepareLinear`/`.prepareTMP` Rd pages and update `.prepareSummary` doc to include `is_labeled_reference`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

feat: Introduce normalization with unlabeled peptides #192: Similar propagation/rename of the labeled-reference flag and parallel changes to summarization, imputation, and Tukey flows.
refactor(summarization): Rename ref regression covariate to ref_covariate for SRM experiments #190: Overlapping edits touching summarization/imputation paths and reference-flag usage in model fitting.
Feature model weights #174: Modifies the same summarization codepaths (MSstatsSummarizeSingleLinear/MSstatsSummarizeSingleTMP) and may conflict or interact with these changes.

Suggested labels

Review effort 3/5

Suggested reviewers

mstaniak

Poem

🐇 I hop through rows both L and H,

A tiny flag decides their way.
I bind with fills and group by care,
Impute the missing, tidy the pair.
Hops and tests—release hooray!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately reflects the main objective: implementing per-label statistics and multi-label summarization for protein turnover data analysis.
Description check	✅ Passed	The PR description provides clear motivation, a diagram walkthrough, and file-level details with specific change summaries, but it lacks an explicit testing section and its motivation context is incomplete.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

github-actions · 2026-04-13T17:29:27Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Wrong averaging
In the single-feature linear path, the new summary now averages all rows in a run instead of only the light channel. For labeled-reference/SRM data, this function still receives both `H` and `L` rows together, so the returned protein abundance becomes the mean of the reference and measured channels rather than the light-channel summary. Any single-feature labeled-reference protein will therefore get a shifted `LogIntensities` value.

    result = single_protein[, .(LogIntensities = mean(newABUNDANCE)), by = RUN]
    result[, Protein := unique(single_protein$PROTEIN)]
    result[, LABEL := unique(single_protein$LABEL)]
    result[, Variance := NA_real_]

Label assignment
`LABEL` is populated with `unique(single_protein$LABEL)` even when the input group contains both `H` and `L` rows. In labeled-reference data that split is still done only by `PROTEIN`, so this assignment is not length-1. With more than two runs it can raise a data.table assignment error, and with exactly two runs it silently assigns alternating labels to run-level summaries. The downstream merge by `LABEL` will then attach incorrect or missing metadata.

    result = unique(single_protein[, .(Protein = PROTEIN, RUN = RUN)])
    extracted_values = get_linear_summary(single_protein, cf, counts, label, cov_mat)
    result = cbind(result, extracted_values)
    result[, LABEL := unique(single_protein$LABEL)]

Over-imputation
The survival fit and censoring replacement now run on all labels, but labeled-reference workflows previously limited this to the light channel. When a heavy/reference row is censored, this change imputes the heavy value as well, which can alter the reference-based normalization and change the summarized light-channel abundance. This affects any labeled-reference dataset with censored heavy observations.

    survival_fit = .fitSurvival(
        single_protein[, cols, with = FALSE],
        aft_iterations
    )
    sigma2 = survival_fit$scale^2
    single_protein[, c("predicted", "imputation_var") := {
        pred = predict(survival_fit, newdata = .SD, se.fit = TRUE)
        list(pred$fit, pred$se.fit^2 + sigma2)
    }]
    single_protein[, predicted := ifelse(censored, predicted, NA)]
    single_protein[, newABUNDANCE := ifelse(censored, predicted, newABUNDANCE)]
    survival = single_protein[, intersect(c(cols, "LABEL", "predicted"), colnames(single_protein)), with = FALSE]

github-actions · 2026-04-13T17:35:05Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue · Impact: Medium

Avoid mixing label channels

This now averages `H` and `L` together whenever `single_protein` contains both labels, which changes the summarized abundance for labeled-reference workflows. Summarize a label-homogeneous subset instead, and carry the corresponding single `LABEL` into the result so downstream merges stay correct.

R/dataProcess.R [437-441]

    -result = single_protein[, .(LogIntensities = mean(newABUNDANCE)), by = RUN]
    -result[, Protein := unique(single_protein$PROTEIN)]
    -result[, LABEL := unique(single_protein$LABEL)]
    +summary_input = if (data.table::uniqueN(single_protein$LABEL) > 1L) {
    +    single_protein[LABEL == "L"]
    +} else {
    +    single_protein
    +}
    +result = summary_input[, .(LogIntensities = mean(newABUNDANCE)), by = RUN]
    +result[, Protein := unique(summary_input$PROTEIN)]
    +result[, LABEL := unique(summary_input$LABEL)]
     result[, Variance := NA_real_]
    -setcolorder(result, c("Protein", "RUN", "LogIntensities", "Variance"))
    +setcolorder(result, c("Protein", "RUN", "LABEL", "LogIntensities", "Variance"))

Suggestion importance[1-10]: 8

Why: This correctly identifies a regression in the single-feature linear path: when `single_protein` contains both `H` and `L`, averaging all `newABUNDANCE` values by `RUN` changes the labeled-reference summary. Restricting the summary to the light-channel subset in the multi-label case preserves the prior behavior and keeps downstream `LABEL` handling consistent.

Category: Possible issue · Impact: Medium

Prevent recycled label assignment

`unique(single_protein$LABEL)` can return both `L` and `H`, and `data.table` will recycle those values across rows, silently mislabeling the summaries. Assign a single output label explicitly when multiple labels are present instead of recycling a multi-value vector.

R/dataProcess.R [468-472]

     result = unique(single_protein[, .(Protein = PROTEIN, RUN = RUN)])
     extracted_values = get_linear_summary(single_protein, cf, counts, label, cov_mat)
     result = cbind(result, extracted_values)
    -result[, LABEL := unique(single_protein$LABEL)]
    +result[, LABEL := if (data.table::uniqueN(single_protein$LABEL) == 1L) {
    +    unique(single_protein$LABEL)
    +} else {
    +    "L"
    +}]

Suggestion importance[1-10]: 7

Why: This is a valid correctness issue because `unique(single_protein$LABEL)` can contain more than one value, which can misassign or recycle labels in `result`. Setting a single explicit `LABEL` for multi-label summaries avoids bad merges later in `MSstatsSummarizationOutput`.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/dataProcess.R`:
- Around line 437-442: The assignment LABEL := unique(single_protein$LABEL)
fails when single_protein contains both H and L (labeled-reference mode) because
unique(...) returns length-2; update the summarization so that when using the
linear SRM summarization (the block computing result from single_protein and
assigning LogIntensities/Protein/LABEL/Variance) you first filter single_protein
to only the L (light, non-reference) rows before aggregating, or explicitly select
the single LABEL value per RUN (e.g., take LABEL[which.min(...)] or LABEL[1]
after filtering) so that LABEL is scalar per run; apply the same change to the
corresponding block around lines 468–472 to ensure run-level result tables
contain only L rows and a single LABEL value.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7f130e84-bde1-4f2b-a228-88e07cd4798f

📥 Commits

Reviewing files that changed from the base of the PR and between 5b5042c and 654ac41.

📒 Files selected for processing (8)

R/dataProcess.R
R/utils_output.R
R/utils_summarization.R
R/utils_summarization_prepare.R
inst/tinytest/test_utils_summarization.R
inst/tinytest/test_utils_summarization_prepare.R
man/dot-fitTukey.Rd
man/dot-runTukey.Rd

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

R/dataProcess.R (1)

446-450: ⚠️ Potential issue | 🔴 Critical

Use a scalar output label in linear labeled-reference mode.

When is_labeled_reference is TRUE, single_protein still contains both H and L rows, so unique(single_protein$LABEL) is not scalar. Lines 449 and 482 can therefore fail at assignment time or stamp the wrong label onto the run-level result. Derive the output label from the non-reference rows once, or filter to those rows before building result.

Proposed fix

+    output_label = if (is_labeled_reference) {
+        unique(single_protein[!is_labeled_ref, LABEL])
+    } else {
+        unique(single_protein$LABEL)
+    }
+
     if (is_single_feature) {
         result = single_protein[, .(LogIntensities = mean(newABUNDANCE)), by = RUN]
         result[, Protein := unique(single_protein$PROTEIN)]
-        result[, LABEL := unique(single_protein$LABEL)]
+        result[, LABEL := output_label]
         result[, Variance := NA_real_]
         setcolorder(result, c("Protein", "RUN", "LogIntensities", "Variance"))
@@
             result = unique(single_protein[, .(Protein = PROTEIN, RUN = RUN)])
             extracted_values = get_linear_summary(single_protein, cf,
                                                   counts, label, cov_mat)
             result = cbind(result, extracted_values)
-            result[, LABEL := unique(single_protein$LABEL)]
+            result[, LABEL := output_label]
         }

Also applies to: 478-482
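The scalar-label concern can be reproduced outside the package with a toy base-R example. This is only a sketch that combines the two review suggestions (summarize the non-reference rows and derive the label from them); it is not necessarily the fix that was merged, and the toy `single_protein` data frame is hypothetical.

```r
# Toy labeled-reference group: H rows are the reference channel.
single_protein <- data.frame(
  RUN            = c(1, 1, 2, 2, 3, 3),
  LABEL          = c("H", "L", "H", "L", "H", "L"),
  is_labeled_ref = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  newABUNDANCE   = c(20, 18, 21, 17, 19, 16)
)

# The naive label has two values, so it cannot serve as a scalar
# label for a per-run result table.
naive_label <- unique(single_protein$LABEL)

# Restricting to non-reference rows leaves exactly one label.
measured <- single_protein[!single_protein$is_labeled_ref, ]
output_label <- unique(measured$LABEL)

# Per-run mean over the measured rows only, stamped with the scalar label.
result <- aggregate(newABUNDANCE ~ RUN, data = measured, FUN = mean)
names(result)[names(result) == "newABUNDANCE"] <- "LogIntensities"
result$LABEL <- output_label
```

Averaging all six rows by run would instead mix the reference channel into the summary, which is the "wrong averaging" concern raised in the reviewer guide.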

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@R/dataProcess.R` around lines 446 - 450, When is_labeled_reference is TRUE,
unique(single_protein$LABEL) can return multiple values because single_protein
contains both H and L rows; derive the scalar LABEL from only the non-reference
rows (e.g., filter single_protein where REF flag is false or where LABEL !=
reference label) before assigning to result (used in the block under
is_single_feature and the similar block around lines 478-482). Locate the
assignments to result[, LABEL := unique(single_protein$LABEL)] and replace them
with a scalar computed from the filtered rows (e.g., selected_label <-
unique(single_protein[non_reference_rows]$LABEL); then assign result[, LABEL :=
selected_label]) so the LABEL is always a single value.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/dataProcess.R`:
- Around line 583-585: The code is hard-coding "L" into the LABEL column when
result lacks LABEL; change this to use the source protein's label instead: set
LABEL from single_protein$LABEL (e.g., result[, LABEL := single_protein$LABEL])
so that .runTukey()'s unlabeled outputs inherit the correct label; only fall
back to the hard-coded "L" when you are explicitly in the labeled-reference mode
(check whatever flag or parameter your pipeline uses for labeled-reference and
branch there).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e9cbb4f1-66f7-4cef-93b9-ffaa3514f5d5

📥 Commits

Reviewing files that changed from the base of the PR and between 654ac41 and d861d0d.

📒 Files selected for processing (5)

R/dataProcess.R
R/utils_summarization_prepare.R
man/dot-prepareLinear.Rd
man/dot-prepareSummary.Rd
man/dot-prepareTMP.Rd

✅ Files skipped from review due to trivial changes (2)

man/dot-prepareSummary.Rd
man/dot-prepareLinear.Rd

coderabbitai

♻️ Duplicate comments (1)

R/dataProcess.R (1)

447-452: ⚠️ Potential issue | 🟠 Major

Fix LABEL assignment in linear summarization when is_labeled_reference=TRUE.

When is_labeled_reference=TRUE, data is split by PROTEIN only (line 303-304), so single_protein contains both H and L rows. At line 449, unique(single_protein$LABEL) returns a length-2 vector c("H", "L"), which will cause a data.table assignment error when assigned to the scalar LABEL column.

The same issue exists at line 482 for the multi-feature case.

Proposed fix

+    output_label = if (data.table::uniqueN(single_protein$LABEL) > 1L) "L" else unique(single_protein$LABEL)
+
     if (is_single_feature) {
         result = single_protein[, .(LogIntensities = mean(newABUNDANCE)), by = RUN]
         result[, Protein := unique(single_protein$PROTEIN)]
-        result[, LABEL := unique(single_protein$LABEL)]
+        result[, LABEL := output_label]
         result[, Variance := NA_real_]

And similarly for line 482:

             result = cbind(result, extracted_values)
-            result[, LABEL := unique(single_protein$LABEL)]
+            result[, LABEL := output_label]
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@R/dataProcess.R` around lines 447 - 452, The LABEL assignment fails when
is_labeled_reference=TRUE because single_protein contains both "H" and "L" so
unique(single_protein$LABEL) returns length>1; update the LABEL assignment in
the linear summarization block (after result = single_protein[, .(LogIntensities
= mean(newABUNDANCE)), by = RUN]) to guard against multiple labels by selecting
a single value (e.g., LABEL := unique(single_protein$LABEL)[1]) or otherwise
handling the multi-value case (e.g., set NA_character_ or collapse values) and
apply the same defensive change to the analogous multi-feature summarization
block (the code around the second LABEL assignment).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 87533bf5-a158-4d4f-93ff-f7c5f33e7caf

📥 Commits

Reviewing files that changed from the base of the PR and between d861d0d and f59c26a.

📒 Files selected for processing (4)

R/dataProcess.R
R/utils_summarization_prepare.R
inst/tinytest/test_dataProcess.R
inst/tinytest/test_utils_summarization_prepare.R

🚧 Files skipped from review as they are similar to previous changes (1)

inst/tinytest/test_utils_summarization_prepare.R

tonywu1999 and others added 4 commits April 13, 2026 13:18

fix is_labeled_ref

5202b2c

github-actions Bot added the Review effort 4/5 label Apr 13, 2026

update unit tests

654ac41

coderabbitai Bot reviewed Apr 13, 2026

Comment thread R/dataProcess.R

tonywu1999 added 3 commits April 13, 2026 18:23

fix grouping based on srm or not

c594c19

fix srm discrepancies

a34440f

fix is_labeled_ref

d861d0d

coderabbitai Bot reviewed Apr 13, 2026

Comment thread R/dataProcess.R Outdated

tonywu1999 added 6 commits April 13, 2026 19:52

add unit tests

3fd691c

make unit test more robust

833427b

fix tests

9c9d7b5

fix unit tests

78f678a

remove dead code for .prepareLinear

f59c26a

update docs

cadbca0

tonywu1999 commented Apr 14, 2026

Comment thread R/dataProcess.R Outdated

tonywu1999 commented Apr 14, 2026

Comment thread R/dataProcess.R Outdated

coderabbitai Bot reviewed Apr 14, 2026

tonywu1999 added 2 commits April 14, 2026 10:27

fix summarization with labels

cc151ef

make fixes with labeling

14975f5

tonywu1999 changed the title ~~Feat turnover 4~~ feat(protein turnover): per-label statistics and multi-label summarization Apr 14, 2026

tonywu1999 added 2 commits April 14, 2026 14:52

add DDARawData test

b9e28eb

all tests pass

eca8c92

tonywu1999 merged commit 941aa84 into devel Apr 14, 2026
2 checks passed

tonywu1999 deleted the feat-turnover-4 branch April 14, 2026 19:24

coderabbitai Bot mentioned this pull request Apr 24, 2026

docs(impute): Update documentation w.r.t. censoredInt and MBimpute #204

Merged

tonywu1999 merged 18 commits into devel from feat-turnover-4
