add fraction matching based on only the light / na labels by tonywu1999 · Pull Request #125 · Vitek-Lab/MSstatsConvert

tonywu1999 · 2026-04-07T22:19:44Z

Motivation and Context

This PR refines the fraction-matching logic so that, for each feature, fraction selection is driven only by light (IsotopeLabelType == "L") and NA-labeled observations; heavy ("H") observations are ignored when deciding which fraction to pick. The chosen fraction therefore reflects the light/NA (label-free) signal. This matters for turnover analyses, where light peptide abundance should determine the chosen fraction. H rows from the selected fraction are still retained in the final output. If a feature has no L/NA observations, the logic falls back to H observations for both measurement counts and mean-abundance tie-breaking.

Detailed Changes

R/utils_fractions.R
- .removeOverlappingFeatures()
  - Compute measurement counts (the number of distinct Runs) per (feature, Fraction) using only rows where IsotopeLabelType == "L" or is.na(IsotopeLabelType), and Intensity is non-missing and > 0.
  - For feature–fraction groups with zero L/NA observations, recompute and append n_obs derived from H rows to the measurement_count used to select is_max (fallback behavior).
  - Prevent dropping features solely because they lack L/NA rows; ensure H-derived counts are considered only when L/NA are absent.
- .resolveFractionTies()
  - When resolving ties among fractions with equal maximum measurement counts, compute mean_abundance using only L/NA rows first.
  - For tied features that have no L/NA rows, compute mean_abundance from H rows as a fallback and combine these values with the L/NA-based ones before selecting the fraction with the highest mean_abundance.
inst/tinytest/test_fractions.R
- Added IsotopeLabelType = "L" to existing fixtures and introduced new fixtures:
  - fractionated_lh (mix of L and H, plus NA-only cases)
  - fractionated_h_only and fractionated_h_tied (H-only scenarios to validate fallback)
- Added assertions verifying:
  - Fraction selection uses only L/NA observations (H does not influence choice).
  - Final output retains both L and H rows for the selected fraction.
  - NA-only features are resolved using NA observation counts.
  - When only H rows exist, selection falls back to H-based counts and breaks ties by higher mean Intensity.
  - Full-dataset identity checks for the expected subset of selected rows.
man/dot-getCorrectFraction.Rd
- Removed the generated documentation file for internal helper .getCorrectFraction (documentation-only deletion; no runtime change).
man/dot-resolveFractionTies.Rd
- Added documentation for the internal function .resolveFractionTies describing purpose, usage, arguments, and return value.
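The selection rule described above can be sketched in base R as follows. This is a hypothetical illustration, not the MSstatsConvert code (which uses data.table inside `.removeOverlappingFeatures()` and `.resolveFractionTies()`); the helper name `select_fractions` and the toy data are made up. It assumes the H fallback is decided per feature (a feature with any qualifying L/NA row never uses H rows for the decision) and that only rows with positive, non-missing Intensity count as observations.

```r
# Hypothetical base-R sketch of the fraction-selection rule from PR #125.
# Not the package implementation, which works on data.table objects.
select_fractions <- function(input) {
  # Only positive, non-missing intensities count as observations
  valid <- input[!is.na(input$Intensity) & input$Intensity > 0, ]
  is_light <- is.na(valid$IsotopeLabelType) | valid$IsotopeLabelType == "L"
  is_heavy <- !is.na(valid$IsotopeLabelType) & valid$IsotopeLabelType == "H"
  light <- valid[is_light, ]
  # Fallback: H rows are used only for features with no L/NA observations at all
  fallback <- valid[is_heavy & !(valid$feature %in% light$feature), ]
  anchor <- rbind(light, fallback)
  if (nrow(anchor) == 0) return(input[0, ])

  # Measurement count = number of distinct Runs per (feature, Fraction)
  counts <- aggregate(Run ~ feature + Fraction, data = anchor,
                      FUN = function(r) length(unique(r)))
  names(counts)[names(counts) == "Run"] <- "n_obs"
  # Tie-breaker: mean Intensity over the same anchor rows
  means <- aggregate(Intensity ~ feature + Fraction, data = anchor, FUN = mean)
  names(means)[names(means) == "Intensity"] <- "mean_abundance"
  stats <- merge(counts, means, by = c("feature", "Fraction"))

  # Highest count wins; among tied counts, highest mean abundance wins
  stats <- stats[order(stats$feature, -stats$n_obs, -stats$mean_abundance), ]
  chosen <- stats[!duplicated(stats$feature), c("feature", "Fraction")]

  # Keep every row (L, H and NA labels) from each feature's chosen fraction
  keep <- paste(input$feature, input$Fraction, sep = "\r") %in%
    paste(chosen$feature, chosen$Fraction, sep = "\r")
  input[keep, ]
}

toy <- data.frame(
  feature = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "C", "C"),
  Fraction = c(1, 1, 1, 2, 2, 2, 1, 2, 2, 1, 2),
  Run = c("r1", "r2", "r1", "r1", "r2", "r3", "r1", "r1", "r2", "r1", "r1"),
  IsotopeLabelType = c("L", "L", "H", "H", "H", "H", "H", "H", "H", NA, NA),
  Intensity = c(10, 12, 40, 50, 60, 70, 5, 3, 4, 100, 20),
  stringsAsFactors = FALSE
)
res <- select_fractions(toy)
# A: fraction 1 wins on L counts (the 3 H runs in fraction 2 are ignored) and
#    its H row in fraction 1 is kept. B (H only): fraction 2 has more runs.
# C (NA labels): counts tie 1-1, fraction 1 has the higher mean Intensity.
res[, c("feature", "Fraction", "IsotopeLabelType")]
```

Sorting by count and then by mean abundance is equivalent to breaking ties only among the max-count fractions. If both are tied, this sketch keeps the lowest Fraction (merge() sorts by the key columns and order() is stable), which may not match the package's behavior.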

Unit Tests Added or Modified

inst/tinytest/test_fractions.R
- New and extended test cases cover:
  - Mixed L/H features: confirm the fraction is chosen by L/NA counts and ties are broken by mean L/NA intensity; ensure the retained output includes both labels for the chosen fraction.
  - NA-only features: confirm selection based on NA observations.
  - H-only features and ties: confirm fallback to H-derived counts and mean-intensity tie-breaking.
  - Full-dataset tests validating the final selected-row subset matches expectations.

Coding Guidelines / Violations

No coding guideline violations identified. Changes adhere to existing code patterns and conventions (data.table usage, internal function naming, tests in tinytest).

coderabbitai · 2026-04-07T22:19:51Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3f568781-89d4-4a12-b324-954bd54e6e43

📥 Commits

Reviewing files that changed from the base of the PR and between 4e6ee04 and 4d8bd68.

📒 Files selected for processing (2)

R/utils_fractions.R
inst/tinytest/test_fractions.R

🚧 Files skipped from review as they are similar to previous changes (1)

R/utils_fractions.R

📝 Walkthrough

Walkthrough

Restricted overlap-counting and tie-breaking in fraction selection to use only rows with IsotopeLabelType == "L" or NA (and non-missing, >0 Intensity); when no such rows exist for a feature, fall back to using "H" rows to compute measurement counts or mean abundances. Tests and documentation adjusted accordingly.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core logic `R/utils_fractions.R` | Updated `.removeOverlappingFeatures()` and `.resolveFractionTies()` to compute run counts and tie-breaking mean abundances first from rows where `IsotopeLabelType == "L"` or `is.na(IsotopeLabelType)` (with non-missing, positive `Intensity`); when none exist for a feature, incorporate `"H"` rows as a fallback for measurement counts or mean abundance before selecting the winning `Fraction`. |
| Tests `inst/tinytest/test_fractions.R` | Added `IsotopeLabelType = "L"` to fixtures, introduced `fractionated_lh`, `fractionated_h_only`, and `fractionated_h_tied` cases, and expanded assertions to verify selection uses `L`/`NA` observations first, retains both `L` and `H` rows for chosen fractions, and falls back to `H`-based counts/means when needed. |
| Documentation `man/dot-getCorrectFraction.Rd`, `man/dot-resolveFractionTies.Rd` | Removed generated `.Rd` for `.getCorrectFraction`; added `man/dot-resolveFractionTies.Rd` documenting `.resolveFractionTies(input, max_fractions)` and its tie-resolution behavior. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Fix fraction efficient #126 — Modifies the same fraction-selection logic in R/utils_fractions.R, adjusting how run counts and mean intensities are computed for overlapping fractions and tie-breaking.

Poem

🐰 I nibble through fractions, light and neat,
I count the L's where intensities meet,
If none are found, I peek at H's tune,
Then pick the slice that sings the loudest rune. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The pull request description is entirely missing, lacking all required sections: motivation/context, detailed changes list, testing information, and pre-review checklist. | Add a comprehensive description following the template, including motivation for the change, detailed bullet-point list of modifications, testing approach, and completed pre-review checklist. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: updating fraction matching logic to prioritize light and NA isotope labels in the decision-making process. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

R/utils_fractions.R (1)

279-280: ⚠️ Potential issue | 🟠 Major

Prevent silent feature drops when no L/NA rows qualify.

Because fraction_map is derived from the filtered subset, nomatch = 0 can remove entire features (e.g., H-only rows) without an explicit warning/error.

💡 Proposed safeguard

```diff
     if (data.table::uniqueN(input$Fraction) > 1) {
         measurement_count = input[
             (IsotopeLabelType == "L" | is.na(IsotopeLabelType)) &
             !is.na(Intensity) & Intensity > 0,
             .(n_obs = uniqueN(Run)),
             by = .(feature, Fraction)
         ]
+        missing_anchor_features = setdiff(unique(input$feature), unique(measurement_count$feature))
+        if (length(missing_anchor_features) > 0) {
+            msg = paste(
+                "** No L/NA positive-intensity observations for feature(s):",
+                paste(missing_anchor_features, collapse = ", ")
+            )
+            getOption("MSstatsLog")("ERROR", msg)
+            stop(msg)
+        }
         measurement_count[, is_max := n_obs == max(n_obs), by = "feature"]
         max_fractions = measurement_count[(is_max)]
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@R/utils_fractions.R` around lines 279-280: the current filtering uses fraction_map from .resolveFractionTies and then does `input = input[fraction_map, on = .(feature, Fraction), nomatch = 0]`, which can silently drop entire features when no L/NA rows qualify; change the logic to detect features present in input but missing in fraction_map (compute the setdiff between unique(input$feature) and unique(fraction_map$feature)), and for any missing features either emit a warning naming those features or fall back to keeping their original rows (i.e., do not apply the nomatch = 0 drop for those features); implement this check around the call to .resolveFractionTies and the subsequent join/filter so that fraction_map, input, and feature are used to decide whether to warn or to preserve rows instead of silently dropping them.
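One way to implement the "warn and keep" option from the prompt, sketched in base R on plain data.frames rather than with the package's data.table join. `filter_to_fraction_map` is a hypothetical name, and the plain `warning()` stands in for the `getOption("MSstatsLog")` logging used in the proposed safeguard. The PR description indicates that the merged change instead prevents the drop with an H-row fallback.

```r
# Hypothetical sketch (base R, plain data.frames): filter rows to the fraction
# chosen per feature, but warn about and keep features missing from
# fraction_map instead of silently dropping them (the nomatch = 0 behavior).
filter_to_fraction_map <- function(input, fraction_map) {
  missing_features <- setdiff(unique(input$feature), unique(fraction_map$feature))
  if (length(missing_features) > 0) {
    # The package would presumably also log this via getOption("MSstatsLog")
    warning("No L/NA positive-intensity observations for feature(s): ",
            paste(missing_features, collapse = ", "), call. = FALSE)
  }
  key <- function(d) paste(d$feature, d$Fraction, sep = "\r")
  in_map <- key(input) %in% key(fraction_map)
  # Keep rows from the chosen fraction, plus all rows of unmapped features
  input[in_map | input$feature %in% missing_features, ]
}

input <- data.frame(feature = c("A", "A", "B", "B"), Fraction = c(1, 2, 1, 2),
                    stringsAsFactors = FALSE)
fraction_map <- data.frame(feature = "A", Fraction = 2, stringsAsFactors = FALSE)
res <- filter_to_fraction_map(input, fraction_map)  # warns about feature "B"
```

Keeping every row of an unmapped feature leaves its overlapping fractions unresolved, so this trades silent data loss for possible duplicate fractions; stopping with an error, as in the proposed safeguard, is the stricter alternative.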

🧹 Nitpick comments (1)

man/dot-resolveFractionTies.Rd (1)

22-25: Document the isotope/intensity eligibility used in tie-breaking.

The description should mention that mean intensity is computed only from rows with IsotopeLabelType == "L" or NA, and Intensity > 0, to match implementation behavior.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@man/dot-resolveFractionTies.Rd` around lines 22-25: update the documentation for .resolveFractionTies to explicitly state that the tie-breaker computes mean intensity only over measurements where IsotopeLabelType is "L" or NA and Intensity > 0; mention that fractions tied by count are resolved by selecting the fraction with the highest mean intensity computed using only those eligible rows, to match the implementation.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/utils_fractions.R`:
- Around lines 271-274: The subset step assumes IsotopeLabelType exists; ensure it is present and normalized before filtering by explicitly adding a default and normalization: in the preprocessing path that leads into .removeOverlappingFeatures (or right before the block using IsotopeLabelType, Intensity, feature, Fraction, uniqueN(Run)), add code to create IsotopeLabelType if missing (e.g., set `IsotopeLabelType := NA_character_` when `!"IsotopeLabelType" %in% names(dt)`) and normalize unexpected values to NA (e.g., convert non-"L"/"H" entries to NA) so that the condition `(IsotopeLabelType == "L" | is.na(IsotopeLabelType)) & !is.na(Intensity) & Intensity > 0` works without error. Apply this change where inputs are shaped (preprocessing/clean functions) or immediately before the grouping that computes `.(n_obs = uniqueN(Run))` by `.(feature, Fraction)`.
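A minimal base-R sketch of the normalization suggested above, assuming a plain data.frame; with data.table the same steps would use `:=`, as the comment suggests. The helper name `normalize_label_type` is hypothetical. One design consequence to keep in mind: because the fraction filter treats NA as light/label-free, any unrecognized label (for example a lowercase `"l"`) would be counted as an L/NA observation after this step.

```r
# Hypothetical sketch (base R): guarantee that IsotopeLabelType exists and only
# contains "L", "H" or NA before the L/NA filter runs.
normalize_label_type <- function(input) {
  if (!"IsotopeLabelType" %in% names(input)) {
    # rep() so this also works on zero-row inputs
    input$IsotopeLabelType <- rep(NA_character_, nrow(input))
  }
  labels <- as.character(input$IsotopeLabelType)
  # Any value other than "L" or "H" becomes NA (treated as light/label-free)
  labels[!labels %in% c("L", "H")] <- NA_character_
  input$IsotopeLabelType <- labels
  input
}

unlabeled <- data.frame(Run = c("r1", "r2"), Intensity = c(1, 2))
mixed <- data.frame(IsotopeLabelType = c("L", "H", "heavy", NA),
                    stringsAsFactors = FALSE)
normalize_label_type(unlabeled)$IsotopeLabelType
normalize_label_type(mixed)$IsotopeLabelType
```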



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9ebf79ff-e76c-4873-9c84-95e8e0d4e4ea

📥 Commits

Reviewing files that changed from the base of the PR and between 077b51f and 4e6ee04.

📒 Files selected for processing (4)

R/utils_fractions.R
inst/tinytest/test_fractions.R
man/dot-getCorrectFraction.Rd
man/dot-resolveFractionTies.Rd

💤 Files with no reviewable changes (1)

man/dot-getCorrectFraction.Rd

tonywu1999 added 2 commits April 10, 2026 13:14

add fraction matching based on only the light / na labels

8897ac6

add unit tests

4e6ee04

tonywu1999 force-pushed the feat-turnover branch from f831151 to 4e6ee04 April 10, 2026 17:39

tonywu1999 marked this pull request as ready for review April 10, 2026 17:39

coderabbitai Bot reviewed Apr 10, 2026 (comment thread on R/utils_fractions.R, now outdated)

account for heavy labels without light labels

4d8bd68

tonywu1999 merged commit 0eae2d3 into devel Apr 10, 2026
2 checks passed
