
Fix fraction efficient #126

Merged
tonywu1999 merged 5 commits into devel from fix-fraction-efficient on Apr 10, 2026

@tonywu1999 tonywu1999 commented Apr 9, 2026

Motivation and Context

Please include relevant motivation and context of the problem along with a short summary of the solution.

Changes

Please provide a detailed bullet point list of your changes.

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

Motivation and Context

The MSstatsConvert package processes mass spectrometry-based proteomics data and handles cases where peptide features are measured across multiple fractions. When a feature appears in multiple fractions within a sample, the code must select which fraction's measurement to retain. The previous implementation used a less efficient approach that lacked clear separation of concerns between fraction selection and tie-resolution logic. This PR improves efficiency and code clarity by refactoring the fraction selection mechanism.

Solution

The refactoring centralizes tie-resolution logic into a dedicated function and makes the fraction selection process more efficient. Instead of applying tie-break logic during iteration, the new approach:

  1. Pre-computes fractions with maximum observation counts per feature (based on unique Run counts where Intensity > 0)
  2. Identifies which features have tied fractions (multiple fractions sharing the max count)
  3. For tied features, selects the fraction with the highest mean Intensity
  4. For non-tied features, selects the first available fraction with max observations
  5. Uses data.table joins to filter the input to only retain the selected fractions
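The five steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual code: the toy `input` table and the intermediate names `counts`, `best_tied`, and `not_tied` are hypothetical, while `max_fractions`, `tie_features`, `mean_abundance`, and `fraction_map` follow the names used in this description.

```r
library(data.table)

# Toy input (hypothetical values): feature "A" has a clear winner,
# feature "B" has two fractions tied on observation count.
input <- data.table(
  feature   = c("A", "A", "A", "B", "B", "B", "B"),
  Fraction  = c(1, 2, 2, 1, 1, 2, 2),
  Run       = c("f1_r1", "f2_r1", "f2_r2", "f1_r1", "f1_r2", "f2_r1", "f2_r2"),
  Intensity = c(10, 20, 30, 5, 5, 50, 70)
)

# Step 1: count unique runs with a positive intensity per feature-Fraction,
# then keep the fraction(s) holding the per-feature maximum.
counts <- input[!is.na(Intensity) & Intensity > 0,
                .(n_obs = uniqueN(Run)), by = .(feature, Fraction)]
max_fractions <- counts[, .SD[n_obs == max(n_obs)], by = feature]

# Step 2: features with more than one candidate fraction are tied.
tie_features <- max_fractions[, .N, by = feature][N > 1, feature]

# Step 3: for tied features, pick the candidate with the highest mean Intensity.
tied_rows <- input[max_fractions[feature %in% tie_features],
                   on = .(feature, Fraction), nomatch = 0]
avg_abundance <- tied_rows[, .(mean_abundance = mean(Intensity, na.rm = TRUE)),
                           by = .(feature, Fraction)]
best_tied <- avg_abundance[, .SD[which.max(mean_abundance)],
                           by = feature][, .(feature, Fraction)]

# Step 4: for non-tied features, take the first (only) candidate.
not_tied <- max_fractions[!(feature %in% tie_features), .SD[1],
                          by = feature][, .(feature, Fraction)]

# Step 5: a single join keeps only rows from the selected fractions.
fraction_map <- rbind(best_tied, not_tied)
result <- input[fraction_map, on = .(feature, Fraction), nomatch = 0]
```

In this toy data both features end up with Fraction 2: feature "A" because Fraction 2 has two observed runs against one, and feature "B" because the two fractions tie on run count and Fraction 2 has the higher mean Intensity.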

Detailed Changes

R/utils_fractions.R

  • Refactored .removeOverlappingFeatures():

    • Now computes candidate fractions based on uniqueN(Run) among rows with !is.na(Intensity) & Intensity > 0 per feature-Fraction combination
    • Creates a fraction_map via .resolveFractionTies() helper function
    • Filters input using a single data.table join: input[fraction_map, on=.(feature, Fraction), nomatch=0]
    • Removed intermediate fraction_keep column logic that previously relied on row-by-row evaluation
  • Replaced .getCorrectFraction() with .resolveFractionTies(input, max_fractions):

    • New function signature accepts both input data and pre-computed max_fractions
    • Identifies tie_features where multiple fractions share maximum n_obs per feature
    • For tied features: computes mean Intensity (mean_abundance) per feature and Fraction, then selects the fraction with the highest value per feature using which.max(mean_abundance)
    • For non-tied features: selects first fraction in max_fractions per feature
    • Returns results via rbind() combining both groups
    • Added documentation explaining the tie-resolution logic
  • Lines changed: +35/-24
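The join-based filter named above behaves like an inner join on the two key columns. The sketch below shows this on made-up data; the table contents, the extra feature "C", and the name `kept` are assumptions for illustration only.

```r
library(data.table)

input <- data.table(
  feature   = c("A", "A", "B", "B"),
  Fraction  = c(1, 2, 1, 2),
  Intensity = c(10, 20, 30, 40)
)

# Hypothetical map with one selected Fraction per feature.
# Feature "C" does not occur in `input` at all.
fraction_map <- data.table(
  feature  = c("A", "B", "C"),
  Fraction = c(2, 1, 1)
)

# nomatch = 0 drops map rows without a partner in `input`,
# so the result holds only rows of the selected fractions.
kept <- input[fraction_map, on = .(feature, Fraction), nomatch = 0]
```

Without `nomatch = 0`, the unmatched map row for feature "C" would come back as a row with NA in the `Intensity` column, which is why the inner-join form suits a filter.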

inst/tinytest/test_fractions.R

  • Updated test logic: Instead of calling .getCorrectFraction() directly on individual features, tests now:

    • Call .removeOverlappingFeatures() on per-feature subsets
    • Extract the resulting Fraction values
    • Assert that the unique retained Fraction equals the expected value (2 for both feature "A" and the tied feature "B")
  • Test coverage maintained: The test continues to validate .removeOverlappingFeatures() on the full dataset using expect_identical()

  • Lines changed: +9/-5

Unit Tests

The test suite verifies the refactored logic through:

  1. Observation count selection (lines 70-74): Validates that the fraction with more observations (Run count) wins when comparing fractions for the same feature
  2. Mean intensity tie-breaking (lines 75-79): Validates that when two fractions have equal observation counts, the fraction with higher average Intensity is selected
  3. Full dataset processing (lines 80-84): Confirms that the tie-resolution logic correctly processes the complete fractionated dataset and retains only rows from the selected fraction (Fraction == 2)

All tests continue to pass with the new implementation, confirming backward compatibility of the selection behavior.

Coding Guidelines

No coding guideline violations identified. The code:

  • Follows R/data.table idioms and conventions used throughout the package
  • Includes comprehensive roxygen documentation for the new .resolveFractionTies() function
  • Maintains clear function naming conventions with dot-prefix for internal functions
  • Uses appropriate data.table syntax (:=, .N, .SD, by=, on=)
  • Properly handles NULL values and NA/zero filtering conditions

coderabbitai Bot commented Apr 9, 2026

Warning

Rate limit exceeded

@tonywu1999 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 9 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2e6925b8-2355-4ea1-b6ad-8fb00fb34b41

📥 Commits

Reviewing files that changed from the base of the PR and between 8aa5ec5 and 9c3990d.

📒 Files selected for processing (2)
  • R/utils_fractions.R
  • inst/tinytest/test_fractions.R
📝 Walkthrough

Refactored fraction tie-resolution logic in MSstatsConvert by replacing .getCorrectFraction with .resolveFractionTies(input, max_fractions). The new function identifies features with tied maximum observation counts and selects fractions based on highest mean intensity. Tests updated accordingly to validate the revised approach.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Fraction Tie-Resolution Refactoring: R/utils_fractions.R | Replaced .getCorrectFraction with .resolveFractionTies(input, max_fractions). New function identifies tie_features where multiple fractions share max n_obs, selects fractions with highest mean Intensity, and combines with non-tied features. Updated .removeOverlappingFeatures to use new tie-breaking logic via fraction_map. |
| Test Updates: inst/tinytest/test_fractions.R | Updated tests to call .removeOverlappingFeatures() instead of .getCorrectFraction() directly. Tests now derive expected fractions from the refactored function and validate results for both tied and non-tied feature scenarios. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Ties once tangled, now resolved with grace,
Where fractions dance and intensities embrace—
Mean abundance leads the way,
New logic blooms to save the day! 🌸

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is entirely incomplete; it only contains the template structure with empty sections and no actual motivation, implementation details, testing information, or checklist items completed. | Fill in all required sections: provide motivation/context for the efficiency fix, detail the specific changes made (new .resolveFractionTies function, refactored .removeOverlappingFeatures logic), describe unit tests added/modified, and check the checklist items. |
| Title check | ❓ Inconclusive | The title 'Fix fraction efficient' is vague and grammatically awkward; it does not clearly convey the specific change (refactoring fraction tie-resolution logic with a new helper function). | Use a more specific and grammatically correct title such as 'Refactor fraction tie-resolution logic into dedicated helper function' or 'Improve fraction selection efficiency'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@tonywu1999 tonywu1999 marked this pull request as ready for review April 10, 2026 13:20

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
inst/tinytest/test_fractions.R (1)

71-79: Add a three-fraction regression for tie resolution.

These checks only exercise two-fraction cases. Please add one where two fractions tie on n_obs and a third has fewer observations but a higher mean intensity; that’s the edge case the new resolver can currently pick incorrectly.
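The edge case described here can be reproduced on a small table. The sketch below is a hypothetical illustration, not the package's test: the feature "C" values and the names `feature_c`, `counts`, `naive_pick`, and `expected_pick` are assumptions, and the "naive" selection is one reading of the behavior the review describes.

```r
library(data.table)

# Hypothetical feature "C": fractions 1 and 2 tie on n_obs (2 runs each),
# fraction 3 has a single run but by far the highest intensity.
feature_c <- data.table(
  feature   = "C",
  Fraction  = c(1, 1, 2, 2, 3),
  Run       = c("f1_r1", "f1_r2", "f2_r1", "f2_r2", "f3_r1"),
  Intensity = c(10, 20, 30, 40, 500)
)

counts <- feature_c[!is.na(Intensity) & Intensity > 0,
                    .(n_obs = uniqueN(Run),
                      mean_abundance = mean(Intensity)),
                    by = .(feature, Fraction)]

# Naive tie-break: highest mean over every fraction of the feature.
naive_pick <- counts[, .SD[which.max(mean_abundance)], by = feature]$Fraction

# Intended tie-break: restrict to the max-n_obs fractions first.
expected_pick <- counts[, .SD[n_obs == max(n_obs)][which.max(mean_abundance)],
                        by = feature]$Fraction
```

Here the naive selection lands on Fraction 3, while the intended selection is Fraction 2, so a regression test built on data of this shape would assert that the retained Fraction is 2.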

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@inst/tinytest/test_fractions.R` around lines 71 - 79, Add a new unit test
that covers the three-fraction edge case for
MSstatsConvert:::.removeOverlappingFeatures: construct a fractionated dataset
for a new feature (e.g., feature "C") with three fraction groups where two
fractions tie on n_obs and the third has fewer observations but a higher mean
intensity, call .removeOverlappingFeatures(fractionated[feature == "C"]) and
assert that unique(...$Fraction) is the expected fraction (the resolver should
prefer the fraction among the tied n_obs with the higher mean intensity);
reference the existing test pattern using fractionated,
.removeOverlappingFeatures, and $Fraction to mirror the other expect_equal
checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/utils_fractions.R`:
- Around line 298-303: avg_abundance currently computes mean(Intensity) across
all fractions for tie_features which can let a fraction with higher mean but
fewer observations win; update the logic to compute n_obs per (feature,
Fraction) alongside mean_abundance, then for each feature restrict candidates to
only those Fraction rows where n_obs equals the per-feature maximum n_obs (i.e.,
keep only fractions that tied for the top observation count) before selecting
the best by mean_abundance; adjust the code that builds avg_abundance and
best_tied (referenced symbols: avg_abundance, best_tied, tie_features, Fraction,
Intensity) so best_tied = avg_abundance[ n_obs == max(n_obs) ,
.SD[which.max(mean_abundance)], by = "feature"] (or equivalent filtering) to
ensure tie-breaking only occurs among the max-n_obs fractions.
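
One detail worth checking when applying the suggestion above: in data.table, the `i` expression is evaluated on the whole table before grouping, so `n_obs == max(n_obs)` placed in `i` compares against the global maximum rather than the per-feature one. The sketch below shows the difference on made-up data and one possible "equivalent filtering" that computes the maximum inside the grouped `j`. The table values and the name `global_filter` are assumptions; `avg_abundance` and `best_tied` follow the names in the comment.

```r
library(data.table)

# Hypothetical per-(feature, Fraction) summary with both columns present.
avg_abundance <- data.table(
  feature        = c("A", "A", "A", "B", "B"),
  Fraction       = c(1, 2, 3, 1, 2),
  n_obs          = c(3, 3, 1, 2, 2),
  mean_abundance = c(10, 20, 99, 7, 5)
)

# Filter in `i`: max(n_obs) is the global maximum (3),
# so feature "B" (per-feature maximum 2) disappears entirely.
global_filter <- avg_abundance[n_obs == max(n_obs),
                               .SD[which.max(mean_abundance)],
                               by = "feature"]

# Filter inside `j`: max(n_obs) is computed per feature,
# so tie-breaking happens only among each feature's top-count fractions.
best_tied <- avg_abundance[, .SD[n_obs == max(n_obs)][which.max(mean_abundance)],
                           by = "feature"]
```

With these values `best_tied` selects Fraction 2 for feature "A" (ignoring Fraction 3 despite its higher mean) and Fraction 1 for feature "B".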

---

Nitpick comments:
In `@inst/tinytest/test_fractions.R`:
- Around line 71-79: Add a new unit test that covers the three-fraction edge
case for MSstatsConvert:::.removeOverlappingFeatures: construct a fractionated
dataset for a new feature (e.g., feature "C") with three fraction groups where
two fractions tie on n_obs and the third has fewer observations but a higher
mean intensity, call .removeOverlappingFeatures(fractionated[feature == "C"])
and assert that unique(...$Fraction) is the expected fraction (the resolver
should prefer the fraction among the tied n_obs with the higher mean intensity);
reference the existing test pattern using fractionated,
.removeOverlappingFeatures, and $Fraction to mirror the other expect_equal
checks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ccb74428-a55b-44bb-9041-751ffab5adea

📥 Commits

Reviewing files that changed from the base of the PR and between a780511 and 8aa5ec5.

📒 Files selected for processing (2)
  • R/utils_fractions.R
  • inst/tinytest/test_fractions.R

Comment thread R/utils_fractions.R
@tonywu1999 tonywu1999 merged commit 87acf56 into devel Apr 10, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the fix-fraction-efficient branch April 10, 2026 13:56