refactor(diann): Integrate MSstatsImport and MSstatsClean functions into DIANN converter by tonywu1999 · Pull Request #10 · Vitek-Lab/MSstatsBig

tonywu1999 · 2026-02-06T17:41:02Z

Summary by CodeRabbit

New Features
- DIANN processing now uses MSstats-based chunked conversion for streamlined, scalable cleaning and writing.
Bug Fixes
- Default quantification column changed from "Fragment.Quant.Corrected" to "FragmentQuantCorrected" for improved DIANN compatibility and more reliable preprocessing.
Documentation
- Expanded guidance for DIANN version compatibility and added docs for chunked cleaning and file-writing utilities.
Tests
- Updated unit tests to align with new defaults and cleaned output expectations.

coderabbitai · 2026-02-06T17:41:16Z

Walkthrough

The DIANN cleaning pipeline now delegates per-chunk conversion and cleaning to MSstatsConvert's MSstatsImport and MSstatsClean. A generic chunk-writing helper was added and used by DIANN and Spectronaut cleaners. The default DIANN quantification column was changed to "FragmentQuantCorrected". Documentation and tests were updated.

Changes

Cohort / File(s)	Summary
NAMESPACE & DIANN pipeline `NAMESPACE`, `R/clean_DIANN.R`	Added `importFrom(MSstatsConvert, MSstatsImport)` and `importFrom(MSstatsConvert, MSstatsClean)`. Replaced internal DIANN multi-step cleaning with calls to `MSstatsImport` → `MSstatsClean`. Updated `reduceBigDIANN` default `quantificationColumn` to `"FragmentQuantCorrected"` and switched it to the new chunk-write flow.
Chunk I/O helper & Spectronaut `R/utils.R`, `R/clean_spectronaut.R`	Introduced internal helper `.writeChunkToFile(input, output_path, pos)` to centralize CSV append/overwrite logic. Spectronaut chunk write now calls this helper instead of inline conditional writes.
Converters & parameter defaults `R/converters.R`, `man/bigDIANNtoMSstatsFormat.Rd`	Changed `bigDIANNtoMSstatsFormat` default `quantificationColumn` from `"Fragment.Quant.Corrected"` to `"FragmentQuantCorrected"`. Updated documentation to describe DIANN version options and new defaults.
Documentation additions/updates `man/reduceBigDIANN.Rd`, `man/cleanDIANNChunk.Rd`, `man/dot-writeChunkToFile.Rd`, `man/bigDIANNtoMSstatsFormat.Rd`	Added Rd docs for `reduceBigDIANN`, `cleanDIANNChunk`, and `.writeChunkToFile`. Updated bigDIANN doc text to reflect default quantification column and DIANN version guidance.
Tests `tests/testthat/test-diann_converter.R`	Updated tests to use `"FragmentQuantCorrected"` and adjusted expected peptide sequence assertions; removed some column presence checks (ProductCharge, IsotopeLabelType, PeptideModifiedSequence) where no longer applicable.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant reduceBigDIANN as reduceBigDIANN
    participant cleanDIANNChunk as cleanDIANNChunk
    participant MSstatsImport as MSstatsImport\n(MSstatsConvert)
    participant MSstatsClean as MSstatsClean\n(MSstatsConvert)
    participant FileIO as .writeChunkToFile

    User->>reduceBigDIANN: provide input_file, output_path, pos chunks
    reduceBigDIANN->>cleanDIANNChunk: provide chunk data
    cleanDIANNChunk->>MSstatsImport: MSstatsImport(chunk)
    MSstatsImport-->>cleanDIANNChunk: imported MSstats-format data
    cleanDIANNChunk->>MSstatsClean: MSstatsClean(imported data)
    MSstatsClean-->>cleanDIANNChunk: cleaned data
    cleanDIANNChunk->>FileIO: .writeChunkToFile(cleaned, output_path, pos)
    FileIO-->>cleanDIANNChunk: NULL (written)
    cleanDIANNChunk-->>reduceBigDIANN: chunk processed
    reduceBigDIANN-->>User: processing complete (file written)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Implementing and Testing DIANN converter for MSstatsBIG. #9: Modifies the DIANN conversion/cleaning pipeline and imports of MSstatsConvert (same files/functions: R/clean_DIANN.R, NAMESPACE, R/converters.R).
Feature anomaly #6: Changes per-chunk processing/writing behavior in R/clean_spectronaut.R, overlapping with the new .writeChunkToFile abstraction.

Poem

🐇 In tidy rows my carrots lay, chunked neat and bright,
MSstats lends a paw to clean them through the night,
I hop, I nibble, writing chunks with care and cheer,
New defaults, new helpers — the path is clear! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main refactoring: integrating MSstatsImport and MSstatsClean functions into the DIANN converter pipeline.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

R/converters.R (2)
162-163: ⚠️ Potential issue | 🟡 Minor

Documentation still references the old default value.

The @param for quantificationColumn says 'Fragment.Quant.Corrected'(default) but the actual default on line 174 is now "FragmentQuantCorrected". Update the doc to match.
Proposed fix
-#' @param quantificationColumn Use 'Fragment.Quant.Corrected'(default) column for quantified intensities for DIANN 1.8.x.
-#' Use 'FragmentQuantRaw' for quantified intensities for DIANN 1.9.x. 
+#' @param quantificationColumn Use 'FragmentQuantCorrected'(default) column for quantified intensities.
+#' Use 'FragmentQuantRaw' for quantified intensities for DIANN 1.9.x.
185-196: 🛠️ Refactor suggestion | 🟠 Major

Use file.path() for intermediate file paths to handle directory-containing output_file_name.

When output_file_name contains directory components (e.g., "results/out.csv"), paste0("reduce_output_", output_file_name) produces "reduce_output_results/out.csv" — a broken path. Use file.path(dirname(...), paste0("reduce_output_", basename(...))) to place the intermediate file alongside the final output.
Proposed fix
+  reduce_output_path <- file.path(dirname(output_file_name),
+                                  paste0("reduce_output_", basename(output_file_name)))
+
   # Reduce and clean the DIANN report file in chunks
   reduceBigDIANN(input_file, 
-                 paste0("reduce_output_", output_file_name),
+                 reduce_output_path,
                  MBR,
                  quantificationColumn)
   
   # Preprocess the cleaned data (feature selection, etc.)
   msstats_data <- MSstatsPreprocessBig(
-    paste0("reduce_output_", output_file_name),
+    reduce_output_path,
     output_file_name, backend, max_feature_count,
     filter_unique_peptides, aggregate_psms, filter_few_obs, 
     remove_annotation, calculateAnomalyScores, 
     anomalyModelFeatures, connection)
Based on learnings: "In R converters across the MSstatsBig package, replace direct concatenation with file names by constructing intermediate file paths using file.path(dirname(output_file_name), paste0("prefix_", basename(output_file_name))). This ensures the intermediate file is created in the correct directory even when output_file_name contains directories."
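The difference between the two constructions is easy to see on a path that contains a directory. The sketch below is a minimal illustration; `intermediatePath()` is a hypothetical helper name, not a function in MSstatsBig, and it only wraps the file.path() expression from the proposed fix.

```r
# Hypothetical helper (not part of MSstatsBig): wraps the reviewer's proposed
# expression so the intermediate file lands next to the final output.
intermediatePath <- function(output_file_name, prefix = "reduce_output_") {
  file.path(dirname(output_file_name),
            paste0(prefix, basename(output_file_name)))
}

# Current behaviour: the prefix is glued onto the directory component
broken <- paste0("reduce_output_", "results/out.csv")
# "reduce_output_results/out.csv"

# Proposed behaviour: the prefix is applied to the file name only
fixed <- intermediatePath("results/out.csv")
# "results/reduce_output_out.csv"

# A bare file name resolves to the current directory
bare <- intermediatePath("out.csv")
# "./reduce_output_out.csv"
```

For a bare file name the new path differs from today's only by the leading "./", which points at the same location, so callers that already pass a plain file name should see no change.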

🤖 Fix all issues with AI agents

In `@man/bigDIANNtoMSstatsFormat.Rd`:
- Line 12: Update the roxygen `@param` description for quantificationColumn to
match the function default: change the referenced default from
'Fragment.Quant.Corrected' to 'FragmentQuantCorrected' so the documentation
aligns with the actual default in the converter function (see R/converters.R and
the function signature where quantificationColumn = "FragmentQuantCorrected").

In `@R/clean_DIANN.R`:
- Around line 51-58: cleanDIANNChunk declares global_qvalue_cutoff,
qvalue_cutoff, and pg_qvalue_cutoff but never applies them; update the function
to mirror cleanSpectronaut: before calling MSstatsClean, perform row-level
filtering on the DIANN input data using the Q.Value, Lib.Q.Value, and
Lib.PG.Q.Value columns according to qvalue_cutoff, global_qvalue_cutoff, and
pg_qvalue_cutoff respectively (only drop rows that exceed the provided cutoffs),
then pass the filtered data into MSstatsClean(input, MBR, quantificationColumn)
and continue to .writeChunkToFile; if you prefer not to implement filtering now,
remove those cutoff parameters from cleanDIANNChunk’s signature to avoid
misleading users.
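If the filtering route is taken, a row-level filter along the lines of this prompt could look like the base-R sketch below. `filterDIANNQValues()` is a hypothetical name, the 0.01 defaults are illustrative, and the column names are taken from the comment above. The new default `FragmentQuantCorrected` suggests that column names may be standardized (dots removed) during import; if so, the standardized names would need to be passed instead. As the prompt asks, only rows that exceed a cutoff are dropped, so rows with a missing q-value are kept, and a NULL cutoff disables that filter.

```r
# Hypothetical helper (not part of MSstatsBig or MSstatsConvert).
# Drops rows whose q-value exceeds the matching cutoff; a NULL cutoff skips
# that filter, and a column that is absent from `input` is also skipped.
filterDIANNQValues <- function(input,
                               qvalue_cutoff = 0.01,
                               global_qvalue_cutoff = 0.01,
                               pg_qvalue_cutoff = 0.01,
                               qvalue_col = "Q.Value",
                               global_qvalue_col = "Lib.Q.Value",
                               pg_qvalue_col = "Lib.PG.Q.Value") {
  # Pair each cutoff with the column it applies to
  rules <- list(
    list(col = qvalue_col, cutoff = qvalue_cutoff),
    list(col = global_qvalue_col, cutoff = global_qvalue_cutoff),
    list(col = pg_qvalue_col, cutoff = pg_qvalue_cutoff)
  )
  keep <- rep(TRUE, nrow(input))
  for (rule in rules) {
    if (is.null(rule$cutoff) || !(rule$col %in% names(input))) next
    values <- input[[rule$col]]
    # Only rows that exceed the cutoff are dropped; NA q-values are kept
    keep <- keep & (is.na(values) | values <= rule$cutoff)
  }
  input[keep, , drop = FALSE]
}

# Made-up chunk: the second row fails the precursor q-value cutoff
chunk <- data.frame(
  Q.Value = c(0.001, 0.05, NA),
  Lib.Q.Value = c(0.001, 0.001, 0.001),
  Lib.PG.Q.Value = c(0.001, 0.001, 0.001)
)
filtered <- filterDIANNQValues(chunk)
nrow(filtered)  # 2
```

The filtered data frame would then be what gets passed on to MSstatsClean and .writeChunkToFile.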

🧹 Nitpick comments (3)

R/utils.R (1)
8-17: Clean helper; minor simplification possible.

The logic is correct. You could collapse the inner branch into a single call since the only difference is the append flag.
Optional simplification
 .writeChunkToFile <- function(input, output_path, pos) {
-    # Write to file
-    if (!is.null(pos)) {
-        if (pos == 1) {
-            readr::write_csv(input, file = output_path, append = FALSE)
-        } else {
-            readr::write_csv(input, file = output_path, append = TRUE)
-        }
-    }
+    if (!is.null(pos)) {
+        readr::write_csv(input, file = output_path, append = (pos != 1))
+    }
 }
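To check that the one-line version behaves like the original branches, the sketch below drives it from readr's chunked reader. It assumes the cleaners forward the `pos` value that readr passes to chunk callbacks (the position of the chunk's first row, so 1 only for the first chunk); the file contents and column names are made up for the example. Because readr::write_csv() defaults to `col_names = !append`, the header is written once, with the first chunk, and later chunks are appended without it.

```r
# Simplified helper from the suggestion above: overwrite on the first chunk,
# append afterwards, and write nothing when pos is NULL.
.writeChunkToFile <- function(input, output_path, pos) {
  if (!is.null(pos)) {
    readr::write_csv(input, file = output_path, append = (pos != 1))
  }
}

# Made-up input: five rows that will be read two at a time
src <- tempfile(fileext = ".csv")
readr::write_csv(
  data.frame(ProteinName = paste0("P", 1:5), Intensity = c(10, 20, 30, 40, 50)),
  src
)

# readr calls the callback once per chunk, passing the chunk and its pos
out <- tempfile(fileext = ".csv")
readr::read_csv_chunked(
  src,
  callback = readr::SideEffectChunkCallback$new(
    function(x, pos) .writeChunkToFile(x, out, pos)
  ),
  chunk_size = 2,
  col_types = "cd"
)

# Read the combined output back to inspect it
combined <- readr::read_csv(out, col_types = "cd")
```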
tests/testthat/test-diann_converter.R (1)

144-175: End-to-end test cleanup mirrors production's paste0 path issue.

Line 174 uses the same paste0("reduce_output_", output_file) pattern as production code. This works here because output_file is a bare filename (line 146: basename(tempfile(...))), but it won't catch the path-concatenation bug flagged in R/converters.R. Consider adding a test case where output_file_name contains a directory component to guard against regressions.

R/clean_DIANN.R (1)

44-44: @importFrom placed on an internal function instead of a package-level documentation file.

The @importFrom MSstatsConvert MSstatsImport MSstatsClean directive here works, but it's common in R packages to consolidate namespace imports in a package-doc file (e.g., R/{package}-package.R) for easier maintenance. This is a minor style point — no functional impact.
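A consolidated version might look like the sketch below. The file name R/MSstatsBig-package.R is an assumption; the `"_PACKAGE"` sentinel is roxygen2's standard way to generate package-level documentation, and the @importFrom line would then be removed from R/clean_DIANN.R. Regenerating NAMESPACE (e.g. with roxygen2::roxygenise()) should yield the same importFrom() entries this PR already adds, since NAMESPACE does not record which source file a directive came from.

```r
# R/MSstatsBig-package.R (assumed file name)
# Package-level roxygen block that collects namespace imports in one place.

#' @keywords internal
#' @importFrom MSstatsConvert MSstatsImport MSstatsClean
"_PACKAGE"
```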

tonywu1999 added 2 commits February 6, 2026 12:36

refactor(diann): Integrate MSstatsImport and MSstatsClean functions into DIANN converter

10329cf

add dependencies and documentation

a947c97

tonywu1999 requested a review from Rudhik1904 February 6, 2026 17:41

coderabbitai Bot reviewed Feb 6, 2026

Comment thread man/bigDIANNtoMSstatsFormat.Rd

Comment thread R/clean_DIANN.R

update docs

f0d8ca4

tonywu1999 merged commit 2be67e0 into devel Feb 6, 2026
1 check passed

tonywu1999 deleted the refactor-clean branch February 6, 2026 17:57

tonywu1999 mentioned this pull request Feb 6, 2026

refactor(diann): move q-value filtering to MSstatsClean for DIANN Vitek-Lab/MSstatsConvert#116

Merged

coderabbitai Bot mentioned this pull request Feb 6, 2026

fix(diann): Add q-value filtering to DIANN big clean function #11

Merged

coderabbitai Bot mentioned this pull request Feb 14, 2026

Adding annotation file param to bigDIANNtoMSstatsFormat #12

Merged

tonywu1999 added a commit that referenced this pull request Feb 24, 2026

refactor(diann): Integrate MSstatsImport and MSstatsClean functions into DIANN converter (#10)

336a1ac

coderabbitai Bot mentioned this pull request Mar 2, 2026

added anomaly model to MSstatsClean call #14

Merged

tonywu1999 merged 3 commits into devel from refactor-clean