Skip to content

refactor(diann): Integrate MSstatsImport and MSstatsClean functions into DIANN converter#10

Merged
tonywu1999 merged 3 commits intodevelfrom
refactor-clean
Feb 6, 2026
Merged

refactor(diann): Integrate MSstatsImport and MSstatsClean functions into DIANN converter#10
tonywu1999 merged 3 commits intodevelfrom
refactor-clean

Conversation

@tonywu1999
Copy link
Copy Markdown
Contributor

@tonywu1999 tonywu1999 commented Feb 6, 2026

Summary by CodeRabbit

  • New Features

    • DIANN processing now uses MSstats-based chunked conversion for streamlined, scalable cleaning and writing.
  • Bug Fixes

    • Default quantification column updated for improved DIANN compatibility and more reliable preprocessing.
  • Documentation

    • Expanded guidance for DIANN version compatibility and added docs for chunked cleaning and file-writing utilities.
  • Tests

    • Updated unit tests to align with new defaults and cleaned output expectations.

@tonywu1999 tonywu1999 requested a review from Rudhik1904 February 6, 2026 17:41
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 6, 2026

📝 Walkthrough

Walkthrough

The DIANN cleaning pipeline now delegates per-chunk conversion and cleaning to MSstatsConvert's MSstatsImport and MSstatsClean. A generic chunk-writing helper was added and used by DIANN and Spectronaut cleaners. The default DIANN quantification column was changed to "FragmentQuantCorrected". Documentation and tests were updated.

Changes

Cohort / File(s) Summary
NAMESPACE & DIANN pipeline
NAMESPACE, R/clean_DIANN.R
Added importFrom(MSstatsConvert, MSstatsImport) and importFrom(MSstatsConvert, MSstatsClean). Replaced internal DIANN multi-step cleaning with calls to MSstatsImportMSstatsClean. Updated reduceBigDIANN default quantificationColumn to "FragmentQuantCorrected" and use new chunk write flow.
Chunk I/O helper & Spectronaut
R/utils.R, R/clean_spectronaut.R
Introduced internal helper .writeChunkToFile(input, output_path, pos) to centralize CSV append/overwrite logic. Spectronaut chunk write now calls this helper instead of inline conditional writes.
Converters & parameter defaults
R/converters.R, man/bigDIANNtoMSstatsFormat.Rd
Changed bigDIANNtoMSstatsFormat default quantificationColumn from "Fragment.Quant.Corrected" to "FragmentQuantCorrected". Updated documentation to describe DIANN version options and new defaults.
Documentation additions/updates
man/reduceBigDIANN.Rd, man/cleanDIANNChunk.Rd, man/dot-writeChunkToFile.Rd, man/bigDIANNtoMSstatsFormat.Rd
Added Rd docs for reduceBigDIANN, cleanDIANNChunk, and .writeChunkToFile. Updated bigDIANN doc text to reflect default quantification column and DIANN version guidance.
Tests
tests/testthat/test-diann_converter.R
Updated tests to use "FragmentQuantCorrected" and adjusted expected peptide sequence assertions; removed some column presence checks (ProductCharge, IsotopeLabelType, PeptideModifiedSequence) where no longer applicable.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant reduceBigDIANN as reduceBigDIANN
    participant cleanDIANNChunk as cleanDIANNChunk
    participant MSstatsImport as MSstatsImport\n(MSstatsConvert)
    participant MSstatsClean as MSstatsClean\n(MSstatsConvert)
    participant FileIO as .writeChunkToFile

    User->>reduceBigDIANN: provide input_file, output_path, pos chunks
    reduceBigDIANN->>cleanDIANNChunk: provide chunk data
    cleanDIANNChunk->>MSstatsImport: MSstatsImport(chunk)
    MSstatsImport-->>cleanDIANNChunk: imported MSstats-format data
    cleanDIANNChunk->>MSstatsClean: MSstatsClean(imported data)
    MSstatsClean-->>cleanDIANNChunk: cleaned data
    cleanDIANNChunk->>FileIO: .writeChunkToFile(cleaned, output_path, pos)
    FileIO-->>cleanDIANNChunk: NULL (written)
    cleanDIANNChunk-->>reduceBigDIANN: chunk processed
    reduceBigDIANN-->>User: processing complete (file written)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 In tidy rows my carrots lay, chunked neat and bright,

MSstats lends a paw to clean them through the night,
I hop, I nibble, writing chunks with care and cheer,
New defaults, new helpers — the path is clear! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main refactoring: integrating MSstatsImport and MSstatsClean functions into the DIANN converter pipeline.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch refactor-clean

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
R/converters.R (2)

162-163: ⚠️ Potential issue | 🟡 Minor

Documentation still references the old default value.

The @param for quantificationColumn says 'Fragment.Quant.Corrected'(default) but the actual default on line 174 is now "FragmentQuantCorrected". Update the doc to match.

Proposed fix
-#' `@param` quantificationColumn Use 'Fragment.Quant.Corrected'(default) column for quantified intensities for DIANN 1.8.x.
-#' Use 'FragmentQuantRaw' for quantified intensities for DIANN 1.9.x. 
+#' `@param` quantificationColumn Use 'FragmentQuantCorrected'(default) column for quantified intensities.
+#' Use 'FragmentQuantRaw' for quantified intensities for DIANN 1.9.x.

185-196: 🛠️ Refactor suggestion | 🟠 Major

Use file.path() for intermediate file paths to handle directory-containing output_file_name.

When output_file_name contains directory components (e.g., "results/out.csv"), paste0("reduce_output_", output_file_name) produces "reduce_output_results/out.csv" — a broken path. Use file.path(dirname(...), paste0("reduce_output_", basename(...))) to place the intermediate file alongside the final output.

Proposed fix
+  reduce_output_path <- file.path(dirname(output_file_name),
+                                  paste0("reduce_output_", basename(output_file_name)))
+
   # Reduce and clean the DIANN report file in chunks
   reduceBigDIANN(input_file, 
-                 paste0("reduce_output_", output_file_name),
+                 reduce_output_path,
                  MBR,
                  quantificationColumn)
   
   # Preprocess the cleaned data (feature selection, etc.)
   msstats_data <- MSstatsPreprocessBig(
-    paste0("reduce_output_", output_file_name),
+    reduce_output_path,
     output_file_name, backend, max_feature_count,
     filter_unique_peptides, aggregate_psms, filter_few_obs, 
     remove_annotation, calculateAnomalyScores, 
     anomalyModelFeatures, connection)

Based on learnings: "In R converters across the MSstatsBig package, replace direct concatenation with file names by constructing intermediate file paths using file.path(dirname(output_file_name), paste0("prefix_", basename(output_file_name))). This ensures the intermediate file is created in the correct directory even when output_file_name contains directories."

🤖 Fix all issues with AI agents
In `@man/bigDIANNtoMSstatsFormat.Rd`:
- Line 12: Update the roxygen `@param` description for quantificationColumn to
match the function default: change the referenced default from
'Fragment.Quant.Corrected' to 'FragmentQuantCorrected' so the documentation
aligns with the actual default in the converter function (see R/converters.R and
the function signature where quantificationColumn = "FragmentQuantCorrected").

In `@R/clean_DIANN.R`:
- Around line 51-58: cleanDIANNChunk declares global_qvalue_cutoff,
qvalue_cutoff, and pg_qvalue_cutoff but never applies them; update the function
to mirror cleanSpectronaut: before calling MSstatsClean, perform row-level
filtering on the DIANN input data using the Q.Value, Lib.Q.Value, and
Lib.PG.Q.Value columns according to qvalue_cutoff, global_qvalue_cutoff, and
pg_qvalue_cutoff respectively (only drop rows that exceed the provided cutoffs),
then pass the filtered data into MSstatsClean(input, MBR, quantificationColumn)
and continue to .writeChunkToFile; if you prefer not to implement filtering now,
remove those cutoff parameters from cleanDIANNChunk’s signature to avoid
misleading users.
🧹 Nitpick comments (3)
R/utils.R (1)

8-17: Clean helper; minor simplification possible.

The logic is correct. You could collapse the inner branch into a single call since the only difference is the append flag.

Optional simplification
 .writeChunkToFile <- function(input, output_path, pos) {
-    # Write to file
-    if (!is.null(pos)) {
-        if (pos == 1) {
-            readr::write_csv(input, file = output_path, append = FALSE)
-        } else {
-            readr::write_csv(input, file = output_path, append = TRUE)
-        }
-    }
+    if (!is.null(pos)) {
+        readr::write_csv(input, file = output_path, append = (pos != 1))
+    }
 }
tests/testthat/test-diann_converter.R (1)

144-175: End-to-end test cleanup mirrors production's paste0 path issue.

Line 174 uses the same paste0("reduce_output_", output_file) pattern as production code. This works here because output_file is a bare filename (line 146: basename(tempfile(...))), but it won't catch the path-concatenation bug flagged in R/converters.R. Consider adding a test case where output_file_name contains a directory component to guard against regressions.

R/clean_DIANN.R (1)

44-44: @importFrom placed on internal function instead of package-level collation.

The @importFrom MSstatsConvert MSstatsImport MSstatsClean directive here works, but it's common in R packages to consolidate namespace imports in a package-doc file (e.g., R/{package}-package.R) for easier maintenance. This is a minor style point — no functional impact.

Comment thread man/bigDIANNtoMSstatsFormat.Rd
Comment thread R/clean_DIANN.R
@tonywu1999 tonywu1999 merged commit 2be67e0 into devel Feb 6, 2026
1 check passed
@tonywu1999 tonywu1999 deleted the refactor-clean branch February 6, 2026 17:57
tonywu1999 added a commit that referenced this pull request Feb 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant