refactor(diann): Integrate MSstatsImport and MSstatsClean functions into DIANN converter#10
refactor(diann): Integrate MSstatsImport and MSstatsClean functions into DIANN converter#10tonywu1999 merged 3 commits intodevelfrom
Conversation
📝 WalkthroughWalkthroughThe DIANN cleaning pipeline now delegates per-chunk conversion and cleaning to MSstatsConvert's MSstatsImport and MSstatsClean. A generic chunk-writing helper was added and used by DIANN and Spectronaut cleaners. The default DIANN quantification column was changed to "FragmentQuantCorrected". Documentation and tests were updated. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant reduceBigDIANN as reduceBigDIANN
participant cleanDIANNChunk as cleanDIANNChunk
participant MSstatsImport as MSstatsImport\n(MSstatsConvert)
participant MSstatsClean as MSstatsClean\n(MSstatsConvert)
participant FileIO as .writeChunkToFile
User->>reduceBigDIANN: provide input_file, output_path, pos chunks
reduceBigDIANN->>cleanDIANNChunk: provide chunk data
cleanDIANNChunk->>MSstatsImport: MSstatsImport(chunk)
MSstatsImport-->>cleanDIANNChunk: imported MSstats-format data
cleanDIANNChunk->>MSstatsClean: MSstatsClean(imported data)
MSstatsClean-->>cleanDIANNChunk: cleaned data
cleanDIANNChunk->>FileIO: .writeChunkToFile(cleaned, output_path, pos)
FileIO-->>cleanDIANNChunk: NULL (written)
cleanDIANNChunk-->>reduceBigDIANN: chunk processed
reduceBigDIANN-->>User: processing complete (file written)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
R/converters.R (2)
162-163:⚠️ Potential issue | 🟡 MinorDocumentation still references the old default value.
The
@paramforquantificationColumnsays'Fragment.Quant.Corrected'(default)but the actual default on line 174 is now"FragmentQuantCorrected". Update the doc to match.Proposed fix
-#' `@param` quantificationColumn Use 'Fragment.Quant.Corrected'(default) column for quantified intensities for DIANN 1.8.x. -#' Use 'FragmentQuantRaw' for quantified intensities for DIANN 1.9.x. +#' `@param` quantificationColumn Use 'FragmentQuantCorrected'(default) column for quantified intensities. +#' Use 'FragmentQuantRaw' for quantified intensities for DIANN 1.9.x.
185-196: 🛠️ Refactor suggestion | 🟠 MajorUse
file.path()for intermediate file paths to handle directory-containingoutput_file_name.When
output_file_namecontains directory components (e.g.,"results/out.csv"),paste0("reduce_output_", output_file_name)produces"reduce_output_results/out.csv"— a broken path. Usefile.path(dirname(...), paste0("reduce_output_", basename(...)))to place the intermediate file alongside the final output.Proposed fix
+ reduce_output_path <- file.path(dirname(output_file_name), + paste0("reduce_output_", basename(output_file_name))) + # Reduce and clean the DIANN report file in chunks reduceBigDIANN(input_file, - paste0("reduce_output_", output_file_name), + reduce_output_path, MBR, quantificationColumn) # Preprocess the cleaned data (feature selection, etc.) msstats_data <- MSstatsPreprocessBig( - paste0("reduce_output_", output_file_name), + reduce_output_path, output_file_name, backend, max_feature_count, filter_unique_peptides, aggregate_psms, filter_few_obs, remove_annotation, calculateAnomalyScores, anomalyModelFeatures, connection)Based on learnings: "In R converters across the MSstatsBig package, replace direct concatenation with file names by constructing intermediate file paths using file.path(dirname(output_file_name), paste0("prefix_", basename(output_file_name))). This ensures the intermediate file is created in the correct directory even when output_file_name contains directories."
🤖 Fix all issues with AI agents
In `@man/bigDIANNtoMSstatsFormat.Rd`:
- Line 12: Update the roxygen `@param` description for quantificationColumn to
match the function default: change the referenced default from
'Fragment.Quant.Corrected' to 'FragmentQuantCorrected' so the documentation
aligns with the actual default in the converter function (see R/converters.R and
the function signature where quantificationColumn = "FragmentQuantCorrected").
In `@R/clean_DIANN.R`:
- Around line 51-58: cleanDIANNChunk declares global_qvalue_cutoff,
qvalue_cutoff, and pg_qvalue_cutoff but never applies them; update the function
to mirror cleanSpectronaut: before calling MSstatsClean, perform row-level
filtering on the DIANN input data using the Q.Value, Lib.Q.Value, and
Lib.PG.Q.Value columns according to qvalue_cutoff, global_qvalue_cutoff, and
pg_qvalue_cutoff respectively (only drop rows that exceed the provided cutoffs),
then pass the filtered data into MSstatsClean(input, MBR, quantificationColumn)
and continue to .writeChunkToFile; if you prefer not to implement filtering now,
remove those cutoff parameters from cleanDIANNChunk’s signature to avoid
misleading users.
🧹 Nitpick comments (3)
R/utils.R (1)
8-17: Clean helper; minor simplification possible.The logic is correct. You could collapse the inner branch into a single call since the only difference is the
appendflag.Optional simplification
.writeChunkToFile <- function(input, output_path, pos) { - # Write to file - if (!is.null(pos)) { - if (pos == 1) { - readr::write_csv(input, file = output_path, append = FALSE) - } else { - readr::write_csv(input, file = output_path, append = TRUE) - } - } + if (!is.null(pos)) { + readr::write_csv(input, file = output_path, append = (pos != 1)) + } }tests/testthat/test-diann_converter.R (1)
144-175: End-to-end test cleanup mirrors production'spaste0path issue.Line 174 uses the same
paste0("reduce_output_", output_file)pattern as production code. This works here becauseoutput_fileis a bare filename (line 146:basename(tempfile(...))), but it won't catch the path-concatenation bug flagged inR/converters.R. Consider adding a test case whereoutput_file_namecontains a directory component to guard against regressions.R/clean_DIANN.R (1)
44-44:@importFromplaced on internal function instead of package-level collation.The
@importFrom MSstatsConvert MSstatsImport MSstatsCleandirective here works, but it's common in R packages to consolidate namespace imports in a package-doc file (e.g.,R/{package}-package.R) for easier maintenance. This is a minor style point — no functional impact.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests