Skip to content

Handle big spec file#144

Merged
tonywu1999 merged 6 commits intodevelfrom
Handle-big-spec-file
Jan 28, 2026
Merged

Handle big spec file#144
tonywu1999 merged 6 commits intodevelfrom
Handle-big-spec-file

Conversation

@Rudhik1904
Copy link
Copy Markdown
Contributor

@Rudhik1904 Rudhik1904 commented Dec 28, 2025

Overview
This PR adds support for processing large Spectronaut datasets that exceed available RAM by integrating MSstatsBig functionality. Users can now enable large dataset processing via a checkbox option.

Added checkbox UI element to enable MSstatsBig processing for Spectronaut data

Updated pre-processing functions to conditionally route to appropriate converter based on checkbox state.

Removed annotation file upload/selection from Spectronaut large dataset workflow

Added exception handling for datasets where even compressed/reduced data exceeds RAM

Summary by CodeRabbit

  • New Features

    • Large-Spectronaut-file workflow: local file browser, big-file mode toggle, and conditional UI for web vs non-web deployments.
    • New large-file controls: q-value filter, max-feature limit, aggregation and quantification/exclusion options, plus processing progress indicator.
  • Documentation

    • Runtime docs updated to document the new environment-aware parameter for load handling.
  • Tests

    • Added unit and UI tests covering big-file selection, validation, processing, and error cases.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Dec 28, 2025

📝 Walkthrough

Walkthrough

Adds a large-Spectronaut-file workflow: new UI controls and local-file browser, MSstatsBig-based conversion path in server utils, an is_web_server flag for conditional behavior, updated DESCRIPTION dependencies, and tests/documentation updates.

Changes

Cohort / File(s) Summary
Package dependencies
DESCRIPTION
Added shinyFiles and MSstatsBig imports.
Loadpage server (big-file handling)
R/module-loadpage-server.R, R/utils.R
Added is_web_server param to loadpageServer; introduced shinyFiles local browser/reactives, local_big_file_path, big-file validation flow, MSstatsBig::bigSpectronauttoMSstatsFormat conversion path with error handling and notifications; preserved existing standard/spec workflows.
Loadpage UI (spectronaut big-file UI)
R/module-loadpage-ui.R
Reworked Spectronaut upload UI into composable uiOutputs and helper UI builders; added big_file_spec toggle, local browse controls, big-file-specific filters/options, and conditional gating for annotation/preprocessing.
Server invocation & docs
R/server.R, man/loadpageServer.Rd
Updated loadpageServer call to pass is_web_server = isWebServer; documented new optional is_web_server argument.
Tests
tests/testthat/test-loadpage-server.R, tests/testthat/test-module-loadpage-ui.R, tests/testthat/test-utils.R
Added tests for .get_data_source_type, Spectronaut UI helper outputs, and big-file processing (valid/invalid inputs, file-not-found, memory error), stubbing file system and MSstatsBig calls.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UI as Loadpage UI
    participant FileBrowser as shinyFiles Browser
    participant Server as Loadpage Server
    participant MSB as MSstatsBig
    participant Results

    User->>UI: Select filetype = "spec"
    UI->>UI: Show big-file toggle (big_file_spec)

    alt big_file_spec = TRUE
        User->>UI: Click browse
        UI->>FileBrowser: Open local file chooser
        FileBrowser-->>UI: Return local path
        UI->>Server: Proceed with local path & options
        Server->>Server: .get_data_source_type("spec", TRUE) => "big_spectronaut"
        Server->>MSB: bigSpectronauttoMSstatsFormat(local_path, filters)
        MSB-->>Server: Return converted data or error
        Server->>Results: Store/display converted data (or show error)
    else standard path
        User->>UI: Upload/select standard spec
        UI->>Server: Proceed without local path
        Server->>Server: .get_data_source_type(...) => "standard"
        Server->>Server: Existing getData() conversion
        Server->>Results: Store/display data
    end

    Results->>User: Present feedback/results
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related issues

Suggested labels

Review effort 3/5

Poem

🐰 I hopped through code at break of dawn,
I added toggles, browse and brawn,
MSstatsBig munched the giant file,
Small paws, big data, bunny smile!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Handle big spec file' is partially related to the main changeset, referring to a real aspect of the changes (big/large Spectronaut spec file handling), but is vague and doesn't fully capture the comprehensive scope of the PR (MSstatsBig integration, UI additions, conditional routing logic). Consider a more descriptive title that captures the main intent, such as 'Add MSstatsBig support for large Spectronaut datasets' or 'Integrate large-file Spectronaut processing with conditional UI'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

Failed to generate code suggestions for PR

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
R/module-loadpage-server.R (1)

338-350: Consider adding server-side validation for processing parameters.

While the UI provides default values for parameters like qvalue_cutoff and max_feature_count, adding server-side validation would make the code more robust against edge cases or direct API calls.

Example validation
+        # Validate input parameters
+        qvalue_cutoff <- input$qvalue_cutoff
+        if (is.null(qvalue_cutoff) || qvalue_cutoff < 0 || qvalue_cutoff > 1) {
+          qvalue_cutoff <- 0.01
+        }
+        
+        max_feature_count <- input$max_feature_count
+        if (is.null(max_feature_count) || max_feature_count < 1) {
+          max_feature_count <- 20
+        }
+        
         # Call the big file conversion function from MSstatsConvert
         converted_data <- MSstatsBig::bigSpectronauttoMSstatsFormat(
           input_file = local_path_spec(),
           output_file_name = "output_file.csv",
           backend = "arrow",
           filter_by_excluded = input$filter_by_excluded,
           filter_by_identified = input$filter_by_identified,
           filter_by_qvalue = input$filter_by_qvalue,
-          qvalue_cutoff = input$qvalue_cutoff,
-          max_feature_count = input$max_feature_count,
+          qvalue_cutoff = qvalue_cutoff,
+          max_feature_count = max_feature_count,
           filter_unique_peptides = input$filter_unique_peptides,
           aggregate_psms = input$aggregate_psms,
           filter_few_obs = input$filter_few_obs
         )
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64249d3 and e07c645.

📒 Files selected for processing (7)
  • DESCRIPTION
  • R/module-loadpage-server.R
  • R/module-loadpage-ui.R
  • R/server.R
  • man/loadpageServer.Rd
  • tests/testthat/test-loadpage-server.R
  • tests/testthat/test-module-loadpage-ui.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (12)
R/module-loadpage-ui.R (3)

226-226: LGTM! Annotation upload correctly gated for big Spectronaut files.

The conditional logic properly excludes annotation file upload when the big-file Spectronaut path is selected, which aligns with the PR objectives.


288-320: LGTM! Well-structured UI for large Spectronaut file workflow.

The new UI flow properly supports both standard file input and local file browsing with appropriate processing options. All input IDs are correctly namespaced.


518-525: LGTM! Pre-processing options correctly hidden for big Spectronaut files.

The conditional logic properly hides standard pre-processing options when using the large-file Spectronaut path, as those options are now provided in the big-file-specific UI section.

R/server.R (1)

35-35: LGTM! Environment flag correctly passed to loadpageServer.

The is_web_server parameter is properly passed to enable environment-specific behavior in the loadpage module.

man/loadpageServer.Rd (1)

7-14: LGTM! Documentation correctly reflects the new parameter.

The is_web_server parameter is properly documented with its default value and description.

tests/testthat/test-loadpage-server.R (1)

1-21: LGTM! Comprehensive test coverage for data source type helper.

The tests validate all relevant scenarios including edge cases (NULL, non-spec file types). This provides good confidence in the routing logic.

tests/testthat/test-module-loadpage-ui.R (1)

397-436: LGTM! Thorough UI testing for the big Spectronaut file workflow.

The tests verify all key UI elements and conditional logic for the new large-file processing path. Good attention to detail with HTML entity encoding checks.

R/module-loadpage-server.R (4)

1-13: LGTM! Clean and safe data source type resolver.

The helper function correctly uses isTRUE() for safe boolean checking, which handles NULL and NA cases gracefully.


33-62: LGTM! Proper isolation of local file browser logic.

The shinyFiles integration is correctly gated behind the is_web_server check, ensuring local file browsing is only available when running locally. The reactive path handling is clean and safe.


200-204: LGTM! Proceed button logic correctly handles both file upload paths.

The conditional logic properly distinguishes between standard Spectronaut file uploads and the new large-file browser workflow.


354-366: LGTM! Proper error handling for memory allocation failures.

The error handling correctly catches memory errors, provides user feedback, and prevents further processing. The notification includes technical error details which are helpful for troubleshooting.

DESCRIPTION (1)

27-28: Both shinyFiles and MSstatsBig are confirmed as actively used and available from reputable sources.

shinyFiles (0.9.3) is available on CRAN and is used for file browsing functionality in the load page module. MSstatsBig is available on Bioconductor and is used for the Spectronaut data conversion function. Both packages integrate properly with the codebase and are compatible with the required R version.

Comment thread R/module-loadpage-server.R Outdated
Comment thread R/module-loadpage-server.R Outdated
Comment thread R/module-loadpage-server.R Outdated
Comment thread R/module-loadpage-server.R Outdated
Comment thread R/module-loadpage-server.R Outdated
Comment thread R/module-loadpage-server.R Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @R/module-loadpage-server.R:
- Around line 297-309: Before calling MSstatsBig::bigSpectronauttoMSstatsFormat
to produce converted_data, validate inputs: ensure local_big_file_path() exists
and is readable, input$qvalue_cutoff is numeric and 0 <= value <= 1,
input$max_feature_count is numeric and > 0, and required boolean flags like
input$filter_by_excluded/input$filter_by_identified are non-NULL; if any check
fails, stop the call and return a clear error message to the user (or set a
reactive error state) so the function is only invoked with validated values.
- Around line 291-299: Replace the hardcoded "output_file.csv" with a unique,
session-scoped temporary filename and ensure it is removed after processing:
generate a temp path (e.g., via tempfile() or by combining a session ID/UUID
with a .csv extension) and pass that path into
MSstatsBig::bigSpectronauttoMSstatsFormat (where local_big_file_path() is the
input), then wrap the call and downstream use of converted_data in a
tryCatch/on.exit block to unlink() the temp file regardless of success or error
so concurrent users don't collide and files are cleaned up.
🧹 Nitpick comments (3)
R/module-loadpage-ui.R (1)

288-337: Consider extracting parameter definitions to reduce duplication.

The filtering and processing options (lines 313-323) are hardcoded in the UI function. If these parameters need to be referenced elsewhere (e.g., for validation or defaults), consider extracting them into a shared configuration object or constant.

♻️ Optional refactor to centralize parameter definitions

Create a configuration list that can be reused:

# At the top of the file or in a separate config file
SPECTRONAUT_BIG_FILE_DEFAULTS <- list(
  filter_by_excluded = FALSE,
  filter_by_identified = FALSE,
  filter_by_qvalue = TRUE,
  qvalue_cutoff = 0.01,
  max_feature_count = 20,
  filter_unique_peptides = FALSE,
  aggregate_psms = FALSE,
  filter_few_obs = FALSE
)

This would make it easier to maintain consistency between UI defaults and server-side processing.

R/module-loadpage-server.R (2)

30-30: Document the is_web_server parameter behavior.

While the parameter has a sensible default, the function documentation should clarify what constitutes a "web server" context and how it affects the module's behavior (e.g., disabling local file browsing).


313-323: Improve error message clarity for users.

The error message on line 318 exposes the raw error message from R, which may be technical and confusing for end users. Consider providing more actionable guidance.

💬 Provide user-friendly error messages
         converted_data <- tryCatch({
           dplyr::collect(converted_data)
         }, error = function(e) {
           shinybusy::remove_modal_spinner()
+          # Determine if this is a memory error or something else
+          is_memory_error <- grepl("allocate|memory|cannot allocate vector", e$message, ignore.case = TRUE)
+          
+          if (is_memory_error) {
+            showNotification(
+              "Memory Error: The dataset is too large to load into memory after conversion. Consider:\n1. Running on a machine with more RAM\n2. Further reducing the dataset using the filtering options\n3. Processing the data in chunks using MSstatsBig directly",
+              type = "error",
+              duration = NULL
+            )
+          } else {
             showNotification(
-              paste("Memory Error: The dataset is too large to process in-memory.", e$message),
+              paste("Error processing data:", e$message),
               type = "error",
               duration = NULL
             )
+          }
           return(NULL)
         })
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e07c645 and 983265e.

📒 Files selected for processing (2)
  • R/module-loadpage-server.R
  • R/module-loadpage-ui.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (5)
R/module-loadpage-ui.R (2)

535-535: Gating logic correctly hides incompatible options.

The processing options are appropriately hidden when big_file_spec is active, preventing users from selecting options that don't apply to the MSstatsBig workflow. This maintains UI consistency with the dual-path approach.

Also applies to: 542-542


226-226: Verify that MSstats scheme output from Spectronaut contains all necessary metadata.

The annotation file upload is hidden when big_file_spec is checked, and the bigSpectronauttoMSstatsFormat() function does not accept an annotation parameter. Ensure the Spectronaut MSstats scheme output format includes run-to-sample mapping (conditions, bioreplicates) embedded in the file, as this information would normally come from a separate annotation file in standard workflows.

R/module-loadpage-server.R (3)

1-13: LGTM - Clear routing logic.

The .get_data_source_type helper function provides clean separation between standard and big-file processing paths with appropriate defensive checking via isTRUE().


159-161: Enable button logic correctly handles both file input modes.

The proceed button is properly enabled for both standard file upload (spec_regular_file_ok) and local file browsing (spec_big_file_ok) modes, ensuring users can only proceed when a valid file is selected.


297-309: MSstatsBig API compatibility verified.

The function MSstatsBig::bigSpectronauttoMSstatsFormat exists with all parameters used in the code (input_file, output_file_name, backend, filter_by_excluded, filter_by_identified, filter_by_qvalue, qvalue_cutoff, max_feature_count, filter_unique_peptides, aggregate_psms, filter_few_obs) confirmed as valid parameter names. The backend = "arrow" option is supported. MSstatsBig is properly declared as a dependency in DESCRIPTION.

Comment thread R/module-loadpage-server.R
Comment thread R/module-loadpage-server.R Outdated
Comment thread R/module-loadpage-server.R Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@R/utils.R`:
- Around line 530-545: Wrap the MSstatsBig::bigSpectronauttoMSstatsFormat call
in tryCatch so any error is handled: in the error handler call
shinybusy::remove_modal_spinner(), show a user-facing notification (e.g.,
shiny::showNotification with type="error") that includes the error message, and
return early (e.g., return(NULL)) so downstream code using converted_data is not
executed; locate the call to bigSpectronauttoMSstatsFormat and replace it with
the tryCatch-wrapped version that references local_big_file_path and assigns to
converted_data on success.
♻️ Duplicate comments (1)
R/utils.R (1)

533-536: Avoid hardcoded output_file_name; use a unique temp path + cleanup.

Line 535 writes to a fixed output_file.csv, which can be overwritten across concurrent sessions and left behind on disk. Use a unique temp path and unlink after processing.

🔧 Suggested fix
+        output_path <- tempfile("spectronaut_", fileext = ".csv")
+        on.exit({
+          if (file.exists(output_path)) unlink(output_path)
+        }, add = TRUE)
-        converted_data <- MSstatsBig::bigSpectronauttoMSstatsFormat(
+        converted_data <- MSstatsBig::bigSpectronauttoMSstatsFormat(
           input_file = local_big_file_path,
-          output_file_name = "output_file.csv",
+          output_file_name = output_path,
           backend = "arrow",
           filter_by_excluded = input$filter_by_excluded,
           filter_by_identified = input$filter_by_identified,
           filter_by_qvalue = input$filter_by_qvalue,
           qvalue_cutoff = input$qvalue_cutoff,
           max_feature_count = input$max_feature_count,
           filter_unique_peptides = input$filter_unique_peptides,
           aggregate_psms = input$aggregate_psms,
           filter_few_obs = input$filter_few_obs
         )
🧹 Nitpick comments (1)
R/module-loadpage-server.R (1)

19-48: Confirm is_web_server is set correctly to avoid exposing server files.

Lines 21‑48 enable shinyFiles browsing when is_web_server is FALSE. If a hosted deployment passes FALSE, users could browse the server filesystem. Please verify the deployment config always sets is_web_server = TRUE for hosted instances.

Comment thread R/utils.R
@tonywu1999 tonywu1999 merged commit e8951f9 into devel Jan 28, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the Handle-big-spec-file branch January 28, 2026 20:53
@coderabbitai coderabbitai Bot mentioned this pull request Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants