refactor(statsmodel): Refactor statsmodel server into sub-functions by tonywu1999 · Pull Request #119 · Vitek-Lab/MSstatsShiny

tonywu1999 · 2025-11-02T20:07:22Z

Summary by CodeRabbit

Refactor
- Server and UI reorganized into modular components for contrast construction, plotting, downloads, and results rendering for clearer wiring and maintainability.
New Features / UI
- Control renamed to "Contrast mode" (options preserved); contrast input panels and plotting UI modularized.
- Added unified results table, downloadable analysis/code outputs, and modal spinners during plotting.
Quality
- Centralized validation, error handling, and state management to prevent UI crashes and standardize outputs.
Tests
- Expanded tests for condition extraction, contrast builders, result filtering, and UI behavior.

coderabbitai · 2025-11-02T20:07:31Z

Walkthrough

Refactors statmodel server by extracting contrast-matrix builders, plotting and download helpers, centralizing reactives/validation, and renames the UI contrast input from def_comp to contrast_mode; updates tests to use condition_list() and the new input IDs.

Changes

Cohort / File(s)	Change Summary
Server refactor & helpers `R/module-statmodel-server.R`	Large refactor: added contrast utilities (`get_experimental_conditions`, `validate_contrast_inputs`, `build_custom_pairwise_contrast`, `build_custom_non_pairwise_contrast`, `build_all_against_one_contrast`, `build_all_pair_contrast`), plotting/UI helpers (`render_*`, `render_group_comparison_plot_inputs`, `create_group_comparison_plot`), analysis/export wiring (`generate_analysis_code`, `create_download_handlers`, `render_results_table`); introduced reactives (`condition_list`, `contrast`, `comp_list`, `matrix_build`); reorganized `statmodelServer` internals and replaced inline UI/matrix logic with modular helpers.
UI — contrast input rename `R/statmodel-ui-options-contrasts.R`	Replaced input id `def_comp` with `contrast_mode` across `radioButtons` and all `conditionalPanel` conditions; preserved option values (`all_pair`, `all_one`, `custom`, `custom_np`).
Unit tests — server & utils `tests/testthat/test-module-statmodel-server.R`, `tests/testthat/test-utils-statmodel-server.R`	Tests updated to assert `condition_list()` instead of `choices()`; replaced `def_comp` → `contrast_mode` in inputs; added extensive tests for contrast builders and significant-protein extraction; removed a `round_df` test block.
Unit tests — UI `tests/testthat/test-module-statmodel-ui.R`	Updated UI tests to expect `contrast_mode` IDs and adjusted conditional-panel and namespace expectations accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant User as User / UI
    participant Server as statmodelServer
    participant Validator as validate_contrast_inputs
    participant Builder as matrix_build → build_*
    participant Plot as create_group_comparison_plot
    participant Results as render_results_table / create_download_handlers

    Note over User,Server: User sets conditions and chooses contrast_mode
    User->>Server: set inputs (conditions, contrast_mode)
    Server->>Validator: validate_contrast_inputs(input, contrast_mode, condition_list)
    alt valid
        Validator->>Builder: matrix_build delegates to appropriate build_* (custom, custom_np, all_one, all_pair)
        Builder->>Plot: provide contrast matrix + selected row/data
        Plot->>Results: compute results, render table, enable downloads
        Results->>User: display table, code, plots, and download links
    else invalid
        Validator->>Server: return validation error
        Server->>User: show validation messages / reset matrix state
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas needing extra attention:
- Correctness and parity of build_* contrast builders versus previous behavior
- Reactive lifecycle and invalidation for condition_list, matrix_build, comp_list
- Validation paths in validate_contrast_inputs and how errors reset/affect state
- Download/code-generation handlers (generate_analysis_code, create_download_handlers) for reproducibility and edge cases

Possibly related PRs

tests(stat-model): Add unit tests to stat model server #118 — overlaps R/module-statmodel-server.R changes and contrast/matrix logic.
refactor(statsmodel): Refactor statsmodel UI components to be more readable #117 — related UI refactor that renamed def_comp and reorganized contrast panels.

Suggested labels

Review effort 3/5

Poem

🐰 I hopped through code with tidy paws,

Split contrasts, tests, and UI laws.
Conditions lined in neat array,
Builders hum and plots display,
Downloads ready — carrots, hooray! 🎋

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main change: refactoring the statsmodel server into sub-functions, which aligns with the comprehensive modularization introducing 14+ new public helper functions and reorganizing server logic.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch refactor-statmodel-server

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-11-02T20:08:04Z

Failed to generate code suggestions for PR

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9ec13d and e14d4a7.

📒 Files selected for processing (1)

R/module-statmodel-server.R (2 hunks)

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33bc5e1 and c3b515f.

📒 Files selected for processing (1)

R/module-statmodel-server.R (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

R/module-statmodel-server.R (2)

154-177: Reintroduce the rowname reset for all-pair contrasts.

Hitting Submit twice in “All pair” mode still blows up with length of 'dimnames' [1] not equal to array extent because comp_list$dList is never cleared before rebuilding. Please wipe it (and rebuild from a local label vector) alongside the matrix so reruns stay aligned.

 build_all_pair_contrast = function(input, condition_list, contrast, comp_list, row, loadpage_input) {
   contrast$matrix = NULL
-  
+  comp_list$dList = NULL
+  labels = character(0)
   for (index in 1:length(condition_list)) {
     for (index1 in 1:length(condition_list)) {
       if (index == index1) next
       if (index < index1) {
-        comp_list$dList = c(isolate(comp_list$dList),
-                            paste(condition_list[index], "vs", condition_list[index1], sep = " "))
+        labels = c(labels,
+                   paste(condition_list[index], "vs", condition_list[index1], sep = " "))
@@
-        
-        rownames(contrast$matrix) = comp_list$dList
-        colnames(contrast$matrix) = condition_list
       }
     }
   }
-  
+  comp_list$dList = labels
+  rownames(contrast$matrix) = comp_list$dList
+  colnames(contrast$matrix) = condition_list
   return(contrast$matrix)
 }

268-281: Flatten PTM significant-protein lists before returning.

extract_significant_proteins() still hands PTM callers a list of three data frames, but renderDataTable({ SignificantProteins() }) and the CSV download both expect a single rectangular object. That reproduces the DT/render/download crashes we flagged earlier. Convert the PTM list into one data frame (e.g., dplyr::bind_rows(sig_list, .id = "Model")) before returning so the downstream renderers keep working for PTM users.

   if (loadpage_input$BIO == "PTM") {
-    sig_unadj = data_comp$PTM.Model[data_comp$PTM.Model$adj.pvalue < signif_threshold,]
-    sig_prot = data_comp$PROTEIN.Model[data_comp$PROTEIN.Model$adj.pvalue < signif_threshold,]
-    sig_adj = data_comp$ADJUSTED.Model[data_comp$ADJUSTED.Model$adj.pvalue < signif_threshold,]
-    
-    list(PTM.Model = sig_unadj,
-         PROTEIN.Model = sig_prot,
-         ADJUSTED.Model = sig_adj)
+    sig_list = list(
+      PTM = data_comp$PTM.Model[data_comp$PTM.Model$adj.pvalue < signif_threshold,],
+      PROTEIN = data_comp$PROTEIN.Model[data_comp$PROTEIN.Model$adj.pvalue < signif_threshold,],
+      ADJUSTED = data_comp$ADJUSTED.Model[data_comp$ADJUSTED.Model$adj.pvalue < signif_threshold,]
+    )
+    sig_list = Filter(nrow, sig_list)
+    if (!length(sig_list)) return(NULL)
+    dplyr::bind_rows(sig_list, .id = "Model")
   } else if (loadpage_input$DDA_DIA == "TMT") {

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3b515f and 96d5406.

📒 Files selected for processing (1)

R/module-statmodel-server.R (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (3)

R/module-statmodel-server.R (3)
130-154: Reset comp_list$dList to prevent label accumulation.

Submitting "All against one" multiple times without clearing still accumulates labels in comp_list$dList (line 136-137), causing a rownames length mismatch when assigning at line 150. The function must reset comp_list$dList before the loop.

Apply this diff:
 build_all_against_one_contrast = function(input, condition_list, contrast, comp_list, row, loadpage_input) {
+  contrast$matrix = NULL
+  comp_list$dList = NULL
+  
   index3 = which(condition_list == input$group3)
   
   for (index in 1:length(condition_list)) {
     if (index == index3) next
     
-    comp_list$dList = c(isolate(comp_list$dList),
+    comp_list$dList = c(comp_list$dList,
                         paste(condition_list[index], "vs", input$group3, sep = " "))
156-184: Reset comp_list$dList alongside contrast$matrix.

The function resets contrast$matrix (line 157) but leaves comp_list$dList intact. Repeated "All pair" submissions append new labels to the existing vector (line 163-164), so the rownames assignment at line 177 fails with a length mismatch.

Apply this diff:
 build_all_pair_contrast = function(input, condition_list, contrast, comp_list, row, loadpage_input) {
   contrast$matrix = NULL
+  comp_list$dList = NULL
   
   for (index in 1:length(condition_list)) {
275-290: Handle PTM list-of-dataframes structure before rendering/downloading.

extract_significant_proteins() returns a list of three dataframes for PTM runs (lines 282-284) but a single dataframe otherwise. The download handler (line 373) and table renderer (lines 399-400) assume a single rectangular dataframe, so PTM runs will fail with coercion errors.

Branch on loadpage_input()$BIO == "PTM" or is.list(SignificantProteins()) and either:

Flatten the list into one dataframe (bind rows, add a source column), or

Expose a selector to pick which list element to show/download

Update output$significant, output$number, and output$download_signif to handle the chosen/flattened dataframe.

Also applies to: 373-373, 399-400

🧹 Nitpick comments (3)

R/module-statmodel-server.R (3)

5-18: Simplify redundant condition checks.

The nested conditions redundantly check loadpage_input$BIO == "PTM" multiple times within branches that already verify this condition.

Apply this diff to remove the redundant checks:

 get_experimental_conditions = function(loadpage_input, preprocess_data) {
-  if (loadpage_input$BIO == "PTM" & 
-      ((loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA == "TMT") | 
-       loadpage_input$filetype == 'phil')) {
+  if (loadpage_input$BIO == "PTM" & 
+      (loadpage_input$DDA_DIA == "TMT" | loadpage_input$filetype == 'phil')) {
     levels(preprocess_data$PTM$ProteinLevelData$Condition)
-  } else if (loadpage_input$BIO == "PTM" & 
-             (loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA != "TMT")) {
+  } else if (loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA != "TMT") {
     levels(preprocess_data$PTM$ProteinLevelData$GROUP)
   } else if (loadpage_input$DDA_DIA == "TMT") {
     levels(preprocess_data$ProteinLevelData$Condition)
   } else {
     levels(preprocess_data$ProteinLevelData$GROUP)
   }
 }

92-93: Remove unnecessary rbind wrapper.

Line 93 wraps the deduplicated matrix in an unnecessary outer rbind() call.

Apply this diff:

   } else {
     contrast$matrix = rbind(contrast$matrix, contrast$row)
-    contrast$matrix = rbind(contrast$matrix[!duplicated(contrast$matrix),])
+    contrast$matrix = contrast$matrix[!duplicated(contrast$matrix), , drop = FALSE]
   }

121-121: Remove unnecessary rbind wrapper.

Same as the previous function, line 121 wraps the deduplicated matrix unnecessarily.

Apply this diff:

   } else {
     contrast$matrix = rbind(contrast$matrix, contrast$row)
-    contrast$matrix = rbind(contrast$matrix[!duplicated(contrast$matrix),])
+    contrast$matrix = contrast$matrix[!duplicated(contrast$matrix), , drop = FALSE]
   }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ac2a44 and 6072bb9.

📒 Files selected for processing (1)

R/module-statmodel-server.R (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

R/module-statmodel-server.R (1)

130-154: Reset contrast state before rebuilding all-against-one contrasts.

This function still accumulates labels across multiple submissions. When a user clicks submit twice in "all_one" mode without clearing, comp_list$dList retains the previous labels and appends new ones, causing a row name length mismatch with the contrast matrix.

Apply this diff to reset state and build labels locally:

 build_all_against_one_contrast = function(input, condition_list, contrast, comp_list, row, loadpage_input) {
+  contrast$matrix = NULL
+  comp_list$dList = NULL
+  labels = character(0)
+  
   index3 = which(condition_list == input$group3)
   
   for (index in 1:length(condition_list)) {
     if (index == index3) next
     
-    comp_list$dList = c(isolate(comp_list$dList),
-                        paste(condition_list[index], "vs", input$group3, sep = " "))
+    labels = c(labels,
+               paste(condition_list[index], "vs", input$group3, sep = " "))
     
     contrast$row = matrix(row, nrow = 1)
     contrast$row[index] = 1
     contrast$row[index3] = -1
     
     if (is.null(contrast$matrix)) {
       contrast$matrix = contrast$row
     } else {
       contrast$matrix = rbind(contrast$matrix, contrast$row)
     }
   }
   
+  comp_list$dList = labels
   rownames(contrast$matrix) = comp_list$dList
   colnames(contrast$matrix) = condition_list
   
   return(contrast$matrix)
 }

🧹 Nitpick comments (3)

R/module-statmodel-server.R (3)

5-18: Simplify redundant condition checks.

The function contains redundant checks for loadpage_input$BIO == "PTM". When the outer condition already verifies BIO == "PTM", the inner conditions don't need to repeat this check.

Apply this diff to simplify the logic:

 get_experimental_conditions = function(loadpage_input, preprocess_data) {
-  if (loadpage_input$BIO == "PTM" & 
-      ((loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA == "TMT") | 
-       loadpage_input$filetype == 'phil')) {
+  if (loadpage_input$BIO == "PTM" & 
+      (loadpage_input$DDA_DIA == "TMT" | loadpage_input$filetype == 'phil')) {
     levels(preprocess_data$PTM$ProteinLevelData$Condition)
-  } else if (loadpage_input$BIO == "PTM" & 
-             (loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA != "TMT")) {
+  } else if (loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA != "TMT") {
     levels(preprocess_data$PTM$ProteinLevelData$GROUP)
   } else if (loadpage_input$DDA_DIA == "TMT") {
     levels(preprocess_data$ProteinLevelData$Condition)
   } else {
     levels(preprocess_data$ProteinLevelData$GROUP)
   }
 }

156-185: Consider moving rownames assignment outside the nested loops.

While the logic is correct, setting rownames() and colnames() inside the inner loop (line 178-179) is inefficient and gets repeated for every iteration. These assignments only need to happen once after the matrix is fully built.

Apply this diff to improve efficiency:

 build_all_pair_contrast = function(input, condition_list, contrast, comp_list, row, loadpage_input) {
   contrast$matrix = NULL
   comp_list$dList = NULL
   
   for (index in 1:length(condition_list)) {
     for (index1 in 1:length(condition_list)) {
       if (index == index1) next
       if (index < index1) {
         comp_list$dList = c(isolate(comp_list$dList),
                             paste(condition_list[index], "vs", condition_list[index1], sep = " "))
         
         contrast$row = matrix(row, nrow = 1)
         contrast$row[index] = 1
         contrast$row[index1] = -1
         
         if (is.null(contrast$matrix)) {
           contrast$matrix = contrast$row
         } else {
           contrast$matrix = rbind(contrast$matrix, contrast$row)
           contrast$matrix = rbind(contrast$matrix[!duplicated(contrast$matrix),])
         }
-        
-        rownames(contrast$matrix) = comp_list$dList
-        colnames(contrast$matrix) = condition_list
       }
     }
   }
+  
+  rownames(contrast$matrix) = comp_list$dList
+  colnames(contrast$matrix) = condition_list
   
   return(contrast$matrix)
 }

210-270: Consider surfacing plot errors to the user.

The error handler at line 266-269 logs the error message but doesn't provide user feedback beyond removing the spinner. Users may not understand why the plot failed to render without visible feedback.

Consider returning an error plot or showing a notification:

   }, error = function(e) {
     remove_modal_spinner()
-    message("An error occurred: ", conditionMessage(e))
+    showNotification(
+      paste("Plot generation failed:", conditionMessage(e)),
+      type = "error",
+      duration = 10
+    )
+    return(NULL)
   })
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6072bb9 and 4c058c6.

📒 Files selected for processing (1)

R/module-statmodel-server.R (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

🔇 Additional comments (1)

R/module-statmodel-server.R (1)

422-423: LGTM: Clean refactoring with improved modularity.

The refactor successfully extracts complex logic into focused, testable functions. The public API (get_experimental_conditions, build_*_contrast, create_group_comparison_plot, etc.) provides clear separation of concerns.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

tests/testthat/test-utils-statmodel-server.R (2)
287-300: Consider verifying specific proteins in TMT test.

While the test correctly checks that all filtered proteins have adj.pvalue < 0.05, it could be strengthened by also verifying which specific proteins are included (P1 and P3), similar to the approach used in the standard data test (lines 302-315). This would catch potential bugs in the filtering logic.

Apply this diff to add protein verification:
   result <- extract_significant_proteins(data_comp, loadpage_input, 0.05)
   
   expect_equal(nrow(result), 2)
+  expect_equal(result$protein, c("P1", "P3"))
   expect_true(all(result$adj.pvalue < 0.05))
266-329: Consider adding boundary condition test.

The tests do not cover the boundary case where adj.pvalue equals exactly 0.05. This would clarify whether the filter uses < or <= for the threshold comparison.

Add a test case like:
test_that("extract_significant_proteins handles boundary value correctly", {
  data_comp <- list(
    ComparisonResult = data.frame(
      protein = c("P1", "P2", "P3"),
      adj.pvalue = c(0.049, 0.05, 0.051)
    )
  )
  loadpage_input <- list(BIO = "Protein", DDA_DIA = "DDA")
  
  result <- extract_significant_proteins(data_comp, loadpage_input, 0.05)
  
  # Verify whether 0.05 is included or excluded
  expect_equal(nrow(result), 1)  # or 2, depending on expected behavior
  expect_equal(result$protein[1], "P1")
})

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c058c6 and f4da8c6.

📒 Files selected for processing (1)

tests/testthat/test-utils-statmodel-server.R (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

🔇 Additional comments (6)

tests/testthat/test-utils-statmodel-server.R (6)

5-55: LGTM! Comprehensive coverage of experimental condition extraction.

The tests effectively cover all four data type combinations (PTM/non-PTM × TMT/non-TMT) and verify that the correct column is extracted based on the input parameters.

61-114: LGTM! Well-structured tests for pairwise contrast building.

The tests cover the core functionality including single contrasts, appending multiple comparisons, and the edge case where both groups are identical. The assertions verify both matrix structure and specific contrast values.

120-162: LGTM! Tests correctly validate the zero-sum constraint.

The tests verify that valid contrasts (weights summing to zero) are accepted and that invalid contrasts return the existing matrix unchanged, which is the expected behavior for validation failure.

168-204: LGTM! Tests properly verify all-against-one contrast generation.

The tests correctly validate that n-1 comparisons are generated for n conditions, with proper contrast values and row names.

210-260: LGTM! Tests comprehensively cover all-pair contrast generation.

The tests verify the combinatorial logic (n choose 2) for different group counts, check contrast values, and validate comparison names. The helpful comment on line 258 clarifies the expected behavior.

266-285: ****

The original review comment misidentified which data frame is used. When BIO == "PTM", the function returns data_comp$ADJUSTED.Model filtered by the threshold, not PTM.Model. The test data correctly provides ADJUSTED.Model with P1 (adj.pvalue = 0.04) and P2 (adj.pvalue = 0.3). With a threshold of 0.05, only P1 passes the filter, resulting in 1 row as expected. The test is correct.

Likely an incorrect or invalid review comment.

…119)

refactor(statsmodel): Refactor statsmodel server into sub-functions

e14d4a7