refactor(statsmodel): Refactor statsmodel server into sub-functions#119
refactor(statsmodel): Refactor statsmodel server into sub-functions#119tonywu1999 merged 13 commits intodevelfrom
Conversation
WalkthroughRefactors statmodel server by extracting contrast-matrix builders, plotting and download helpers, centralizing reactives/validation, and renames the UI contrast input from Changes
Sequence Diagram(s)sequenceDiagram
participant User as User / UI
participant Server as statmodelServer
participant Validator as validate_contrast_inputs
participant Builder as matrix_build → build_*
participant Plot as create_group_comparison_plot
participant Results as render_results_table / create_download_handlers
Note over User,Server: User sets conditions and chooses contrast_mode
User->>Server: set inputs (conditions, contrast_mode)
Server->>Validator: validate_contrast_inputs(input, contrast_mode, condition_list)
alt valid
Validator->>Builder: matrix_build delegates to appropriate build_* (custom, custom_np, all_one, all_pair)
Builder->>Plot: provide contrast matrix + selected row/data
Plot->>Results: compute results, render table, enable downloads
Results->>User: display table, code, plots, and download links
else invalid
Validator->>Server: return validation error
Server->>User: show validation messages / reset matrix state
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Poem
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Failed to generate code suggestions for PR |
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
R/module-statmodel-server.R(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
R/module-statmodel-server.R (2)
154-177: Reintroduce the rowname reset for all-pair contrasts.Hitting Submit twice in “All pair” mode still blows up with
length of 'dimnames' [1] not equal to array extentbecausecomp_list$dListis never cleared before rebuilding. Please wipe it (and rebuild from a local label vector) alongside the matrix so reruns stay aligned.build_all_pair_contrast = function(input, condition_list, contrast, comp_list, row, loadpage_input) { contrast$matrix = NULL - + comp_list$dList = NULL + labels = character(0) for (index in 1:length(condition_list)) { for (index1 in 1:length(condition_list)) { if (index == index1) next if (index < index1) { - comp_list$dList = c(isolate(comp_list$dList), - paste(condition_list[index], "vs", condition_list[index1], sep = " ")) + labels = c(labels, + paste(condition_list[index], "vs", condition_list[index1], sep = " ")) @@ - - rownames(contrast$matrix) = comp_list$dList - colnames(contrast$matrix) = condition_list } } } - + comp_list$dList = labels + rownames(contrast$matrix) = comp_list$dList + colnames(contrast$matrix) = condition_list return(contrast$matrix) }
268-281: Flatten PTM significant-protein lists before returning.
extract_significant_proteins()still hands PTM callers a list of three data frames, butrenderDataTable({ SignificantProteins() })and the CSV download both expect a single rectangular object. That reproduces the DT/render/download crashes we flagged earlier. Convert the PTM list into one data frame (e.g.,dplyr::bind_rows(sig_list, .id = "Model")) before returning so the downstream renderers keep working for PTM users.if (loadpage_input$BIO == "PTM") { - sig_unadj = data_comp$PTM.Model[data_comp$PTM.Model$adj.pvalue < signif_threshold,] - sig_prot = data_comp$PROTEIN.Model[data_comp$PROTEIN.Model$adj.pvalue < signif_threshold,] - sig_adj = data_comp$ADJUSTED.Model[data_comp$ADJUSTED.Model$adj.pvalue < signif_threshold,] - - list(PTM.Model = sig_unadj, - PROTEIN.Model = sig_prot, - ADJUSTED.Model = sig_adj) + sig_list = list( + PTM = data_comp$PTM.Model[data_comp$PTM.Model$adj.pvalue < signif_threshold,], + PROTEIN = data_comp$PROTEIN.Model[data_comp$PROTEIN.Model$adj.pvalue < signif_threshold,], + ADJUSTED = data_comp$ADJUSTED.Model[data_comp$ADJUSTED.Model$adj.pvalue < signif_threshold,] + ) + sig_list = Filter(nrow, sig_list) + if (!length(sig_list)) return(NULL) + dplyr::bind_rows(sig_list, .id = "Model") } else if (loadpage_input$DDA_DIA == "TMT") {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
R/module-statmodel-server.R(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (3)
R/module-statmodel-server.R (3)
130-154: Resetcomp_list$dListto prevent label accumulation.Submitting "All against one" multiple times without clearing still accumulates labels in
comp_list$dList(line 136-137), causing a rownames length mismatch when assigning at line 150. The function must resetcomp_list$dListbefore the loop.Apply this diff:
build_all_against_one_contrast = function(input, condition_list, contrast, comp_list, row, loadpage_input) { + contrast$matrix = NULL + comp_list$dList = NULL + index3 = which(condition_list == input$group3) for (index in 1:length(condition_list)) { if (index == index3) next - comp_list$dList = c(isolate(comp_list$dList), + comp_list$dList = c(comp_list$dList, paste(condition_list[index], "vs", input$group3, sep = " "))
156-184: Resetcomp_list$dListalongsidecontrast$matrix.The function resets
contrast$matrix(line 157) but leavescomp_list$dListintact. Repeated "All pair" submissions append new labels to the existing vector (line 163-164), so the rownames assignment at line 177 fails with a length mismatch.Apply this diff:
build_all_pair_contrast = function(input, condition_list, contrast, comp_list, row, loadpage_input) { contrast$matrix = NULL + comp_list$dList = NULL for (index in 1:length(condition_list)) {
275-290: Handle PTM list-of-dataframes structure before rendering/downloading.
extract_significant_proteins()returns a list of three dataframes for PTM runs (lines 282-284) but a single dataframe otherwise. The download handler (line 373) and table renderer (lines 399-400) assume a single rectangular dataframe, so PTM runs will fail with coercion errors.Branch on
loadpage_input()$BIO == "PTM"oris.list(SignificantProteins())and either:
- Flatten the list into one dataframe (bind rows, add a source column), or
- Expose a selector to pick which list element to show/download
Update
output$significant,output$number, andoutput$download_signifto handle the chosen/flattened dataframe.Also applies to: 373-373, 399-400
🧹 Nitpick comments (3)
R/module-statmodel-server.R (3)
5-18: Simplify redundant condition checks.The nested conditions redundantly check
loadpage_input$BIO == "PTM"multiple times within branches that already verify this condition.Apply this diff to remove the redundant checks:
get_experimental_conditions = function(loadpage_input, preprocess_data) { - if (loadpage_input$BIO == "PTM" & - ((loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA == "TMT") | - loadpage_input$filetype == 'phil')) { + if (loadpage_input$BIO == "PTM" & + (loadpage_input$DDA_DIA == "TMT" | loadpage_input$filetype == 'phil')) { levels(preprocess_data$PTM$ProteinLevelData$Condition) - } else if (loadpage_input$BIO == "PTM" & - (loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA != "TMT")) { + } else if (loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA != "TMT") { levels(preprocess_data$PTM$ProteinLevelData$GROUP) } else if (loadpage_input$DDA_DIA == "TMT") { levels(preprocess_data$ProteinLevelData$Condition) } else { levels(preprocess_data$ProteinLevelData$GROUP) } }
92-93: Remove unnecessary rbind wrapper.Line 93 wraps the deduplicated matrix in an unnecessary outer
rbind()call.Apply this diff:
} else { contrast$matrix = rbind(contrast$matrix, contrast$row) - contrast$matrix = rbind(contrast$matrix[!duplicated(contrast$matrix),]) + contrast$matrix = contrast$matrix[!duplicated(contrast$matrix), , drop = FALSE] }
121-121: Remove unnecessary rbind wrapper.Same as the previous function, line 121 wraps the deduplicated matrix unnecessarily.
Apply this diff:
} else { contrast$matrix = rbind(contrast$matrix, contrast$row) - contrast$matrix = rbind(contrast$matrix[!duplicated(contrast$matrix),]) + contrast$matrix = contrast$matrix[!duplicated(contrast$matrix), , drop = FALSE] }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
R/module-statmodel-server.R(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
R/module-statmodel-server.R (1)
130-154: Reset contrast state before rebuilding all-against-one contrasts.This function still accumulates labels across multiple submissions. When a user clicks submit twice in "all_one" mode without clearing,
comp_list$dListretains the previous labels and appends new ones, causing a row name length mismatch with the contrast matrix.Apply this diff to reset state and build labels locally:
build_all_against_one_contrast = function(input, condition_list, contrast, comp_list, row, loadpage_input) { + contrast$matrix = NULL + comp_list$dList = NULL + labels = character(0) + index3 = which(condition_list == input$group3) for (index in 1:length(condition_list)) { if (index == index3) next - comp_list$dList = c(isolate(comp_list$dList), - paste(condition_list[index], "vs", input$group3, sep = " ")) + labels = c(labels, + paste(condition_list[index], "vs", input$group3, sep = " ")) contrast$row = matrix(row, nrow = 1) contrast$row[index] = 1 contrast$row[index3] = -1 if (is.null(contrast$matrix)) { contrast$matrix = contrast$row } else { contrast$matrix = rbind(contrast$matrix, contrast$row) } } + comp_list$dList = labels rownames(contrast$matrix) = comp_list$dList colnames(contrast$matrix) = condition_list return(contrast$matrix) }
🧹 Nitpick comments (3)
R/module-statmodel-server.R (3)
5-18: Simplify redundant condition checks.The function contains redundant checks for
loadpage_input$BIO == "PTM". When the outer condition already verifiesBIO == "PTM", the inner conditions don't need to repeat this check.Apply this diff to simplify the logic:
get_experimental_conditions = function(loadpage_input, preprocess_data) { - if (loadpage_input$BIO == "PTM" & - ((loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA == "TMT") | - loadpage_input$filetype == 'phil')) { + if (loadpage_input$BIO == "PTM" & + (loadpage_input$DDA_DIA == "TMT" | loadpage_input$filetype == 'phil')) { levels(preprocess_data$PTM$ProteinLevelData$Condition) - } else if (loadpage_input$BIO == "PTM" & - (loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA != "TMT")) { + } else if (loadpage_input$BIO == "PTM" & loadpage_input$DDA_DIA != "TMT") { levels(preprocess_data$PTM$ProteinLevelData$GROUP) } else if (loadpage_input$DDA_DIA == "TMT") { levels(preprocess_data$ProteinLevelData$Condition) } else { levels(preprocess_data$ProteinLevelData$GROUP) } }
156-185: Consider moving rownames assignment outside the nested loops.While the logic is correct, setting
rownames()andcolnames()inside the inner loop (line 178-179) is inefficient and gets repeated for every iteration. These assignments only need to happen once after the matrix is fully built.Apply this diff to improve efficiency:
build_all_pair_contrast = function(input, condition_list, contrast, comp_list, row, loadpage_input) { contrast$matrix = NULL comp_list$dList = NULL for (index in 1:length(condition_list)) { for (index1 in 1:length(condition_list)) { if (index == index1) next if (index < index1) { comp_list$dList = c(isolate(comp_list$dList), paste(condition_list[index], "vs", condition_list[index1], sep = " ")) contrast$row = matrix(row, nrow = 1) contrast$row[index] = 1 contrast$row[index1] = -1 if (is.null(contrast$matrix)) { contrast$matrix = contrast$row } else { contrast$matrix = rbind(contrast$matrix, contrast$row) contrast$matrix = rbind(contrast$matrix[!duplicated(contrast$matrix),]) } - - rownames(contrast$matrix) = comp_list$dList - colnames(contrast$matrix) = condition_list } } } + + rownames(contrast$matrix) = comp_list$dList + colnames(contrast$matrix) = condition_list return(contrast$matrix) }
210-270: Consider surfacing plot errors to the user.The error handler at line 266-269 logs the error message but doesn't provide user feedback beyond removing the spinner. Users may not understand why the plot failed to render without visible feedback.
Consider returning an error plot or showing a notification:
}, error = function(e) { remove_modal_spinner() - message("An error occurred: ", conditionMessage(e)) + showNotification( + paste("Plot generation failed:", conditionMessage(e)), + type = "error", + duration = 10 + ) + return(NULL) }) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
R/module-statmodel-server.R(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (1)
R/module-statmodel-server.R (1)
422-423: LGTM: Clean refactoring with improved modularity.The refactor successfully extracts complex logic into focused, testable functions. The public API (
get_experimental_conditions,build_*_contrast,create_group_comparison_plot, etc.) provides clear separation of concerns.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/testthat/test-utils-statmodel-server.R (2)
287-300: Consider verifying specific proteins in TMT test.While the test correctly checks that all filtered proteins have adj.pvalue < 0.05, it could be strengthened by also verifying which specific proteins are included (P1 and P3), similar to the approach used in the standard data test (lines 302-315). This would catch potential bugs in the filtering logic.
Apply this diff to add protein verification:
result <- extract_significant_proteins(data_comp, loadpage_input, 0.05) expect_equal(nrow(result), 2) + expect_equal(result$protein, c("P1", "P3")) expect_true(all(result$adj.pvalue < 0.05))
266-329: Consider adding boundary condition test.The tests do not cover the boundary case where adj.pvalue equals exactly 0.05. This would clarify whether the filter uses
<or<=for the threshold comparison.Add a test case like:
test_that("extract_significant_proteins handles boundary value correctly", { data_comp <- list( ComparisonResult = data.frame( protein = c("P1", "P2", "P3"), adj.pvalue = c(0.049, 0.05, 0.051) ) ) loadpage_input <- list(BIO = "Protein", DDA_DIA = "DDA") result <- extract_significant_proteins(data_comp, loadpage_input, 0.05) # Verify whether 0.05 is included or excluded expect_equal(nrow(result), 1) # or 2, depending on expected behavior expect_equal(result$protein[1], "P1") })
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/testthat/test-utils-statmodel-server.R(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (6)
tests/testthat/test-utils-statmodel-server.R (6)
5-55: LGTM! Comprehensive coverage of experimental condition extraction.The tests effectively cover all four data type combinations (PTM/non-PTM × TMT/non-TMT) and verify that the correct column is extracted based on the input parameters.
61-114: LGTM! Well-structured tests for pairwise contrast building.The tests cover the core functionality including single contrasts, appending multiple comparisons, and the edge case where both groups are identical. The assertions verify both matrix structure and specific contrast values.
120-162: LGTM! Tests correctly validate the zero-sum constraint.The tests verify that valid contrasts (weights summing to zero) are accepted and that invalid contrasts return the existing matrix unchanged, which is the expected behavior for validation failure.
168-204: LGTM! Tests properly verify all-against-one contrast generation.The tests correctly validate that n-1 comparisons are generated for n conditions, with proper contrast values and row names.
210-260: LGTM! Tests comprehensively cover all-pair contrast generation.The tests verify the combinatorial logic (n choose 2) for different group counts, check contrast values, and validate comparison names. The helpful comment on line 258 clarifies the expected behavior.
266-285: ****The original review comment misidentified which data frame is used. When
BIO == "PTM", the function returnsdata_comp$ADJUSTED.Modelfiltered by the threshold, notPTM.Model. The test data correctly providesADJUSTED.Modelwith P1 (adj.pvalue = 0.04) and P2 (adj.pvalue = 0.3). With a threshold of 0.05, only P1 passes the filter, resulting in 1 row as expected. The test is correct.Likely an incorrect or invalid review comment.
Summary by CodeRabbit
Refactor
New Features / UI
Quality
Tests