Feature: PTM analysis #56
Conversation
Caution: Review failed. The pull request is closed.

Walkthrough: Removes the exported getPathwaysFromIndra API and associated docs. Adds canonical GlobalProtein handling in annotation. Extends subnetwork utilities with PTM site parsing, node construction revisions, and input validations. Enhances HTML visualization with PTM-overlap tooltips and updated function signatures. Adds a new PTM vignette, updates the main vignette, and adjusts DESCRIPTION/NAMESPACE.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
participant User
participant R_Funcs as R Visualization
participant Consolidator as consolidateEdges
participant Creator as createEdgeElements
participant Config as generateCytoscapeConfig
participant JS as generateJavaScriptCode
participant Browser as CytoscapeJS
User->>R_Funcs: previewNetworkInBrowser(nodes, edges)
R_Funcs->>Creator: createEdgeElements(edges, nodes)
Creator->>Consolidator: consolidateEdges(edges, nodes)
Consolidator->>Consolidator: Group by source-target-interaction<br/>Compute PTM overlap text
Consolidator-->>Creator: edges + ptm_overlap
Creator-->>Config: edge elements with tooltip field
Config-->>JS: Cytoscape config (nodes, edges, tooltips)
JS->>Browser: Initialize graph and tooltip handlers
User->>Browser: Hover edge
Browser-->>User: Show PTM overlap tooltip (if non-empty)
sequenceDiagram
autonumber
participant Input as Annotated Input
participant Utils as getSubnetworkFromIndra (utils)
participant API as INDRA API
participant Parser as JSON Parser
participant Builder as Node/Edge Builder
Input->>Utils: annotated_df (Protein/GlobalProtein, HgncId, log2FC, adj.pvalue)
Utils->>Utils: Validate protein count (unique HGNC)
Utils->>API: Query with unique HGNC + force_include_other (ns:id)
API-->>Utils: Statements (stmt_json, refs)
Utils->>Parser: Extract residue/position from stmt_json
Parser-->>Utils: site (e.g., Y1110)
Utils->>Builder: Map edges (add site), map nodes from input, restrict to in-edges
Builder-->>Input: nodes, edges (with site)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (5)
PR Reviewer Guide 🔍

Here are some key observations to aid the review process:
PR Code Suggestions ✨

Explore these optional code suggestions:
Codecov Report

❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## devel #56 +/- ##
==========================================
+ Coverage 57.36% 64.30% +6.94%
==========================================
Files 8 7 -1
Lines 1222 1216 -6
==========================================
+ Hits 701 782 +81
+ Misses 521 434 -87

☔ View full report in Codecov by Sentry.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
R/annotateProteinInfoFromIndra.R (1)
70-79: Update Protein column reference to use protein_ids consistently.

In the Uniprot_Mnemonic path, the code still references df$Protein when setting UniprotId values, but should use the index based on the original protein_ids source. Fix the inconsistent column reference:

```diff
 if (proteinIdType == "Uniprot_Mnemonic") {
   mnemonicProteins <- protein_ids
   if (length(mnemonicProteins) > 0) {
     uniprotMapping <- .callGetUniprotIdsFromUniprotMnemonicIdsApi(as.list(mnemonicProteins))
     for (mnemonicId in names(uniprotMapping)) {
       if (!is.null(uniprotMapping[[mnemonicId]])) {
-        df$UniprotId[df$Protein == mnemonicId] <- uniprotMapping[[mnemonicId]]
+        df$UniprotId[protein_ids == mnemonicId] <- uniprotMapping[[mnemonicId]]
       }
     }
   }
 }
```

R/getPathwaysFromIndra.R (7)
29-33: Harden normal fit: handle NA/Inf and small-N failures.

fitdistr will error if log2FC has NA/Inf or too few finite values. Add filtering and a safe fallback. Apply:

```diff
-log2fc_values <- annotated_df$log2FC
-fit <- fitdistr(log2fc_values, "normal")
-para <- fit$estimate
+log2fc_values <- annotated_df$log2FC
+log2fc_values <- log2fc_values[is.finite(log2fc_values)]
+if (length(log2fc_values) >= 2) {
+  fit <- fitdistr(log2fc_values, "normal")
+  para <- fit$estimate
+} else {
+  # Fallback to standard normal when insufficient data
+  para <- c(mean = 0, sd = 1)
+}
```
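As a cross-language sanity check of the suggested guard, here is a minimal Python sketch (function name is illustrative, not from the package). For a normal distribution, fitdistr's maximum-likelihood fit amounts to the sample mean and the n-divisor standard deviation, so the fallback path can be shown with plain arithmetic:

```python
import math

def fit_normal_with_fallback(values):
    # Keep only finite numbers, mirroring is.finite() in the suggested R fix
    finite = [v for v in values if math.isfinite(v)]
    if len(finite) >= 2:
        mean = sum(finite) / len(finite)
        # MLE for a normal divides the variance by n, not n - 1
        var = sum((v - mean) ** 2 for v in finite) / len(finite)
        return mean, math.sqrt(var)
    # Fallback to a standard normal when too little data remains
    return 0.0, 1.0
```

The second assertion below is exactly the failure mode the review flags: a vector that is all NA/Inf no longer errors, it degrades to N(0, 1).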
28-28: Ensure consistent types for HGNC IDs.

Later comparisons rely on character equality. Coerce once to avoid subtle mismatches. Apply:

```diff
 annotated_df$Protein <- as.character(annotated_df$Protein)
+annotated_df$HgncId <- as.character(annotated_df$HgncId)
```
53-57: Build URL via query params and handle HTTP errors.

Avoid manual string paste and add basic HTTP error handling. Apply:

```diff
-url = paste('https://db.indra.bio/statements/from_agents?subject=',
-            source_id, namespace, sep = "")
-response <- GET(url)
-z = content(response)
+response <- GET(
+  "https://db.indra.bio/statements/from_agents",
+  query = list(subject = paste0(source_id, namespace))
+)
+if (response$status_code >= 300) {
+  stop(sprintf("INDRA request failed (HTTP %s)", response$status_code))
+}
+z <- content(response)
```
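The same pattern, sketched with only the Python standard library: let a URL encoder build the query string instead of pasting it, and fail loudly on non-2xx statuses. The endpoint and the >= 300 threshold come from the review; `check_status` and the example IDs are hypothetical:

```python
from urllib.parse import urlencode

INDRA_BASE = "https://db.indra.bio/statements/from_agents"

def build_indra_url(source_id, namespace):
    # urlencode percent-escapes special characters (e.g. '@') for us
    return INDRA_BASE + "?" + urlencode({"subject": f"{source_id}{namespace}"})

def check_status(status_code):
    # Mirror the suggested guard: anything >= 300 is treated as a failure
    if status_code >= 300:
        raise RuntimeError(f"INDRA request failed (HTTP {status_code})")
    return True
```

Manual concatenation would have emitted a literal `@`, which some HTTP stacks mishandle; the encoder makes the escaping explicit.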
66-77: Subject matching in Complex edges: avoid identical() type pitfalls.

Coerce both sides to character; identical() will fail on numeric vs character. Apply:

```diff
-if (identical(edge$members[[1]]$db_refs[[id_field]], source_id)) {
+if (as.character(edge$members[[1]]$db_refs[[id_field]]) == source_id) {
   obj = edge$members[[2]]$db_refs$HGNC
   namespaces = names(edge$members[[2]]$db_refs)
 } else {
   obj = edge$members[[1]]$db_refs$HGNC
   namespaces = names(edge$members[[1]]$db_refs)
 }
```
86-91: HGNC filter: be explicit about types and missing values.

Make membership checks robust to NA/NULL and type coercion. Apply:

```diff
-if (!("HGNC" %in% namespaces)) {
+if (is.null(namespaces) || !("HGNC" %in% namespaces)) {
   next
-} else if (!(obj %in% annotated_df$HgncId)) {
+} else if (is.null(obj) || is.na(obj) || !(as.character(obj) %in% annotated_df$HgncId)) {
   next
 }
```
121-132: Probability calc: NA-safe logFC, avoid log10(0), clamp to [0,1].

Prevents runtime errors and keeps probabilities in range. Apply:

```diff
-prob_logFC = 0
-logFC = annotated_df[which(annotated_df$HgncId == edgeToMetadataMapping[[key]]$target_id),]
-logFC = logFC$log2FC[which.max(abs(logFC$log2FC))]
+prob_logFC <- 0
+logFC_rows <- annotated_df[annotated_df$HgncId == edgeToMetadataMapping[[key]]$target_id, , drop = FALSE]
+vals <- logFC_rows$log2FC
+vals <- vals[is.finite(vals)]
+logFC <- if (length(vals)) vals[which.max(abs(vals))] else 0
 if (logFC > para[1]) {
   prob_logFC = 1 - pnorm(logFC, mean = para[1], sd = para[2])
 } else {
   prob_logFC = pnorm(logFC, mean = para[1], sd = para[2])
 }
-evidence_prob = 10^(m*log10(edgeToMetadataMapping[[key]]$data$evidence_count)+b)
-edgeToMetadataMapping[[key]]$data$total_prob = 1 - ((1 - prob_logFC) * (1 - evidence_prob))
+cnt <- max(1, as.numeric(edgeToMetadataMapping[[key]]$data$evidence_count))
+evidence_prob <- 10^(m * log10(cnt) + b)
+evidence_prob <- max(0, min(1, evidence_prob))
+total_prob <- 1 - ((1 - prob_logFC) * (1 - evidence_prob))
+edgeToMetadataMapping[[key]]$data$total_prob <- max(0, min(1, total_prob))
-edgeToMetadataMapping[[key]]$data$logFC = logFC
+edgeToMetadataMapping[[key]]$data$logFC <- logFC
```
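The guarded tail-probability and clamping logic is language-independent, so it can be restated as a small self-contained Python sketch. Names are illustrative; `m` and `b` stand for the fitted evidence-curve coefficients referenced in the surrounding code, and the final line is a noisy-OR combination of the two probabilities:

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    # pnorm equivalent via the error function
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2))))

def clamp01(p):
    return max(0.0, min(1.0, p))

def combine_probs(log_fc, mean, sd, evidence_count, m, b):
    # Tail probability of the observed log2FC under the fitted normal
    if log_fc > mean:
        prob_logfc = 1.0 - normal_cdf(log_fc, mean, sd)
    else:
        prob_logfc = normal_cdf(log_fc, mean, sd)
    # Guard log10(0) by flooring the count at 1, then clamp to [0, 1]
    cnt = max(1.0, float(evidence_count))
    evidence_prob = clamp01(10 ** (m * math.log10(cnt) + b))
    # Noisy-OR of the two independent probabilities, clamped to [0, 1]
    return clamp01(1.0 - (1.0 - prob_logfc) * (1.0 - evidence_prob))
```

With evidence_count = 0 (the log10(0) hazard the review points out), the floor kicks in and the result stays a valid probability.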
160-175: Align added node schema with .constructNodesDataFrame

- Replace pvalue = 0 with adj.pvalue = 0 when appending the main target node:

```diff
-logFC = 0,
-pvalue = 0,
+logFC = 0,
+adj.pvalue = 0,
```

- Optionally, verify that main_target uses the same identifier namespace (UniProt) as nodes$id, or map it before binding to avoid orphaned nodes.
🧹 Nitpick comments (7)
R/utils_getSubnetworkFromIndra.R (1)
289-314: Clean up commented-out code.

There's a large block of commented-out code that should be removed if it's no longer needed. This impacts code readability and maintenance.
Remove the commented-out code block from lines 294-313 since the new implementation is working correctly.
R/visualizeNetworksWithHTML.R (1)
566-567: Simplify the tooltip display condition.

The condition checking for "No overlapping PTM sites found" is unnecessary since the new implementation returns an empty string for no overlaps. Simplify the condition:

```diff
-if (tooltipText && tooltipText.trim() !== '' && tooltipText.trim() !== 'No overlapping PTM sites found') {
+if (tooltipText && tooltipText.trim() !== '') {
```
58-58: Address the TODO comment about PTM dataset handling.

The TODO comment indicates that PTM dataset handling needs improvement. This should be addressed or tracked in an issue.
Would you like me to help create a more robust PTM dataset handling implementation or open an issue to track this technical debt?
R/getPathwaysFromIndra.R (1)
135-158: Nit: use explicit FUN.VALUE types in vapply.

Improve type safety by using character(1)/numeric(1). Apply:

```diff
-  }, ""),
+  }, character(1)),
 ...
-  }, ""),
+  }, character(1)),
 ...
-  }, 1),
+  }, numeric(1)),
 ...
-  }, 1),
+  }, numeric(1)),
 ...
-  }, 1),
+  }, numeric(1)),
 ...
-  }, ""),
+  }, character(1)),
```
31-33: Verify install branch name.

The PR branch is “feature-ptm-analysis”, but the install command uses “feature-ptm”. Update if that’s unintentional. Apply:

```diff
-devtools::install_github("Vitek-Lab/MSstatsBioNet@feature-ptm", build_vignettes = TRUE)
+devtools::install_github("Vitek-Lab/MSstatsBioNet@feature-ptm-analysis", build_vignettes = TRUE)
```
72-78: Set expectations conservatively.

“GAB1 is rated as the most relevant edge” may not always hold with user data/filters. Phrase as an example or remove. Apply:

```diff
-Here, we will get pathways from INDRA and rank them based on relevance. We will see here that GAB1 is rated as the most relevant edge to track.
+Here, we will get pathways from INDRA and rank them based on relevance. Results will vary by dataset and filters.
```
24-33: Keep install chunks non-evaluating.

Good use of eval = FALSE. Consider also prefacing with a note that installs should be run in a clean R session, not during vignette build.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
inst/extdata/2_egf_data.csv is excluded by !**/*.csv
📒 Files selected for processing (5)
- R/annotateProteinInfoFromIndra.R (1 hunks)
- R/getPathwaysFromIndra.R (3 hunks)
- R/utils_getSubnetworkFromIndra.R (9 hunks)
- R/visualizeNetworksWithHTML.R (19 hunks)
- vignettes/PTM-Analysis.Rmd (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pkgdown
- GitHub Check: Run pr agent on every pull request
- GitHub Check: test-coverage
- GitHub Check: build
🔇 Additional comments (2)
R/utils_getSubnetworkFromIndra.R (1)
148-153: Fix the regex pattern for PTM site extraction.

The current regex pattern ".*?_(?=[A-Z][0-9])" will fail to correctly extract PTM sites in several cases:

- It doesn't properly handle multi-digit positions (e.g., _S123)
- The replacement pattern removes everything up to the underscore, but should only remove the protein name prefix

Apply this diff to fix the PTM site extraction:

```diff
-input$Site = ifelse(grepl("_[A-Z][0-9]", input$Protein),
-                    gsub(".*?_(?=[A-Z][0-9])", "", input$Protein, perl = TRUE),
+input$Site = ifelse(grepl("_[A-Z][0-9]+", input$Protein),
+                    gsub("^[^_]+_", "", input$Protein),
                     NA_character_)
```

Likely an incorrect or invalid review comment.
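A quick way to see what any corrected pattern must handle is a tiny reference implementation. This hypothetical Python helper anchors the site token to the end of the ID (an assumption, not the package's rule): it splits IDs like EGFR_Y1110 into protein and site while leaving mnemonic suffixes such as _HUMAN untouched, which is exactly the over-stripping hazard discussed elsewhere in this review:

```python
import re

# One residue letter plus a multi-digit position, anchored at the end
SITE_TOKEN = re.compile(r"_([A-Z]\d+)$")

def split_ptm_id(protein):
    """Split 'GENE_S123' into ('GENE', 'S123'); return (protein, None) otherwise."""
    m = SITE_TOKEN.search(protein)
    if m:
        return protein[:m.start()], m.group(1)
    return protein, None
```

Because the pattern requires digits after the residue letter, `_HUMAN` never matches, and multi-digit positions like S123 are captured whole.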
R/visualizeNetworksWithHTML.R (1)
960-963: LGTM! Clear PTM visualization documentation.

The addition of PTM site overlap information in the legend and instructions is well-implemented and provides clear guidance to users about the hover functionality.
```r
calculatePTMOverlapAggregated <- function(edges, nodes) {
  if (nrow(edges) == 0) return(character(0))

  # Group edges by source-target-interaction to match consolidation logic
  edges$edge_key <- paste(edges$source, edges$target, edges$interaction, sep = "-")
  unique_edges <- unique(edges$edge_key)

  overlap_info <- character(length(unique_edges))
  names(overlap_info) <- unique_edges

  for (edge_key in unique_edges) {
    # Get all edges with this source-target-interaction combination
    matching_edges <- edges[edges$edge_key == edge_key, ]
    all_overlap_sites <- c()

    # Process each matching edge to find PTM overlaps
    for (i in 1:nrow(matching_edges)) {
      edge <- matching_edges[i, ]

      # Check if edge has target and site information
      if (!is.na(edge$target) && "site" %in% names(edge) && !is.na(edge$site)) {
        # Find matching nodes with the same target ID
        target_nodes <- nodes[nodes$id == edge$target, ]

        if (nrow(target_nodes) > 0 && "Site" %in% names(target_nodes)) {
          edge_sites <- trimws(unlist(strsplit(as.character(edge$site), "[,;|]")))

          # Check each target node row for site matches
          for (j in 1:nrow(target_nodes)) {
            if (!is.na(target_nodes$Site[j])) {
              node_sites <- trimws(unlist(strsplit(as.character(target_nodes$Site[j]), "[,;|]")))

              # Find overlapping sites for this edge-node combination
              overlap_sites <- intersect(edge_sites, node_sites)
              overlap_sites <- overlap_sites[overlap_sites != "" & !is.na(overlap_sites)]

              # Add to the aggregate list
              all_overlap_sites <- c(all_overlap_sites, overlap_sites)
            }
          }
        }
      }
    }

    # Remove duplicates and create tooltip text for this consolidated edge
    unique_overlap_sites <- unique(all_overlap_sites)
    unique_overlap_sites <- unique_overlap_sites[unique_overlap_sites != "" & !is.na(unique_overlap_sites)]

    # CHANGED: Only create tooltip text if there are actual overlapping sites
    if (length(unique_overlap_sites) > 0) {
      if (length(unique_overlap_sites) == 1) {
        overlap_info[edge_key] <- paste0("Overlapping PTM site: ", unique_overlap_sites[1])
      } else {
        overlap_info[edge_key] <- paste0("Overlapping PTM sites: ", paste(unique_overlap_sites, collapse = ", "))
      }
    } else {
      # CHANGED: Return empty string instead of "No overlapping PTM sites found"
      overlap_info[edge_key] <- ""
    }
  }

  return(overlap_info)
}
```
🛠️ Refactor suggestion
Improve PTM overlap calculation performance and robustness.
The PTM overlap calculation has several issues:
- The triple nested loop (edges × target_nodes × sites) could be inefficient for large datasets
- Site splitting using multiple delimiters [,;|] should be documented
- No validation that the Site column exists in nodes before accessing it
Consider these improvements:
```diff
 calculatePTMOverlapAggregated <- function(edges, nodes) {
   if (nrow(edges) == 0) return(character(0))
+
+  # Validate required columns
+  if (!"Site" %in% names(nodes)) {
+    return(character(nrow(edges)))  # Return empty overlaps if no Site info
+  }

   # Group edges by source-target-interaction to match consolidation logic
   edges$edge_key <- paste(edges$source, edges$target, edges$interaction, sep = "-")
```

Also consider pre-processing the node sites into a lookup table to improve performance.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
R/visualizeNetworksWithHTML.R around lines 82-144: the PTM overlap routine is
slow and fragile due to triple nested loops, lack of Site column validation, and
undocumented delimiter use; fix by (1) early-validate that nodes has a "Site"
column and gracefully handle missing/NA values; (2) pre-process nodes into a
lookup list mapping node id -> unique vector of trimmed sites (split once using
documented delimiter pattern "[,;|]"), removing empty/NA entries; (3) compute
overlaps per consolidated edge by splitting the edge site string once and
intersecting with the precomputed node-site vectors (use vectorized set
operations rather than looping per target node row), aggregating unique overlaps
and producing the same tooltip string logic (empty string when none); and (4)
add a short comment noting the delimiter semantics and why the lookup improves
performance.
```r
# Create edge data with styling information and PTM overlap tooltip
```
Escape quotes more robustly for JavaScript safety.
The current quote escaping only handles single quotes but should also handle double quotes and other special characters that could break JavaScript.
Use a more robust escaping approach:
```diff
-# Escape quotes in tooltip text for JavaScript safety
-tooltip_text <- gsub("'", "\\\\'", row$ptm_overlap)
+# Escape special characters for JavaScript safety
+tooltip_text <- gsub("(['\"\\\\])", "\\\\\\1", row$ptm_overlap)
```
+ tooltip_text <- gsub("(['\"\\\\])", "\\\\\\1", row$ptm_overlap)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tooltip_text <- gsub("'", "\\\\'", row$ptm_overlap) | |
| # Escape special characters for JavaScript safety | |
| tooltip_text <- gsub("(['\"\\\\])", "\\\\\\1", row$ptm_overlap) |
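To see what the broadened character class buys, here is a Python analogue of the escaping rule (hypothetical helper). It backslash-escapes backslashes and both quote characters, matching the suggested R pattern; note that embedded newlines would still break a single-quoted JS literal, so JSON-encoding the whole string is the fully robust alternative:

```python
import re

def escape_for_js_single_quoted(text):
    # Escape \, ' and " in one pass; the backslash case must be in the
    # same class so already-escaped input is not double-processed oddly
    return re.sub(r"(['\"\\])", r"\\\1", text)
```

The original single-quote-only rule would leave `"` unescaped, which breaks tooltips whose text is interpolated into a double-quoted JS context.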
```js
tooltip.style.cssText = `
  position: absolute;
  background-color: rgba(0, 0, 0, 0.9);
  color: white;
  padding: 8px 12px;
  border-radius: 4px;
  font-size: 12px;
  font-family: Arial, sans-serif;
  white-space: nowrap;
  pointer-events: none;
  z-index: 9999;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.3);
  display: none;
  max-width: 300px;
  word-wrap: break-word;
  white-space: pre-wrap;
`;
document.body.appendChild(tooltip);
```
🛠️ Refactor suggestion
Add null checks for tooltip element creation.
The tooltip creation code assumes document.body exists when the script runs, which might not always be the case if the script executes before DOM is ready.
Wrap tooltip creation in a DOM ready check:
```diff
+// Ensure DOM is ready before creating tooltip
+if (document.body) {
   var tooltip = document.createElement('div');
   tooltip.style.cssText = `...`;
   document.body.appendChild(tooltip);
+} else {
+  console.error('Document body not ready for tooltip creation');
+}
```

📝 Committable suggestion — ‼️ carefully review the code before committing, ensure it accurately replaces the highlighted code with no missing lines or indentation issues, and test it:
```js
// Ensure DOM is ready before creating tooltip
if (document.body) {
  var tooltip = document.createElement('div');
  tooltip.style.cssText = `
    position: absolute;
    background-color: rgba(0, 0, 0, 0.9);
    color: white;
    padding: 8px 12px;
    border-radius: 4px;
    font-size: 12px;
    font-family: Arial, sans-serif;
    white-space: nowrap;
    pointer-events: none;
    z-index: 9999;
    box-shadow: 0 2px 8px rgba(0, 0, 0, 0.3);
    display: none;
    max-width: 300px;
    word-wrap: break-word;
    white-space: pre-wrap;
  `;
  document.body.appendChild(tooltip);
} else {
  console.error('Document body not ready for tooltip creation');
}
```
🤖 Prompt for AI Agents
In R/visualizeNetworksWithHTML.R around lines 542 to 560, the code creates and
appends a tooltip element assuming document.body exists; change this to wait for
DOM readiness by checking document.readyState and if not 'loading' create and
append the tooltip immediately, otherwise attach a DOMContentLoaded listener to
create/append the tooltip when the DOM is ready; ensure the tooltip variable is
created in the same scope so event handlers can access it and guard against
duplicate creation if the listener is invoked multiple times.
Commit: shiny needs fixing to refer to global protein info in subnetwork function

Force-pushed 93d7401 to c206517
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
R/utils_getSubnetworkFromIndra.R (2)
59-66: Add HTTP timeouts and error handling for INDRA API calls.

Network calls lack timeouts and status checks; failures can hang or silently proceed with bad responses.

```diff
 res <- POST(
   indraCogexUrl,
   body = groundings,
   add_headers("Content-Type" = "application/json"),
-  encode = "raw"
+  encode = "raw",
+  httr::timeout(60)
 )
-res <- content(res)
+if (httr::http_error(res)) {
+  stop(sprintf("INDRA API request failed (HTTP %s): %s",
+               httr::status_code(res), httr::content(res, as = "text")))
+}
+res <- content(res)
```
110-123: Normalize Protein and derive Site before exemptions/filters; avoid factor coercion.

As written, exempt_proteins are captured before Site/GlobalProtein normalization, so forced proteins may lose Site and dedup may keep stale rows. Also, assigning GlobalProtein can coerce Protein back to factor.

```diff
 .filterGetSubnetworkFromIndraInput <- function(input, pvalueCutoff, logfc_cutoff, force_include_proteins) {
-  # Extract exempt proteins before any filtering
+  # Normalize Protein and derive Site before exemptions/filters
+  prot_raw = as.character(input$Protein)
+  input$Site = ifelse(grepl("_[A-Za-z][0-9]", prot_raw),
+                      sub(".*?_(?=[A-Za-z][0-9])", "", prot_raw, perl = TRUE),
+                      NA_character_)
+  if ("GlobalProtein" %in% colnames(input)) {
+    input$Protein = as.character(input$GlobalProtein)
+  } else {
+    input$Protein = prot_raw
+  }
+
+  # Extract exempt proteins after normalization
   exempt_proteins <- NULL
   if (!is.null(force_include_proteins)) {
     if (!is.character(force_include_proteins)) {
       stop("force_include_proteins must be a character vector")
     }
     missing_prots <- setdiff(force_include_proteins, input$Protein)
     if (length(missing_prots) > 0) {
       warning("force_include_proteins not found: ", paste(missing_prots, collapse = ", "))
     }
     exempt_proteins <- input[input$Protein %in% force_include_proteins,]
   }
-
-  # Apply standard filtering
+  # Apply standard filtering
   input <- input[!is.na(input$adj.pvalue),]
@@
-  input$Protein <- as.character(input$Protein)
-
-  # Handle PTMs in Protein column
-  input$Site = ifelse(grepl("_[A-Z][0-9]", input$Protein),
-                      gsub(".*?_(?=[A-Z][0-9])", "", input$Protein, perl = TRUE),
-                      NA_character_)
-  if ("GlobalProtein" %in% colnames(input)) {
-    input$Protein = input$GlobalProtein
-  }
```

Also applies to: 145-153
♻️ Duplicate comments (3)
R/utils_getSubnetworkFromIndra.R (3)
45-46: Good: deduplicate HGNC IDs before building groundings.

This addresses earlier feedback about deduplication ordering.
172-177: Guard against missing Protein column in metadata mapping.

Accessing input$Protein without validation can error; add a check.

```diff
 .addAdditionalMetadataToIndraEdge <- function(edge, input) {
+  if (!"Protein" %in% colnames(input)) {
+    stop("Input must contain 'Protein' column for edge metadata")
+  }
@@
   uniprot_ids_source <- unique(matched_rows_source$Protein)
@@
   uniprot_ids_target = unique(matched_rows_target$Protein)
```

Also applies to: 180-185
13-14: Fix 400-protein check to only count unique HGNC IDs (including HGNC in force_include_other).

Currently all force_include_other entries are counted, even non-HGNC namespaces and duplicates, which can wrongly reject valid queries.

```diff
-num_proteins = length(unique(input$HgncId)) +
-  ifelse(!is.null(force_include_other), length(force_include_other), 0)
+unique_hgnc_ids = unique(input$HgncId)
+if (!is.null(force_include_other)) {
+  hgnc_extra = vapply(strsplit(force_include_other, ":"), function(parts) {
+    if (length(parts) == 2 && toupper(parts[1]) == "HGNC") parts[2] else NA_character_
+  }, NA_character_)
+  unique_hgnc_ids = union(unique_hgnc_ids, unique(stats::na.omit(hgnc_extra)))
+}
+num_proteins = length(unique_hgnc_ids)
```
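The counting rule being proposed (unique HGNC IDs, folding in only HGNC-namespaced `ns:id` extras) is easy to state in a few lines of Python. The helper and example IDs are illustrative, not from the package:

```python
def count_query_proteins(hgnc_ids, force_include_other=None):
    """Count unique HGNC IDs; only 'HGNC:<id>' extras contribute."""
    unique = {str(i) for i in hgnc_ids}
    for entry in force_include_other or []:
        parts = entry.split(":")
        # Non-HGNC namespaces (e.g. 'UP:...') must not inflate the count
        if len(parts) == 2 and parts[0].upper() == "HGNC":
            unique.add(parts[1])
    return len(unique)
```

A duplicate like `HGNC:1097` alongside an input row with HgncId 1097 counts once, which is exactly the over-counting bug the comment describes.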
🧹 Nitpick comments (1)
R/utils_getSubnetworkFromIndra.R (1)
260-262: Preserve NA for site column.

Using "" as FUN.VALUE can coerce NA to empty string in some flows; prefer NA_character_.

```diff
-site = vapply(keys(res), function(x) {
-  query(res, x)$site
-}, ""),
+site = vapply(keys(res), function(x) {
+  query(res, x)$site
+}, NA_character_),
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
inst/extdata/2_egf_data.csv is excluded by !**/*.csv
📒 Files selected for processing (5)
- R/annotateProteinInfoFromIndra.R (1 hunks)
- R/getPathwaysFromIndra.R (3 hunks)
- R/utils_getSubnetworkFromIndra.R (9 hunks)
- R/visualizeNetworksWithHTML.R (19 hunks)
- vignettes/PTM-Analysis.Rmd (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- R/visualizeNetworksWithHTML.R
- vignettes/PTM-Analysis.Rmd
- R/annotateProteinInfoFromIndra.R
- R/getPathwaysFromIndra.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build
- GitHub Check: test-coverage
- GitHub Check: pkgdown
🔇 Additional comments (2)
R/utils_getSubnetworkFromIndra.R (2)
194-194: Import alignment looks good.

fromJSON is used below; roxygen import is appropriate.
232-233: Import alignment for fromJSON is consistent.

Matches usage below in edges construction.
```r
json_object <- fromJSON(edge$data$stmt_json)
if (!is.null(json_object$residue) && !is.null(json_object$position)) {
  edge$site = paste0(json_object$residue, json_object$position)
  key <- paste(key, edge$site, sep = "_")
} else {
  edge$site = NA_character_
}
```
🛠️ Refactor suggestion
Harden stmt_json parsing.
fromJSON can throw on malformed JSON; use tryCatch to avoid dropping the whole run.
```diff
-json_object <- fromJSON(edge$data$stmt_json)
-if (!is.null(json_object$residue) && !is.null(json_object$position)) {
+json_object <- tryCatch(fromJSON(edge$data$stmt_json), error = function(e) NULL)
+if (!is.null(json_object) && !is.null(json_object$residue) && !is.null(json_object$position)) {
   edge$site = paste0(json_object$residue, json_object$position)
   key <- paste(key, edge$site, sep = "_")
 } else {
   edge$site = NA_character_
 }
```

📝 Committable suggestion — ‼️ carefully review the code before committing, ensure it accurately replaces the highlighted code, and test it:
```r
json_object <- tryCatch(fromJSON(edge$data$stmt_json), error = function(e) NULL)
if (!is.null(json_object) && !is.null(json_object$residue) && !is.null(json_object$position)) {
  edge$site = paste0(json_object$residue, json_object$position)
  key <- paste(key, edge$site, sep = "_")
} else {
  edge$site = NA_character_
}
```
🤖 Prompt for AI Agents
In R/utils_getSubnetworkFromIndra.R around lines 204-210,
fromJSON(edge$data$stmt_json) can throw on malformed JSON and crash the run;
wrap the fromJSON call in tryCatch to safely handle parsing errors (returning
NULL or an empty list on error), and when an error occurs set edge$site <-
NA_character_ (and optionally log a warning including the edge id or key).
Ensure the subsequent logic that sets key and pastes site only runs when
json_object was successfully parsed and residue/position are present.
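The tryCatch pattern maps directly onto a try/except sketch. This hypothetical Python helper mirrors the proposed behavior: malformed JSON, a non-object payload, and a missing residue or position all yield a missing site instead of aborting the run:

```python
import json

def parse_site(stmt_json):
    """Return a 'Y1110'-style site string or None; never raise on bad input."""
    try:
        obj = json.loads(stmt_json)
    except (json.JSONDecodeError, TypeError):
        return None  # analogue of tryCatch(..., error = function(e) NULL)
    if not isinstance(obj, dict):
        return None
    residue = obj.get("residue")
    position = obj.get("position")
    if residue is not None and position is not None:
        return f"{residue}{position}"
    return None
```

Downstream key construction then only appends the site when parsing actually succeeded, matching the review's guidance.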
```r
nodes = input[, c("Protein", "log2FC", "adj.pvalue", "HgncName", "Site")]
colnames(nodes) = c("id", "logFC", "adj.pvalue", "hgncName", "Site")

# Add attributes from input where available
nodes$logFC <- input$log2FC[match(nodes$id, input$Protein)]
nodes$adj.pvalue <- input$adj.pvalue[match(nodes$id, input$Protein)]
nodes$hgncName <- if ("HgncName" %in% colnames(input) && is.character(input$HgncName)) {
  hgnc_value <- input$HgncName[match(nodes$id, input$Protein)]
  ifelse(is.na(hgnc_value), nodes$id, hgnc_value)
} else {
  nodes$id
}
nodes = nodes[nodes$id %in% c(edges$source, edges$target), ]
nodes$hgncName = ifelse(is.na(nodes$hgncName), nodes$id, nodes$hgncName)
```
🛠️ Refactor suggestion
Make nodes construction resilient to missing columns.
Selecting HgncName/Site without existence checks can error; precreate when absent.
```diff
 .constructNodesDataFrame <- function(input, edges) {
+  if (!"HgncName" %in% colnames(input)) input$HgncName <- NA_character_
+  if (!"Site" %in% colnames(input)) input$Site <- NA_character_
   nodes = input[, c("Protein", "log2FC", "adj.pvalue", "HgncName", "Site")]
   colnames(nodes) = c("id", "logFC", "adj.pvalue", "hgncName", "Site")
   nodes = nodes[nodes$id %in% c(edges$source, edges$target), ]
   nodes$hgncName = ifelse(is.na(nodes$hgncName), nodes$id, nodes$hgncName)
```

📝 Committable suggestion — ‼️ carefully review the code before committing, ensure it accurately replaces the highlighted code, and test it:
```r
if (!"HgncName" %in% colnames(input)) input$HgncName <- NA_character_
if (!"Site" %in% colnames(input)) input$Site <- NA_character_
nodes = input[, c("Protein", "log2FC", "adj.pvalue", "HgncName", "Site")]
colnames(nodes) = c("id", "logFC", "adj.pvalue", "hgncName", "Site")
nodes = nodes[nodes$id %in% c(edges$source, edges$target), ]
nodes$hgncName = ifelse(is.na(nodes$hgncName), nodes$id, nodes$hgncName)
```
🤖 Prompt for AI Agents
In R/utils_getSubnetworkFromIndra.R around lines 289 to 294, the code assumes
columns "HgncName" and "Site" exist when subsetting input which can cause
errors; before subsetting, check if those columns are present and if not add
them to input (e.g. create input$HgncName <- NA and input$Site <- NA), then
subset using only the required columns (or use intersect to guard against
missing ones), rename to id/logFC/adj.pvalue/hgncName/Site, filter nodes by
edges, and keep the existing fallback that sets hgncName to id when NA.
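The pattern the prompt describes can be sketched in a few lines; `input` and `edges` here are toy stand-ins, not the package's real data:

```r
# Toy inputs: note there is deliberately no HgncName or Site column
input <- data.frame(
  Protein = c("P04637_S15", "P38398"),
  log2FC = c(1.2, -0.8),
  adj.pvalue = c(0.01, 0.20)
)
edges <- data.frame(source = "P04637_S15", target = "P38398")

# Add optional columns if absent so the subset below never errors
if (!"HgncName" %in% colnames(input)) input$HgncName <- NA_character_
if (!"Site" %in% colnames(input)) input$Site <- NA_character_

nodes <- input[, c("Protein", "log2FC", "adj.pvalue", "HgncName", "Site")]
colnames(nodes) <- c("id", "logFC", "adj.pvalue", "hgncName", "Site")
nodes <- nodes[nodes$id %in% c(edges$source, edges$target), ]
# Fall back to the protein id when no gene symbol is available
nodes$hgncName <- ifelse(is.na(nodes$hgncName), nodes$id, nodes$hgncName)
```

With HgncName entirely absent, every node falls back to its id, and no subsetting error occurs.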
Actionable comments posted: 1
♻️ Duplicate comments (1)
R/annotateProteinInfoFromIndra.R (1)
58-65: Harden GlobalProtein selection and avoid over-stripping with a safer PTM regex.
- Validate existing GlobalProtein (NA/empty) and fall back to Protein when unusable. This echoes the prior review—adding the guard prevents silent propagation of bad IDs.
- The current pattern can wrongly truncate tokens like GENE_A0A024RBG1. Anchor to PTM-like tokens at the end (e.g., _S123 or _S123;T45). Also drop NAs before building protein_ids.
Apply:

```diff
-  if ("GlobalProtein" %in% colnames(df)) {
-    protein_ids = unique(df$GlobalProtein)
-  } else {
-    df$GlobalProtein = ifelse(grepl("_[A-Z][0-9]", df$Protein),
-      gsub("_[A-Z][0-9].*", "", df$Protein, perl = TRUE),
-      df$Protein)
-    protein_ids = unique(df$GlobalProtein)
-  }
+  if ("GlobalProtein" %in% colnames(df)) {
+    bad <- is.na(df$GlobalProtein) | df$GlobalProtein == ""
+    if (all(bad)) {
+      warning("GlobalProtein column is empty; deriving from Protein.")
+      df$GlobalProtein <- df$Protein
+    }
+  } else {
+    # Strip trailing PTM site tokens (e.g., _S123 or _S123;T45), keep mnemonic suffixes like _HUMAN
+    df$GlobalProtein <- gsub("_(?:[A-Z]{1,3})[0-9]+(?:;[A-Z]{1,3}[0-9]+)*$", "", df$Protein, perl = TRUE)
+  }
+  protein_ids <- unique(na.omit(df$GlobalProtein))
```
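To see why the anchored pattern is safer, compare both substitutions on a few illustrative tokens (only `GENE_A0A024RBG1` comes from the comment above; the rest are hypothetical PTM-style ids):

```r
# Illustrative tokens only; the two regexes are the ones from the diff above
tokens <- c("P04637_S123", "P04637_S123;T45", "GENE_A0A024RBG1", "TP53_HUMAN")

old <- gsub("_[A-Z][0-9].*", "", tokens, perl = TRUE)
new <- gsub("_(?:[A-Z]{1,3})[0-9]+(?:;[A-Z]{1,3}[0-9]+)*$", "", tokens, perl = TRUE)

# The old pattern truncates the isoform-style name to "GENE";
# the anchored pattern leaves it intact while still stripping PTM suffixes
print(rbind(tokens, old, new))
```

Both patterns leave a mnemonic suffix like `_HUMAN` untouched, since it contains no digits.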
🧹 Nitpick comments (2)
R/annotateProteinInfoFromIndra.R (2)
72-73: Skip NA/empty mnemonic IDs before calling the API. Avoids unnecessary calls and edge-case warnings.

```diff
-  mnemonicProteins <- protein_ids
+  mnemonicProteins <- protein_ids[!is.na(protein_ids) & nzchar(protein_ids)]
```
75-79: Vectorize the mnemonic→UniProt assignment. Removes the R-level loop and simplifies logic.

```diff
-  for (mnemonicId in names(uniprotMapping)) {
-    if (!is.null(uniprotMapping[[mnemonicId]])) {
-      df$UniprotId[df$GlobalProtein == mnemonicId] <- uniprotMapping[[mnemonicId]]
-    }
-  }
+  map <- unlist(uniprotMapping, use.names = TRUE)
+  hit <- match(df$GlobalProtein, names(map))
+  df$UniprotId[!is.na(hit)] <- map[hit[!is.na(hit)]]
```
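With a toy mapping, the vectorized version behaves like the loop while leaving unmapped proteins as NA (names and values below are illustrative, not real package data):

```r
# Toy mnemonic -> accession mapping, mimicking the uniprotMapping list
uniprotMapping <- list(TP53_HUMAN = "P04637", BRCA1_HUMAN = "P38398")
df <- data.frame(
  GlobalProtein = c("TP53_HUMAN", "EGFR_HUMAN", "BRCA1_HUMAN"),
  UniprotId = NA_character_
)

# Flatten the list to a named vector, then match once instead of looping
map <- unlist(uniprotMapping, use.names = TRUE)
hit <- match(df$GlobalProtein, names(map))
df$UniprotId[!is.na(hit)] <- map[hit[!is.na(hit)]]

# EGFR_HUMAN has no entry in the mapping, so its UniprotId stays NA
```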
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
R/annotateProteinInfoFromIndra.R (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pkgdown
- GitHub Check: build
- GitHub Check: test-coverage
Actionable comments posted: 0
🧹 Nitpick comments (2)
DESCRIPTION (2)
39-40: Prefer HTTPS for the project homepage URL. If supported, switch to HTTPS.

```diff
-URL: http://msstats.org, https://vitek-lab.github.io/MSstatsBioNet/
+URL: https://msstats.org, https://vitek-lab.github.io/MSstatsBioNet/
```
21-26: Consider minimum-version pins for key Imports. If you rely on recent APIs (e.g., tidyr 1.3+ helpers, RCy3 newer functions), add minimal versions in Imports to avoid CI/user mismatches.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- DESCRIPTION (1 hunks)
- NAMESPACE (0 hunks)
- R/getPathwaysFromIndra.R (0 hunks)
- man/getPathwaysFromIndra.Rd (0 hunks)
💤 Files with no reviewable changes (3)
- man/getPathwaysFromIndra.Rd
- NAMESPACE
- R/getPathwaysFromIndra.R
🔇 Additional comments (2)
DESCRIPTION (2)
22-26: Good cleanup removing MASS from Imports. Reduces dependency footprint; the tidyr trailing-comma fix looks correct.
22-26: Verified — no remaining MASS usage. An rg scan of R/ and vignettes found no fitdistr, MASS::, library(MASS), or require(MASS); NAMESPACE contains no MASS; DESCRIPTION lists Imports/Suggests but not MASS.
Actionable comments posted: 2
🧹 Nitpick comments (3)
vignettes/PTM-Analysis.Rmd (3)
37-41: Fail fast if the example data file is missing. Use system.file(..., mustWork = TRUE) and avoid nested calls for clarity.

```diff
-input = data.table::fread(system.file(
-  "extdata/garrido-2024.csv",
-  package = "MSstatsBioNet"
-))
+fpath <- system.file("extdata", "garrido-2024.csv",
+  package = "MSstatsBioNet", mustWork = TRUE)
+input <- data.table::fread(fpath)
 head(input)
```
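A quick way to see the difference: `mustWork = TRUE` turns the silent empty-string return of `system.file()` into an immediate error (the file name below is deliberately nonexistent, and `package = "base"` is used only so the example runs anywhere):

```r
# Without mustWork, a missing file silently yields an empty string,
# which fread() would then fail on with a confusing message
path <- system.file("extdata", "no-such-file.csv", package = "base")
stopifnot(path == "")

# With mustWork = TRUE, the failure is immediate and clearly located
bad <- try(
  system.file("extdata", "no-such-file.csv", package = "base", mustWork = TRUE),
  silent = TRUE
)
stopifnot(inherits(bad, "try-error"))
```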
50-52: Copyedit: standardize entity names (UniProt, HGNC). Small wording/casing fix.

```diff
-In the below example, we convert uniprot IDs to their corresponding Hgnc IDs. We
-can also extract other information, such as hgnc gene name and protein function.
+In the example below, we convert UniProt accessions to their corresponding HGNC IDs. We
+can also extract other information, such as HGNC gene symbol and protein function.
```
61-64: Copyedit: tighten phrasing.

```diff
-subnetwork of proteins from the INDRA database based on differential abundance
-analysis results. This function may help finding off target subnetworks.
+subnetwork of proteins from the INDRA database based on differential abundance
+analysis results. This function may help find off-target subnetworks.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
inst/extdata/garrido-2024.csv is excluded by !**/*.csv
📒 Files selected for processing (1)
vignettes/PTM-Analysis.Rmd (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: test-coverage
🔇 Additional comments (1)
vignettes/PTM-Analysis.Rmd (1)
53-57: Use parameter `proteinIdType` and the accepted values `"Uniprot"` | `"Uniprot_Mnemonic"`. annotateProteinInfoFromIndra expects the second arg named `proteinIdType` (see R/annotateProteinInfoFromIndra.R) and the implementation checks for `"Uniprot"` or `"Uniprot_Mnemonic"` — update the vignette to:

```r
annotated_df <- annotateProteinInfoFromIndra(input, proteinIdType = "Uniprot")
```

Also harmonize doc/comments that use "UniProt" vs "Uniprot".

Likely an incorrect or invalid review comment.
User description
Motivation and Context
We want to add PTM analysis to MSstatsBioNet. This feature is specific to the MSstats ecosystem of packages, rather than something that works with any differential analysis tool that outputs p-values.
Checklist Before Requesting a Review
PR Type
Enhancement, Documentation
Description
Add PTM-aware ID handling and site parsing
Integrate PTM site info into edges/nodes
Show PTM overlap tooltips in HTML viz
Add PTM analysis vignette
Diagram Walkthrough
File Walkthrough
- annotateProteinInfoFromIndra.R: PTM-aware Uniprot ID population (R/annotateProteinInfoFromIndra.R)
  - Use `GlobalProtein` when present for IDs; build `protein_ids` from it, not `Protein`
- getPathwaysFromIndra.R: Robust pathway extraction and ranking tweaks (R/getPathwaysFromIndra.R)
- utils_getSubnetworkFromIndra.R: PTM site parsing and stable node/edge metadata (R/utils_getSubnetworkFromIndra.R)
  - Adds `Site`; normalizes `Protein` from `GlobalProtein`
- visualizeNetworksWithHTML.R: HTML visualization with PTM overlap tooltips (R/visualizeNetworksWithHTML.R)
- PTM-Analysis.Rmd: New PTM analysis vignette (vignettes/PTM-Analysis.Rmd)
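The site construction described in the walkthrough (residue plus position, e.g. `Y1110`) can be sketched as below; `make_site` and the flat `residue`/`position` fields are assumptions modeled on typical INDRA modification statements, not the package's actual parser:

```r
# Hedged sketch: build a PTM site label from an INDRA-style statement.
# Modification statements carry a residue (one-letter amino acid code)
# and a position; both may be absent for non-PTM statement types.
make_site <- function(stmt) {
  residue <- stmt$residue
  position <- stmt$position
  if (is.null(residue) || is.null(position)) return(NA_character_)
  paste0(residue, position)
}

make_site(list(type = "Phosphorylation", residue = "Y", position = "1110"))  # "Y1110"
make_site(list(type = "Activation"))                                         # NA
```

Edges built this way get a `site` column that is NA for non-modification interactions, which keeps downstream tooltip code simple.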
Summary by CodeRabbit
New Features
Refactor
Breaking Changes
Documentation
Chores