Feature: PTM analysis #56
Conversation
Caution: Review failed. The pull request is closed.

Walkthrough: Removes the exported getPathwaysFromIndra API and associated docs. Adds canonical GlobalProtein handling in annotation. Extends subnetwork utilities with PTM site parsing, node construction revisions, and input validations. Enhances HTML visualization with PTM-overlap tooltips and updated function signatures. Adds a new PTM vignette, updates the main vignette, and adjusts DESCRIPTION/NAMESPACE.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
participant User
participant R_Funcs as R Visualization
participant Consolidator as consolidateEdges
participant Creator as createEdgeElements
participant Config as generateCytoscapeConfig
participant JS as generateJavaScriptCode
participant Browser as CytoscapeJS
User->>R_Funcs: previewNetworkInBrowser(nodes, edges)
R_Funcs->>Creator: createEdgeElements(edges, nodes)
Creator->>Consolidator: consolidateEdges(edges, nodes)
Consolidator->>Consolidator: Group by source-target-interaction<br/>Compute PTM overlap text
Consolidator-->>Creator: edges + ptm_overlap
Creator-->>Config: edge elements with tooltip field
Config-->>JS: Cytoscape config (nodes, edges, tooltips)
JS->>Browser: Initialize graph and tooltip handlers
User->>Browser: Hover edge
Browser-->>User: Show PTM overlap tooltip (if non-empty)
sequenceDiagram
autonumber
participant Input as Annotated Input
participant Utils as getSubnetworkFromIndra (utils)
participant API as INDRA API
participant Parser as JSON Parser
participant Builder as Node/Edge Builder
Input->>Utils: annotated_df (Protein/GlobalProtein, HgncId, log2FC, adj.pvalue)
Utils->>Utils: Validate protein count (unique HGNC)
Utils->>API: Query with unique HGNC + force_include_other (ns:id)
API-->>Utils: Statements (stmt_json, refs)
Utils->>Parser: Extract residue/position from stmt_json
Parser-->>Utils: site (e.g., Y1110)
Utils->>Builder: Map edges (add site), map nodes from input, restrict to in-edges
Builder-->>Input: nodes, edges (with site)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (5)
PR Reviewer Guide 🔍

Here are some key observations to aid the review process:
PR Code Suggestions ✨

Explore these optional code suggestions:
Codecov Report

❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## devel #56 +/- ##
==========================================
+ Coverage 57.36% 64.30% +6.94%
==========================================
Files 8 7 -1
Lines 1222 1216 -6
==========================================
+ Hits 701 782 +81
+ Misses 521 434 -87

☔ View full report in Codecov by Sentry.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
R/annotateProteinInfoFromIndra.R (1)
70-79: Update Protein column reference to use protein_ids consistently.

In the Uniprot_Mnemonic path, the code still references df$Protein when setting UniprotId values, but should use the index based on the original protein_ids source. Fix the inconsistent column reference:

```diff
 if (proteinIdType == "Uniprot_Mnemonic") {
   mnemonicProteins <- protein_ids
   if (length(mnemonicProteins) > 0) {
     uniprotMapping <- .callGetUniprotIdsFromUniprotMnemonicIdsApi(as.list(mnemonicProteins))
     for (mnemonicId in names(uniprotMapping)) {
       if (!is.null(uniprotMapping[[mnemonicId]])) {
-        df$UniprotId[df$Protein == mnemonicId] <- uniprotMapping[[mnemonicId]]
+        df$UniprotId[protein_ids == mnemonicId] <- uniprotMapping[[mnemonicId]]
       }
     }
   }
 }
```

R/getPathwaysFromIndra.R (7)
29-33: Harden normal fit: handle NA/Inf and small-N failures.

fitdistr will error if log2FC has NA/Inf or too few finite values. Add filtering and a safe fallback. Apply:

```diff
-log2fc_values <- annotated_df$log2FC
-fit <- fitdistr(log2fc_values, "normal")
-para <- fit$estimate
+log2fc_values <- annotated_df$log2FC
+log2fc_values <- log2fc_values[is.finite(log2fc_values)]
+if (length(log2fc_values) >= 2) {
+  fit <- fitdistr(log2fc_values, "normal")
+  para <- fit$estimate
+} else {
+  # Fallback to standard normal when insufficient data
+  para <- c(mean = 0, sd = 1)
+}
```
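As a cross-language sanity check of the suggested guard, here is a minimal Python sketch (function name is illustrative, not from the package). For a normal distribution, fitdistr's maximum-likelihood fit amounts to the sample mean and the n-divisor standard deviation, so the fallback path can be shown with plain arithmetic:

```python
import math

def fit_normal_with_fallback(values):
    # Keep only finite numbers, mirroring is.finite() in the suggested R fix
    finite = [v for v in values if math.isfinite(v)]
    if len(finite) >= 2:
        mean = sum(finite) / len(finite)
        # MLE for a normal divides the variance by n, not n - 1
        var = sum((v - mean) ** 2 for v in finite) / len(finite)
        return mean, math.sqrt(var)
    # Fallback to a standard normal when too little data remains
    return 0.0, 1.0
```

The second assertion below is exactly the failure mode the review flags: a vector that is all NA/Inf no longer errors, it degrades to N(0, 1).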
28-28: Ensure consistent types for HGNC IDs.

Later comparisons rely on character equality. Coerce once to avoid subtle mismatches. Apply:

```diff
 annotated_df$Protein <- as.character(annotated_df$Protein)
+annotated_df$HgncId <- as.character(annotated_df$HgncId)
```
53-57: Build URL via query params and handle HTTP errors.

Avoid manual string paste and add basic HTTP error handling. Apply:

```diff
-url = paste('https://db.indra.bio/statements/from_agents?subject=',
-            source_id, namespace, sep = "")
-response <- GET(url)
-z = content(response)
+response <- GET(
+  "https://db.indra.bio/statements/from_agents",
+  query = list(subject = paste0(source_id, namespace))
+)
+if (response$status_code >= 300) {
+  stop(sprintf("INDRA request failed (HTTP %s)", response$status_code))
+}
+z <- content(response)
```
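The same pattern, sketched with only the Python standard library: let a URL encoder build the query string instead of pasting it, and fail loudly on non-2xx statuses. The endpoint and the >= 300 threshold come from the review; `check_status` and the example IDs are hypothetical:

```python
from urllib.parse import urlencode

INDRA_BASE = "https://db.indra.bio/statements/from_agents"

def build_indra_url(source_id, namespace):
    # urlencode percent-escapes special characters (e.g. '@') for us
    return INDRA_BASE + "?" + urlencode({"subject": f"{source_id}{namespace}"})

def check_status(status_code):
    # Mirror the suggested guard: anything >= 300 is treated as a failure
    if status_code >= 300:
        raise RuntimeError(f"INDRA request failed (HTTP {status_code})")
    return True
```

Manual concatenation would have emitted a literal `@`, which some HTTP stacks mishandle; the encoder makes the escaping explicit.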
66-77: Subject matching in Complex edges: avoid identical() type pitfalls.

Coerce both sides to character; identical() will fail on numeric vs character. Apply:

```diff
-if (identical(edge$members[[1]]$db_refs[[id_field]], source_id)) {
+if (as.character(edge$members[[1]]$db_refs[[id_field]]) == source_id) {
   obj = edge$members[[2]]$db_refs$HGNC
   namespaces = names(edge$members[[2]]$db_refs)
 } else {
   obj = edge$members[[1]]$db_refs$HGNC
   namespaces = names(edge$members[[1]]$db_refs)
 }
```
86-91: HGNC filter: be explicit about types and missing values.

Make membership checks robust to NA/NULL and type coercion. Apply:

```diff
-if (!("HGNC" %in% namespaces)) {
+if (is.null(namespaces) || !("HGNC" %in% namespaces)) {
   next
-} else if (!(obj %in% annotated_df$HgncId)) {
+} else if (is.null(obj) || is.na(obj) || !(as.character(obj) %in% annotated_df$HgncId)) {
   next
 }
```
121-132: Probability calc: NA-safe logFC, avoid log10(0), clamp to [0,1].

Prevents runtime errors and keeps probabilities in range. Apply:

```diff
-prob_logFC = 0
-logFC = annotated_df[which(annotated_df$HgncId == edgeToMetadataMapping[[key]]$target_id),]
-logFC = logFC$log2FC[which.max(abs(logFC$log2FC))]
+prob_logFC <- 0
+logFC_rows <- annotated_df[annotated_df$HgncId == edgeToMetadataMapping[[key]]$target_id, , drop = FALSE]
+vals <- logFC_rows$log2FC
+vals <- vals[is.finite(vals)]
+logFC <- if (length(vals)) vals[which.max(abs(vals))] else 0
 if (logFC > para[1]) {
   prob_logFC = 1 - pnorm(logFC, mean = para[1], sd = para[2])
 } else {
   prob_logFC = pnorm(logFC, mean = para[1], sd = para[2])
 }
-evidence_prob = 10^(m*log10(edgeToMetadataMapping[[key]]$data$evidence_count)+b)
-edgeToMetadataMapping[[key]]$data$total_prob = 1 - ((1 - prob_logFC) * (1 - evidence_prob))
+cnt <- max(1, as.numeric(edgeToMetadataMapping[[key]]$data$evidence_count))
+evidence_prob <- 10^(m * log10(cnt) + b)
+evidence_prob <- max(0, min(1, evidence_prob))
+total_prob <- 1 - ((1 - prob_logFC) * (1 - evidence_prob))
+edgeToMetadataMapping[[key]]$data$total_prob <- max(0, min(1, total_prob))
-edgeToMetadataMapping[[key]]$data$logFC = logFC
+edgeToMetadataMapping[[key]]$data$logFC <- logFC
```
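The guarded tail-probability and clamping logic is language-independent, so it can be restated as a small self-contained Python sketch. Names are illustrative; `m` and `b` stand for the fitted evidence-curve coefficients referenced in the surrounding code, and the final line is a noisy-OR combination of the two probabilities:

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    # pnorm equivalent via the error function
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2))))

def clamp01(p):
    return max(0.0, min(1.0, p))

def combine_probs(log_fc, mean, sd, evidence_count, m, b):
    # Tail probability of the observed log2FC under the fitted normal
    if log_fc > mean:
        prob_logfc = 1.0 - normal_cdf(log_fc, mean, sd)
    else:
        prob_logfc = normal_cdf(log_fc, mean, sd)
    # Guard log10(0) by flooring the count at 1, then clamp to [0, 1]
    cnt = max(1.0, float(evidence_count))
    evidence_prob = clamp01(10 ** (m * math.log10(cnt) + b))
    # Noisy-OR of the two independent probabilities, clamped to [0, 1]
    return clamp01(1.0 - (1.0 - prob_logfc) * (1.0 - evidence_prob))
```

With evidence_count = 0 (the log10(0) hazard the review points out), the floor kicks in and the result stays a valid probability.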
160-175: Align added node schema with .constructNodesDataFrame

- Replace pvalue = 0 with adj.pvalue = 0 when appending the main target node:

```diff
-logFC = 0,
-pvalue = 0,
+logFC = 0,
+adj.pvalue = 0,
```

- Optionally, verify that main_target uses the same identifier namespace (UniProt) as nodes$id, or map it before binding to avoid orphaned nodes.
🧹 Nitpick comments (7)
R/utils_getSubnetworkFromIndra.R (1)
289-314: Clean up commented-out code.

There's a large block of commented-out code that should be removed if it's no longer needed. This impacts code readability and maintenance.
Remove the commented-out code block from lines 294-313 since the new implementation is working correctly.
R/visualizeNetworksWithHTML.R (1)
566-567: Simplify the tooltip display condition.

The condition checking for "No overlapping PTM sites found" is unnecessary since the new implementation returns an empty string for no overlaps. Simplify the condition:

```diff
-if (tooltipText && tooltipText.trim() !== '' && tooltipText.trim() !== 'No overlapping PTM sites found') {
+if (tooltipText && tooltipText.trim() !== '') {
```
58-58: Address the TODO comment about PTM dataset handling.

The TODO comment indicates that PTM dataset handling needs improvement. This should be addressed or tracked in an issue.
Would you like me to help create a more robust PTM dataset handling implementation or open an issue to track this technical debt?
R/getPathwaysFromIndra.R (1)
135-158: Nit: use explicit FUN.VALUE types in vapply.

Improve type safety by using character(1)/numeric(1). Apply:

```diff
-  }, ""),
+  }, character(1)),
 ...
-  }, ""),
+  }, character(1)),
 ...
-  }, 1),
+  }, numeric(1)),
 ...
-  }, 1),
+  }, numeric(1)),
 ...
-  }, 1),
+  }, numeric(1)),
 ...
-  }, ""),
+  }, character(1)),
```
31-33: Verify install branch name.

The PR branch is “feature-ptm-analysis”, but the install command uses “feature-ptm”. Update if that’s unintentional. Apply:

```diff
-devtools::install_github("Vitek-Lab/MSstatsBioNet@feature-ptm", build_vignettes = TRUE)
+devtools::install_github("Vitek-Lab/MSstatsBioNet@feature-ptm-analysis", build_vignettes = TRUE)
```
72-78: Set expectations conservatively.

“GAB1 is rated as the most relevant edge” may not always hold with user data/filters. Phrase as an example or remove. Apply:

```diff
-Here, we will get pathways from INDRA and rank them based on relevance. We will see here that GAB1 is rated as the most relevant edge to track.
+Here, we will get pathways from INDRA and rank them based on relevance. Results will vary by dataset and filters.
```
24-33: Keep install chunks non-evaluating.

Good use of eval = FALSE. Consider also prefacing with a note that installs should be run in a clean R session, not during vignette build.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
inst/extdata/2_egf_data.csv is excluded by !**/*.csv
📒 Files selected for processing (5)
- R/annotateProteinInfoFromIndra.R (1 hunks)
- R/getPathwaysFromIndra.R (3 hunks)
- R/utils_getSubnetworkFromIndra.R (9 hunks)
- R/visualizeNetworksWithHTML.R (19 hunks)
- vignettes/PTM-Analysis.Rmd (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pkgdown
- GitHub Check: Run pr agent on every pull request
- GitHub Check: test-coverage
- GitHub Check: build
🔇 Additional comments (2)
R/utils_getSubnetworkFromIndra.R (1)
148-153: Fix the regex pattern for PTM site extraction.

The current regex pattern ".*?_(?=[A-Z][0-9])" will fail to correctly extract PTM sites in several cases:

- It doesn't properly handle multi-digit positions (e.g., _S123)
- The replacement pattern removes everything up to the underscore, but should only remove the protein name prefix

Apply this diff to fix the PTM site extraction:

```diff
-input$Site = ifelse(grepl("_[A-Z][0-9]", input$Protein),
-                    gsub(".*?_(?=[A-Z][0-9])", "", input$Protein, perl = TRUE),
+input$Site = ifelse(grepl("_[A-Z][0-9]+", input$Protein),
+                    gsub("^[^_]+_", "", input$Protein),
                     NA_character_)
```

Likely an incorrect or invalid review comment.
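A quick way to see what any corrected pattern must handle is a tiny reference implementation. This hypothetical Python helper anchors the site token to the end of the ID (an assumption, not the package's rule): it splits IDs like EGFR_Y1110 into protein and site while leaving mnemonic suffixes such as _HUMAN untouched, which is exactly the over-stripping hazard discussed elsewhere in this review:

```python
import re

# One residue letter plus a multi-digit position, anchored at the end
SITE_TOKEN = re.compile(r"_([A-Z]\d+)$")

def split_ptm_id(protein):
    """Split 'GENE_S123' into ('GENE', 'S123'); return (protein, None) otherwise."""
    m = SITE_TOKEN.search(protein)
    if m:
        return protein[:m.start()], m.group(1)
    return protein, None
```

Because the pattern requires digits after the residue letter, `_HUMAN` never matches, and multi-digit positions like S123 are captured whole.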
R/visualizeNetworksWithHTML.R (1)
960-963: LGTM! Clear PTM visualization documentation.

The addition of PTM site overlap information in the legend and instructions is well-implemented and provides clear guidance to users about the hover functionality.
```r
calculatePTMOverlapAggregated <- function(edges, nodes) {
  if (nrow(edges) == 0) return(character(0))

  # Group edges by source-target-interaction to match consolidation logic
  edges$edge_key <- paste(edges$source, edges$target, edges$interaction, sep = "-")
  unique_edges <- unique(edges$edge_key)

  overlap_info <- character(length(unique_edges))
  names(overlap_info) <- unique_edges

  for (edge_key in unique_edges) {
    # Get all edges with this source-target-interaction combination
    matching_edges <- edges[edges$edge_key == edge_key, ]
    all_overlap_sites <- c()

    # Process each matching edge to find PTM overlaps
    for (i in 1:nrow(matching_edges)) {
      edge <- matching_edges[i, ]

      # Check if edge has target and site information
      if (!is.na(edge$target) && "site" %in% names(edge) && !is.na(edge$site)) {
        # Find matching nodes with the same target ID
        target_nodes <- nodes[nodes$id == edge$target, ]

        if (nrow(target_nodes) > 0 && "Site" %in% names(target_nodes)) {
          edge_sites <- trimws(unlist(strsplit(as.character(edge$site), "[,;|]")))

          # Check each target node row for site matches
          for (j in 1:nrow(target_nodes)) {
            if (!is.na(target_nodes$Site[j])) {
              node_sites <- trimws(unlist(strsplit(as.character(target_nodes$Site[j]), "[,;|]")))

              # Find overlapping sites for this edge-node combination
              overlap_sites <- intersect(edge_sites, node_sites)
              overlap_sites <- overlap_sites[overlap_sites != "" & !is.na(overlap_sites)]

              # Add to the aggregate list
              all_overlap_sites <- c(all_overlap_sites, overlap_sites)
            }
          }
        }
      }
    }

    # Remove duplicates and create tooltip text for this consolidated edge
    unique_overlap_sites <- unique(all_overlap_sites)
    unique_overlap_sites <- unique_overlap_sites[unique_overlap_sites != "" & !is.na(unique_overlap_sites)]

    # CHANGED: Only create tooltip text if there are actual overlapping sites
    if (length(unique_overlap_sites) > 0) {
      if (length(unique_overlap_sites) == 1) {
        overlap_info[edge_key] <- paste0("Overlapping PTM site: ", unique_overlap_sites[1])
      } else {
        overlap_info[edge_key] <- paste0("Overlapping PTM sites: ", paste(unique_overlap_sites, collapse = ", "))
      }
    } else {
      # CHANGED: Return empty string instead of "No overlapping PTM sites found"
      overlap_info[edge_key] <- ""
    }
  }

  return(overlap_info)
}
```
🛠️ Refactor suggestion
Improve PTM overlap calculation performance and robustness.
The PTM overlap calculation has several issues:
- The triple nested loop (edges × target_nodes × sites) could be inefficient for large datasets
- Site splitting using multiple delimiters [,;|] should be documented
- No validation that the Site column exists in nodes before accessing it
Consider these improvements:
```diff
 calculatePTMOverlapAggregated <- function(edges, nodes) {
   if (nrow(edges) == 0) return(character(0))
+
+  # Validate required columns
+  if (!"Site" %in% names(nodes)) {
+    return(character(nrow(edges)))  # Return empty overlaps if no Site info
+  }

   # Group edges by source-target-interaction to match consolidation logic
   edges$edge_key <- paste(edges$source, edges$target, edges$interaction, sep = "-")
```

Also consider pre-processing the node sites into a lookup table to improve performance.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
R/visualizeNetworksWithHTML.R around lines 82-144: the PTM overlap routine is
slow and fragile due to triple nested loops, lack of Site column validation, and
undocumented delimiter use; fix by (1) early-validate that nodes has a "Site"
column and gracefully handle missing/NA values; (2) pre-process nodes into a
lookup list mapping node id -> unique vector of trimmed sites (split once using
documented delimiter pattern "[,;|]"), removing empty/NA entries; (3) compute
overlaps per consolidated edge by splitting the edge site string once and
intersecting with the precomputed node-site vectors (use vectorized set
operations rather than looping per target node row), aggregating unique overlaps
and producing the same tooltip string logic (empty string when none); and (4)
add a short comment noting the delimiter semantics and why the lookup improves
performance.
```r
# Create edge data with styling information and PTM overlap tooltip
```
Escape quotes more robustly for JavaScript safety.
The current quote escaping only handles single quotes but should also handle double quotes and other special characters that could break JavaScript.
Use a more robust escaping approach:
```diff
-# Escape quotes in tooltip text for JavaScript safety
-tooltip_text <- gsub("'", "\\\\'", row$ptm_overlap)
+# Escape special characters for JavaScript safety
+tooltip_text <- gsub("(['\"\\\\])", "\\\\\\1", row$ptm_overlap)
```
+ tooltip_text <- gsub("(['\"\\\\])", "\\\\\\1", row$ptm_overlap)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tooltip_text <- gsub("'", "\\\\'", row$ptm_overlap) | |
| # Escape special characters for JavaScript safety | |
| tooltip_text <- gsub("(['\"\\\\])", "\\\\\\1", row$ptm_overlap) |
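To see what the broadened character class buys, here is a Python analogue of the escaping rule (hypothetical helper). It backslash-escapes backslashes and both quote characters, matching the suggested R pattern; note that embedded newlines would still break a single-quoted JS literal, so JSON-encoding the whole string is the fully robust alternative:

```python
import re

def escape_for_js_single_quoted(text):
    # Escape \, ' and " in one pass; the backslash case must be in the
    # same class so already-escaped input is not double-processed oddly
    return re.sub(r"(['\"\\])", r"\\\1", text)
```

The original single-quote-only rule would leave `"` unescaped, which breaks tooltips whose text is interpolated into a double-quoted JS context.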
```js
tooltip.style.cssText = `
  position: absolute;
  background-color: rgba(0, 0, 0, 0.9);
  color: white;
  padding: 8px 12px;
  border-radius: 4px;
  font-size: 12px;
  font-family: Arial, sans-serif;
  white-space: nowrap;
  pointer-events: none;
  z-index: 9999;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.3);
  display: none;
  max-width: 300px;
  word-wrap: break-word;
  white-space: pre-wrap;
`;
document.body.appendChild(tooltip);
```
🛠️ Refactor suggestion
Add null checks for tooltip element creation.
The tooltip creation code assumes document.body exists when the script runs, which might not always be the case if the script executes before DOM is ready.
Wrap tooltip creation in a DOM ready check:
```diff
+// Ensure DOM is ready before creating tooltip
+if (document.body) {
   var tooltip = document.createElement('div');
   tooltip.style.cssText = `...`;
   document.body.appendChild(tooltip);
+} else {
+  console.error('Document body not ready for tooltip creation');
+}
```

📝 Committable suggestion — ‼️ carefully review the code before committing, ensure it accurately replaces the highlighted code with no missing lines or indentation issues, and test it:
```js
// Ensure DOM is ready before creating tooltip
if (document.body) {
  var tooltip = document.createElement('div');
  tooltip.style.cssText = `
    position: absolute;
    background-color: rgba(0, 0, 0, 0.9);
    color: white;
    padding: 8px 12px;
    border-radius: 4px;
    font-size: 12px;
    font-family: Arial, sans-serif;
    white-space: nowrap;
    pointer-events: none;
    z-index: 9999;
    box-shadow: 0 2px 8px rgba(0, 0, 0, 0.3);
    display: none;
    max-width: 300px;
    word-wrap: break-word;
    white-space: pre-wrap;
  `;
  document.body.appendChild(tooltip);
} else {
  console.error('Document body not ready for tooltip creation');
}
```
🤖 Prompt for AI Agents
In R/visualizeNetworksWithHTML.R around lines 542 to 560, the code creates and
appends a tooltip element assuming document.body exists; change this to wait for
DOM readiness by checking document.readyState and if not 'loading' create and
append the tooltip immediately, otherwise attach a DOMContentLoaded listener to
create/append the tooltip when the DOM is ready; ensure the tooltip variable is
created in the same scope so event handlers can access it and guard against
duplicate creation if the listener is invoked multiple times.
Commit: shiny needs fixing to refer to global protein info in subnetwork function

Force-pushed 93d7401 to c206517
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
R/utils_getSubnetworkFromIndra.R (2)
59-66: Add HTTP timeouts and error handling for INDRA API calls.

Network calls lack timeouts and status checks; failures can hang or silently proceed with bad responses.

```diff
 res <- POST(
   indraCogexUrl,
   body = groundings,
   add_headers("Content-Type" = "application/json"),
-  encode = "raw"
+  encode = "raw",
+  httr::timeout(60)
 )
-res <- content(res)
+if (httr::http_error(res)) {
+  stop(sprintf("INDRA API request failed (HTTP %s): %s",
+               httr::status_code(res), httr::content(res, as = "text")))
+}
+res <- content(res)
```
110-123: Normalize Protein and derive Site before exemptions/filters; avoid factor coercion.

As written, exempt_proteins are captured before Site/GlobalProtein normalization, so forced proteins may lose Site and dedup may keep stale rows. Also, assigning GlobalProtein can coerce Protein back to factor.

```diff
 .filterGetSubnetworkFromIndraInput <- function(input, pvalueCutoff, logfc_cutoff, force_include_proteins) {
-  # Extract exempt proteins before any filtering
+  # Normalize Protein and derive Site before exemptions/filters
+  prot_raw = as.character(input$Protein)
+  input$Site = ifelse(grepl("_[A-Za-z][0-9]", prot_raw),
+                      sub(".*?_(?=[A-Za-z][0-9])", "", prot_raw, perl = TRUE),
+                      NA_character_)
+  if ("GlobalProtein" %in% colnames(input)) {
+    input$Protein = as.character(input$GlobalProtein)
+  } else {
+    input$Protein = prot_raw
+  }
+
+  # Extract exempt proteins after normalization
   exempt_proteins <- NULL
   if (!is.null(force_include_proteins)) {
     if (!is.character(force_include_proteins)) {
       stop("force_include_proteins must be a character vector")
     }
     missing_prots <- setdiff(force_include_proteins, input$Protein)
     if (length(missing_prots) > 0) {
       warning("force_include_proteins not found: ", paste(missing_prots, collapse = ", "))
     }
     exempt_proteins <- input[input$Protein %in% force_include_proteins,]
   }
-
-  # Apply standard filtering
+  # Apply standard filtering
   input <- input[!is.na(input$adj.pvalue),]
@@
-  input$Protein <- as.character(input$Protein)
-
-  # Handle PTMs in Protein column
-  input$Site = ifelse(grepl("_[A-Z][0-9]", input$Protein),
-                      gsub(".*?_(?=[A-Z][0-9])", "", input$Protein, perl = TRUE),
-                      NA_character_)
-  if ("GlobalProtein" %in% colnames(input)) {
-    input$Protein = input$GlobalProtein
-  }
```

Also applies to: 145-153
♻️ Duplicate comments (3)
R/utils_getSubnetworkFromIndra.R (3)
45-46: Good: deduplicate HGNC IDs before building groundings.

This addresses earlier feedback about deduplication ordering.
172-177: Guard against missing Protein column in metadata mapping.

Accessing input$Protein without validation can error; add a check.

```diff
 .addAdditionalMetadataToIndraEdge <- function(edge, input) {
+  if (!"Protein" %in% colnames(input)) {
+    stop("Input must contain 'Protein' column for edge metadata")
+  }
@@
   uniprot_ids_source <- unique(matched_rows_source$Protein)
@@
   uniprot_ids_target = unique(matched_rows_target$Protein)
```

Also applies to: 180-185
13-14: Fix 400-protein check to only count unique HGNC IDs (including HGNC in force_include_other).

Currently all force_include_other entries are counted, even non-HGNC namespaces and duplicates, which can wrongly reject valid queries.

```diff
-num_proteins = length(unique(input$HgncId)) +
-  ifelse(!is.null(force_include_other), length(force_include_other), 0)
+unique_hgnc_ids = unique(input$HgncId)
+if (!is.null(force_include_other)) {
+  hgnc_extra = vapply(strsplit(force_include_other, ":"), function(parts) {
+    if (length(parts) == 2 && toupper(parts[1]) == "HGNC") parts[2] else NA_character_
+  }, NA_character_)
+  unique_hgnc_ids = union(unique_hgnc_ids, unique(stats::na.omit(hgnc_extra)))
+}
+num_proteins = length(unique_hgnc_ids)
```
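The counting rule being proposed (unique HGNC IDs, folding in only HGNC-namespaced `ns:id` extras) is easy to state in a few lines of Python. The helper and example IDs are illustrative, not from the package:

```python
def count_query_proteins(hgnc_ids, force_include_other=None):
    """Count unique HGNC IDs; only 'HGNC:<id>' extras contribute."""
    unique = {str(i) for i in hgnc_ids}
    for entry in force_include_other or []:
        parts = entry.split(":")
        # Non-HGNC namespaces (e.g. 'UP:...') must not inflate the count
        if len(parts) == 2 and parts[0].upper() == "HGNC":
            unique.add(parts[1])
    return len(unique)
```

A duplicate like `HGNC:1097` alongside an input row with HgncId 1097 counts once, which is exactly the over-counting bug the comment describes.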
🧹 Nitpick comments (1)
R/utils_getSubnetworkFromIndra.R (1)
260-262: Preserve NA for site column.

Using "" as FUN.VALUE can coerce NA to empty string in some flows; prefer NA_character_.

```diff
-site = vapply(keys(res), function(x) {
-  query(res, x)$site
-}, ""),
+site = vapply(keys(res), function(x) {
+  query(res, x)$site
+}, NA_character_),
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
inst/extdata/2_egf_data.csv is excluded by !**/*.csv
📒 Files selected for processing (5)
- R/annotateProteinInfoFromIndra.R (1 hunks)
- R/getPathwaysFromIndra.R (3 hunks)
- R/utils_getSubnetworkFromIndra.R (9 hunks)
- R/visualizeNetworksWithHTML.R (19 hunks)
- vignettes/PTM-Analysis.Rmd (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- R/visualizeNetworksWithHTML.R
- vignettes/PTM-Analysis.Rmd
- R/annotateProteinInfoFromIndra.R
- R/getPathwaysFromIndra.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build
- GitHub Check: test-coverage
- GitHub Check: pkgdown
🔇 Additional comments (2)
R/utils_getSubnetworkFromIndra.R (2)
194-194: Import alignment looks good.

fromJSON is used below; roxygen import is appropriate.
232-233: Import alignment for fromJSON is consistent.

Matches usage below in edges construction.
```r
json_object <- fromJSON(edge$data$stmt_json)
if (!is.null(json_object$residue) && !is.null(json_object$position)) {
  edge$site = paste0(json_object$residue, json_object$position)
  key <- paste(key, edge$site, sep = "_")
} else {
  edge$site = NA_character_
}
```
🛠️ Refactor suggestion
Harden stmt_json parsing.
fromJSON can throw on malformed JSON; use tryCatch to avoid dropping the whole run.
```diff
-json_object <- fromJSON(edge$data$stmt_json)
-if (!is.null(json_object$residue) && !is.null(json_object$position)) {
+json_object <- tryCatch(fromJSON(edge$data$stmt_json), error = function(e) NULL)
+if (!is.null(json_object) && !is.null(json_object$residue) && !is.null(json_object$position)) {
   edge$site = paste0(json_object$residue, json_object$position)
   key <- paste(key, edge$site, sep = "_")
 } else {
   edge$site = NA_character_
 }
```

📝 Committable suggestion — ‼️ carefully review the code before committing, ensure it accurately replaces the highlighted code, and test it:
```r
json_object <- tryCatch(fromJSON(edge$data$stmt_json), error = function(e) NULL)
if (!is.null(json_object) && !is.null(json_object$residue) && !is.null(json_object$position)) {
  edge$site = paste0(json_object$residue, json_object$position)
  key <- paste(key, edge$site, sep = "_")
} else {
  edge$site = NA_character_
}
```
🤖 Prompt for AI Agents
In R/utils_getSubnetworkFromIndra.R around lines 204-210,
fromJSON(edge$data$stmt_json) can throw on malformed JSON and crash the run;
wrap the fromJSON call in tryCatch to safely handle parsing errors (returning
NULL or an empty list on error), and when an error occurs set edge$site <-
NA_character_ (and optionally log a warning including the edge id or key).
Ensure the subsequent logic that sets key and pastes site only runs when
json_object was successfully parsed and residue/position are present.
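The tryCatch pattern maps directly onto a try/except sketch. This hypothetical Python helper mirrors the proposed behavior: malformed JSON, a non-object payload, and a missing residue or position all yield a missing site instead of aborting the run:

```python
import json

def parse_site(stmt_json):
    """Return a 'Y1110'-style site string or None; never raise on bad input."""
    try:
        obj = json.loads(stmt_json)
    except (json.JSONDecodeError, TypeError):
        return None  # analogue of tryCatch(..., error = function(e) NULL)
    if not isinstance(obj, dict):
        return None
    residue = obj.get("residue")
    position = obj.get("position")
    if residue is not None and position is not None:
        return f"{residue}{position}"
    return None
```

Downstream key construction then only appends the site when parsing actually succeeded, matching the review's guidance.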
```r
nodes = input[, c("Protein", "log2FC", "adj.pvalue", "HgncName", "Site")]
colnames(nodes) = c("id", "logFC", "adj.pvalue", "hgncName", "Site")

# Add attributes from input where available
nodes$logFC <- input$log2FC[match(nodes$id, input$Protein)]
nodes$adj.pvalue <- input$adj.pvalue[match(nodes$id, input$Protein)]
nodes$hgncName <- if ("HgncName" %in% colnames(input) && is.character(input$HgncName)) {
  hgnc_value <- input$HgncName[match(nodes$id, input$Protein)]
  ifelse(is.na(hgnc_value), nodes$id, hgnc_value)
} else {
  nodes$id
}
nodes = nodes[nodes$id %in% c(edges$source, edges$target), ]
nodes$hgncName = ifelse(is.na(nodes$hgncName), nodes$id, nodes$hgncName)
```
🛠️ Refactor suggestion
Make nodes construction resilient to missing columns.
Selecting HgncName/Site without existence checks can error; precreate when absent.
```diff
 .constructNodesDataFrame <- function(input, edges) {
+  if (!"HgncName" %in% colnames(input)) input$HgncName <- NA_character_
+  if (!"Site" %in% colnames(input)) input$Site <- NA_character_
   nodes = input[, c("Protein", "log2FC", "adj.pvalue", "HgncName", "Site")]
   colnames(nodes) = c("id", "logFC", "adj.pvalue", "hgncName", "Site")
   nodes = nodes[nodes$id %in% c(edges$source, edges$target), ]
   nodes$hgncName = ifelse(is.na(nodes$hgncName), nodes$id, nodes$hgncName)
```

📝 Committable suggestion — ‼️ carefully review the code before committing, ensure it accurately replaces the highlighted code, and test it:
```r
if (!"HgncName" %in% colnames(input)) input$HgncName <- NA_character_
if (!"Site" %in% colnames(input)) input$Site <- NA_character_
nodes = input[, c("Protein", "log2FC", "adj.pvalue", "HgncName", "Site")]
colnames(nodes) = c("id", "logFC", "adj.pvalue", "hgncName", "Site")
nodes = nodes[nodes$id %in% c(edges$source, edges$target), ]
nodes$hgncName = ifelse(is.na(nodes$hgncName), nodes$id, nodes$hgncName)
```
🤖 Prompt for AI Agents
In R/utils_getSubnetworkFromIndra.R around lines 289 to 294, the code assumes
columns "HgncName" and "Site" exist when subsetting input which can cause
errors; before subsetting, check if those columns are present and if not add
them to input (e.g. create input$HgncName <- NA and input$Site <- NA), then
subset using only the required columns (or use intersect to guard against
missing ones), rename to id/logFC/adj.pvalue/hgncName/Site, filter nodes by
edges, and keep the existing fallback that sets hgncName to id when NA.
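The pattern the prompt describes can be sketched in a few lines; `input` and `edges` here are toy stand-ins, not the package's real data:

```r
# Toy inputs: note there is deliberately no HgncName or Site column
input <- data.frame(
  Protein = c("P04637_S15", "P38398"),
  log2FC = c(1.2, -0.8),
  adj.pvalue = c(0.01, 0.20)
)
edges <- data.frame(source = "P04637_S15", target = "P38398")

# Add optional columns if absent so the subset below never errors
if (!"HgncName" %in% colnames(input)) input$HgncName <- NA_character_
if (!"Site" %in% colnames(input)) input$Site <- NA_character_

nodes <- input[, c("Protein", "log2FC", "adj.pvalue", "HgncName", "Site")]
colnames(nodes) <- c("id", "logFC", "adj.pvalue", "hgncName", "Site")
nodes <- nodes[nodes$id %in% c(edges$source, edges$target), ]
# Fall back to the protein id when no gene symbol is available
nodes$hgncName <- ifelse(is.na(nodes$hgncName), nodes$id, nodes$hgncName)
```

With HgncName entirely absent, every node falls back to its id, and no subsetting error occurs.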
Actionable comments posted: 1
♻️ Duplicate comments (1)
R/annotateProteinInfoFromIndra.R (1)
58-65: Harden GlobalProtein selection and avoid over-stripping with a safer PTM regex.
- Validate existing GlobalProtein (NA/empty) and fall back to Protein when unusable. This echoes the prior review—adding the guard prevents silent propagation of bad IDs.
- The current pattern can wrongly truncate tokens like GENE_A0A024RBG1. Anchor to PTM-like tokens at the end (e.g., _S123 or _S123;T45). Also drop NAs before building protein_ids.
Apply:

```diff
-  if ("GlobalProtein" %in% colnames(df)) {
-    protein_ids = unique(df$GlobalProtein)
-  } else {
-    df$GlobalProtein = ifelse(grepl("_[A-Z][0-9]", df$Protein),
-      gsub("_[A-Z][0-9].*", "", df$Protein, perl = TRUE),
-      df$Protein)
-    protein_ids = unique(df$GlobalProtein)
-  }
+  if ("GlobalProtein" %in% colnames(df)) {
+    bad <- is.na(df$GlobalProtein) | df$GlobalProtein == ""
+    if (all(bad)) {
+      warning("GlobalProtein column is empty; deriving from Protein.")
+      df$GlobalProtein <- df$Protein
+    }
+  } else {
+    # Strip trailing PTM site tokens (e.g., _S123 or _S123;T45), keep mnemonic suffixes like _HUMAN
+    df$GlobalProtein <- gsub("_(?:[A-Z]{1,3})[0-9]+(?:;[A-Z]{1,3}[0-9]+)*$", "", df$Protein, perl = TRUE)
+  }
+  protein_ids <- unique(na.omit(df$GlobalProtein))
```
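To see why the anchored pattern is safer, compare both substitutions on a few illustrative tokens (only `GENE_A0A024RBG1` comes from the comment above; the rest are hypothetical PTM-style ids):

```r
# Illustrative tokens only; the two regexes are the ones from the diff above
tokens <- c("P04637_S123", "P04637_S123;T45", "GENE_A0A024RBG1", "TP53_HUMAN")

old <- gsub("_[A-Z][0-9].*", "", tokens, perl = TRUE)
new <- gsub("_(?:[A-Z]{1,3})[0-9]+(?:;[A-Z]{1,3}[0-9]+)*$", "", tokens, perl = TRUE)

# The old pattern truncates the isoform-style name to "GENE";
# the anchored pattern leaves it intact while still stripping PTM suffixes
print(rbind(tokens, old, new))
```

Both patterns leave a mnemonic suffix like `_HUMAN` untouched, since it contains no digits.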
🧹 Nitpick comments (2)
R/annotateProteinInfoFromIndra.R (2)
72-73: Skip NA/empty mnemonic IDs before calling the API. Avoids unnecessary calls and edge-case warnings.

```diff
-  mnemonicProteins <- protein_ids
+  mnemonicProteins <- protein_ids[!is.na(protein_ids) & nzchar(protein_ids)]
```
75-79: Vectorize the mnemonic→UniProt assignment. Removes the R-level loop and simplifies logic.

```diff
-  for (mnemonicId in names(uniprotMapping)) {
-    if (!is.null(uniprotMapping[[mnemonicId]])) {
-      df$UniprotId[df$GlobalProtein == mnemonicId] <- uniprotMapping[[mnemonicId]]
-    }
-  }
+  map <- unlist(uniprotMapping, use.names = TRUE)
+  hit <- match(df$GlobalProtein, names(map))
+  df$UniprotId[!is.na(hit)] <- map[hit[!is.na(hit)]]
```
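With a toy mapping, the vectorized version behaves like the loop while leaving unmapped proteins as NA (names and values below are illustrative, not real package data):

```r
# Toy mnemonic -> accession mapping, mimicking the uniprotMapping list
uniprotMapping <- list(TP53_HUMAN = "P04637", BRCA1_HUMAN = "P38398")
df <- data.frame(
  GlobalProtein = c("TP53_HUMAN", "EGFR_HUMAN", "BRCA1_HUMAN"),
  UniprotId = NA_character_
)

# Flatten the list to a named vector, then match once instead of looping
map <- unlist(uniprotMapping, use.names = TRUE)
hit <- match(df$GlobalProtein, names(map))
df$UniprotId[!is.na(hit)] <- map[hit[!is.na(hit)]]

# EGFR_HUMAN has no entry in the mapping, so its UniprotId stays NA
```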
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
R/annotateProteinInfoFromIndra.R (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pkgdown
- GitHub Check: build
- GitHub Check: test-coverage
Actionable comments posted: 0
🧹 Nitpick comments (2)
DESCRIPTION (2)
39-40: Prefer HTTPS for the project homepage URL. If supported, switch to HTTPS.

```diff
-URL: http://msstats.org, https://vitek-lab.github.io/MSstatsBioNet/
+URL: https://msstats.org, https://vitek-lab.github.io/MSstatsBioNet/
```
21-26: Consider minimum-version pins for key Imports. If you rely on recent APIs (e.g., tidyr 1.3+ helpers, RCy3 newer functions), add minimal versions in Imports to avoid CI/user mismatches.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- DESCRIPTION (1 hunks)
- NAMESPACE (0 hunks)
- R/getPathwaysFromIndra.R (0 hunks)
- man/getPathwaysFromIndra.Rd (0 hunks)
💤 Files with no reviewable changes (3)
- man/getPathwaysFromIndra.Rd
- NAMESPACE
- R/getPathwaysFromIndra.R
🔇 Additional comments (2)
DESCRIPTION (2)
22-26: Good cleanup removing MASS from Imports. Reduces dependency footprint; the tidyr trailing-comma fix looks correct.
22-26: Verified — no remaining MASS usage. An rg scan of R/ and vignettes found no fitdistr, MASS::, library(MASS), or require(MASS); NAMESPACE contains no MASS; DESCRIPTION lists Imports/Suggests but not MASS.
Actionable comments posted: 2
🧹 Nitpick comments (3)
vignettes/PTM-Analysis.Rmd (3)
37-41: Fail fast if the example data file is missing. Use system.file(..., mustWork = TRUE) and avoid nested calls for clarity.

```diff
-input = data.table::fread(system.file(
-  "extdata/garrido-2024.csv",
-  package = "MSstatsBioNet"
-))
+fpath <- system.file("extdata", "garrido-2024.csv",
+  package = "MSstatsBioNet", mustWork = TRUE)
+input <- data.table::fread(fpath)
 head(input)
```
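A quick way to see the difference: `mustWork = TRUE` turns the silent empty-string return of `system.file()` into an immediate error (the file name below is deliberately nonexistent, and `package = "base"` is used only so the example runs anywhere):

```r
# Without mustWork, a missing file silently yields an empty string,
# which fread() would then fail on with a confusing message
path <- system.file("extdata", "no-such-file.csv", package = "base")
stopifnot(path == "")

# With mustWork = TRUE, the failure is immediate and clearly located
bad <- try(
  system.file("extdata", "no-such-file.csv", package = "base", mustWork = TRUE),
  silent = TRUE
)
stopifnot(inherits(bad, "try-error"))
```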
50-52: Copyedit: standardize entity names (UniProt, HGNC). Small wording/casing fix.

```diff
-In the below example, we convert uniprot IDs to their corresponding Hgnc IDs. We
-can also extract other information, such as hgnc gene name and protein function.
+In the example below, we convert UniProt accessions to their corresponding HGNC IDs. We
+can also extract other information, such as HGNC gene symbol and protein function.
```
61-64: Copyedit: tighten phrasing.

```diff
-subnetwork of proteins from the INDRA database based on differential abundance
-analysis results. This function may help finding off target subnetworks.
+subnetwork of proteins from the INDRA database based on differential abundance
+analysis results. This function may help find off-target subnetworks.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
inst/extdata/garrido-2024.csv is excluded by !**/*.csv
📒 Files selected for processing (1)
vignettes/PTM-Analysis.Rmd (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: test-coverage
🔇 Additional comments (1)
vignettes/PTM-Analysis.Rmd (1)
53-57: Use parameter `proteinIdType` and the accepted values `"Uniprot"` | `"Uniprot_Mnemonic"`. annotateProteinInfoFromIndra expects the second arg named `proteinIdType` (see R/annotateProteinInfoFromIndra.R) and the implementation checks for `"Uniprot"` or `"Uniprot_Mnemonic"` — update the vignette to:

```r
annotated_df <- annotateProteinInfoFromIndra(input, proteinIdType = "Uniprot")
```

Also harmonize doc/comments that use "UniProt" vs "Uniprot".

Likely an incorrect or invalid review comment.
User description
Motivation and Context
We want to add PTM analysis to MSstatsBioNet. This feature is specific to the MSstats ecosystem of packages, rather than something that works with any differential analysis tool that outputs p-values.
Checklist Before Requesting a Review
PR Type
Enhancement, Documentation
Description
Add PTM-aware ID handling and site parsing
Integrate PTM site info into edges/nodes
Show PTM overlap tooltips in HTML viz
Add PTM analysis vignette
Diagram Walkthrough
File Walkthrough
- annotateProteinInfoFromIndra.R: PTM-aware Uniprot ID population (R/annotateProteinInfoFromIndra.R)
  - Use `GlobalProtein` when present for IDs; build `protein_ids` from it, not `Protein`
- getPathwaysFromIndra.R: Robust pathway extraction and ranking tweaks (R/getPathwaysFromIndra.R)
- utils_getSubnetworkFromIndra.R: PTM site parsing and stable node/edge metadata (R/utils_getSubnetworkFromIndra.R)
  - Adds `Site`; normalizes `Protein` from `GlobalProtein`
- visualizeNetworksWithHTML.R: HTML visualization with PTM overlap tooltips (R/visualizeNetworksWithHTML.R)
- PTM-Analysis.Rmd: New PTM analysis vignette (vignettes/PTM-Analysis.Rmd)
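The site construction described in the walkthrough (residue plus position, e.g. `Y1110`) can be sketched as below; `make_site` and the flat `residue`/`position` fields are assumptions modeled on typical INDRA modification statements, not the package's actual parser:

```r
# Hedged sketch: build a PTM site label from an INDRA-style statement.
# Modification statements carry a residue (one-letter amino acid code)
# and a position; both may be absent for non-PTM statement types.
make_site <- function(stmt) {
  residue <- stmt$residue
  position <- stmt$position
  if (is.null(residue) || is.null(position)) return(NA_character_)
  paste0(residue, position)
}

make_site(list(type = "Phosphorylation", residue = "Y", position = "1110"))  # "Y1110"
make_site(list(type = "Activation"))                                         # NA
```

Edges built this way get a `site` column that is NA for non-modification interactions, which keeps downstream tooltip code simple.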
Summary by CodeRabbit
New Features
Refactor
Breaking Changes
Documentation
Chores