
Feature ptm analysis #56

Merged
tonywu1999 merged 23 commits into devel from feature-ptm-analysis
Sep 11, 2025

Conversation

@tonywu1999 (Contributor) commented Sep 3, 2025

User description

Motivation and Context

We want to add PTM analysis to MSstatsBioNet. This feature ties the package specifically to the MSstats ecosystem, rather than leaving it a generic network tool for any differential analysis output with p-values.

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • Ran styler::style_pkg(transformers = styler::tidyverse_style(indent_by = 4))
  • Ran devtools::document()

PR Type

Enhancement, Documentation


Description

  • Add PTM-aware ID handling and site parsing

  • Integrate PTM site info into edges/nodes

  • Show PTM overlap tooltips in HTML viz

  • Add PTM analysis vignette


Diagram Walkthrough

flowchart LR
  Input["PTM groupComparison data"]
  Annotate["annotateProteinInfoFromIndra: PTM-aware ID mapping"]
  Subnet["getSubnetworkFromIndra: PTM site parsing"]
  Edges["Edges include stmt_type + site"]
  Nodes["Nodes include Site column"]
  Viz["HTML visualization with PTM overlap tooltips"]
  Vignette["PTM Analysis vignette"]

  Input -- "GlobalProtein/Protein" --> Annotate
  Annotate -- "HgncId mapping" --> Subnet
  Subnet -- "edgeToMetadata (site)" --> Edges
  Subnet -- "nodes with Site" --> Nodes
  Edges -- "consolidate + overlap calc" --> Viz
  Nodes -- "label/color + Site" --> Viz
  Vignette -- "usage and examples" --> Viz

File Walkthrough

Relevant files
Enhancement
annotateProteinInfoFromIndra.R
PTM-aware Uniprot ID population                                                   

R/annotateProteinInfoFromIndra.R

  • Use GlobalProtein when present for IDs
  • Map Uniprot from protein_ids not Protein
  • Support mnemonic mapping with PTM data
+8/-2     
getPathwaysFromIndra.R
Robust pathway extraction and ranking tweaks                         

R/getPathwaysFromIndra.R

  • Safeguard HGNC source selection
  • Adjust stmt target parsing by type
  • Rank by max absolute log2FC across matches
+6/-6     
utils_getSubnetworkFromIndra.R
PTM site parsing and stable node/edge metadata                     

R/utils_getSubnetworkFromIndra.R

  • Validate unique HGNC count for API limit
  • De-duplicate HGNCs in INDRA API call
  • Parse PTM Site, normalize Protein from GlobalProtein
  • Carry PTM site in edges; build nodes with Site
+53/-22 
visualizeNetworksWithHTML.R
HTML visualization with PTM overlap tooltips                         

R/visualizeNetworksWithHTML.R

  • Compute PTM site overlap per consolidated edge
  • Propagate overlap as edge tooltip in HTML
  • Consolidate edges with PTM metadata
  • Enhance HTML/JS with hover tooltips and legend note
+155/-23
Documentation
PTM-Analysis.Rmd
New PTM analysis vignette                                                               

vignettes/PTM-Analysis.Rmd

  • Add PTM workflow vignette and dataset usage
  • Show ID conversion, pathways, and subnetwork
  • Document phosphorylation filtering and visualization
+106/-0 

Summary by CodeRabbit

  • New Features

    • Web visualization now shows PTM site overlap tooltips on edges.
    • Subnetwork outputs include PTM Site details; improved protein ID handling for Uniprot mapping.
  • Refactor

    • More robust input validation/deduplication; enhanced edge/site metadata propagation.
    • Some visualization functions now accept an optional nodes parameter.
  • Breaking Changes

    • Removed the getPathwaysFromIndra function.
  • Documentation

    • Added a PTM Analysis vignette.
    • Updated main vignette to use browser-based visualization with expanded statement types.
  • Chores

    • Removed an unused dependency and related imports.

@coderabbitai Bot commented Sep 3, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Removes the exported getPathwaysFromIndra API and associated docs. Adds canonical GlobalProtein handling in annotation. Extends subnetwork utilities with PTM site parsing, node construction revisions, and input validations. Enhances HTML visualization with PTM-overlap tooltips and updated function signatures. Adds a new PTM vignette, updates the main vignette, and adjusts DESCRIPTION/NAMESPACE.

Changes

Cohort / File(s) Summary
Protein ID canonicalization
R/annotateProteinInfoFromIndra.R
Selects/creates GlobalProtein by stripping suffixes from Protein; uses GlobalProtein for Uniprot ID mapping and mnemonic handling; updates assignment logic accordingly.
Remove INDRA pathways API
R/getPathwaysFromIndra.R, man/getPathwaysFromIndra.Rd, NAMESPACE
Deletes getPathwaysFromIndra function and Rd; removes export and related imports (MASS::fitdistr, httr::GET).
INDRA subnetwork utils and PTM sites
R/utils_getSubnetworkFromIndra.R
Counts proteins via unique HGNC; deduplicates HGNC in API groundings; validates force_include_other as namespace:id; derives Site from Protein suffixes; prefers GlobalProtein; parses stmt_json for residue/position into site; adds site column to edges; rebuilds nodes from input and restricts to in-edges.
HTML visualization with PTM overlap
R/visualizeNetworksWithHTML.R
Adds PTM overlap aggregation; consolidateEdges(edges, nodes = NULL) and createEdgeElements(edges, nodes = NULL) accept nodes and attach ptm_overlap; generateCytoscapeConfig passes nodes; generateJavaScriptCode adds edge hover tooltips; HTML export text mentions PTM info.
Vignettes
vignettes/PTM-Analysis.Rmd, vignettes/MSstatsBioNet.Rmd
New PTM analysis vignette using annotateProteinInfoFromIndra, getSubnetworkFromIndra, previewNetworkInBrowser. Updates main vignette to include statement_types and browser-based visualization with displayLabelType.
Package metadata
DESCRIPTION, NAMESPACE
Removes MASS from Imports; tidyr becomes last entry. Drops importFrom(MASS, fitdistr) and importFrom(httr, GET).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant R_Funcs as R Visualization
  participant Consolidator as consolidateEdges
  participant Creator as createEdgeElements
  participant Config as generateCytoscapeConfig
  participant JS as generateJavaScriptCode
  participant Browser as CytoscapeJS

  User->>R_Funcs: previewNetworkInBrowser(nodes, edges)
  R_Funcs->>Creator: createEdgeElements(edges, nodes)
  Creator->>Consolidator: consolidateEdges(edges, nodes)
  Consolidator->>Consolidator: Group by source-target-interaction<br/>Compute PTM overlap text
  Consolidator-->>Creator: edges + ptm_overlap
  Creator-->>Config: edge elements with tooltip field
  Config-->>JS: Cytoscape config (nodes, edges, tooltips)
  JS->>Browser: Initialize graph and tooltip handlers
  User->>Browser: Hover edge
  Browser-->>User: Show PTM overlap tooltip (if non-empty)
sequenceDiagram
  autonumber
  participant Input as Annotated Input
  participant Utils as getSubnetworkFromIndra (utils)
  participant API as INDRA API
  participant Parser as JSON Parser
  participant Builder as Node/Edge Builder

  Input->>Utils: annotated_df (Protein/GlobalProtein, HgncId, log2FC, adj.pvalue)
  Utils->>Utils: Validate protein count (unique HGNC)
  Utils->>API: Query with unique HGNC + force_include_other (ns:id)
  API-->>Utils: Statements (stmt_json, refs)
  Utils->>Parser: Extract residue/position from stmt_json
  Parser-->>Utils: site (e.g., Y1110)
  Utils->>Builder: Map edges (add site), map nodes from input, restrict to in-edges
  Builder-->>Input: nodes, edges (with site)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

I nibble code like clover tips,
Mapping sites on protein scripts—
Hover, see the PTMs glow,
Edges whisper what they know.
Old paths pruned, new trails spun,
Networks bloom—hop, hop—well done!
(_/)* 🥕


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60196fd and 5c5d1db.

📒 Files selected for processing (5)
  • R/annotateProteinInfoFromIndra.R (1 hunks)
  • R/utils_getSubnetworkFromIndra.R (10 hunks)
  • R/visualizeNetworksWithHTML.R (17 hunks)
  • vignettes/MSstatsBioNet.Rmd (2 hunks)
  • vignettes/PTM-Analysis.Rmd (1 hunks)


@github-actions Bot (Contributor) commented Sep 3, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

Node construction now subsets input directly and then filters by edge endpoints; if multiple rows share the same Protein (e.g., multiple PTM Sites), this may drop needed nodes or produce duplicates depending on upstream filtering. Confirm this aligns with expected node cardinality and that attributes (logFC, adj.pvalue, Site) map correctly after PTM handling and GlobalProtein substitution.

nodes = input[, c("Protein", "log2FC", "adj.pvalue", "HgncName", "Site")]
colnames(nodes) = c("id", "logFC", "adj.pvalue", "hgncName", "Site")

nodes = nodes[nodes$id %in% c(edges$source, edges$target), ]
nodes$hgncName = ifelse(is.na(nodes$hgncName), nodes$id, nodes$hgncName)
# 
# # Get unique nodes from edges
# node_ids <- unique(c(edges$source, edges$target))
# 
# # Create base nodes dataframe
# nodes <- data.frame(
#     id = node_ids,
#     stringsAsFactors = FALSE
# )
# 
# # Add attributes from input where available
# nodes$logFC <- input$log2FC[match(nodes$id, input$Protein)]
# nodes$adj.pvalue <- input$adj.pvalue[match(nodes$id, input$Protein)]
# nodes$hgncName <- if ("HgncName" %in% colnames(input) && is.character(input$HgncName)) {
#     hgnc_value <- input$HgncName[match(nodes$id, input$Protein)]
#     ifelse(is.na(hgnc_value), nodes$id, hgnc_value)
# } else {
#     nodes$id
# }
# nodes$site <- input$Site[match(nodes$id, input$Protein)]

return(nodes)
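To make the cardinality concern concrete, here is a small, hypothetical Python sketch (not the package's R code) of one way to collapse multiple PTM rows per protein into a single node while pooling their sites; the column names mirror the excerpt above:

```python
def build_nodes(rows, edge_endpoints):
    """Collapse rows sharing a Protein into one node each, pooling PTM sites.

    `rows` mimics the annotated input (Protein, log2FC, adj.pvalue, HgncName,
    Site); `edge_endpoints` is the set of ids appearing in edges. Illustrative
    only -- the actual R code subsets a data.frame directly.
    """
    nodes = {}
    for r in rows:
        if r["Protein"] not in edge_endpoints:
            continue  # mirrors nodes[nodes$id %in% c(edges$source, edges$target), ]
        node = nodes.setdefault(r["Protein"], {
            "id": r["Protein"],
            "logFC": r["log2FC"],
            "adj.pvalue": r["adj.pvalue"],
            "hgncName": r.get("HgncName") or r["Protein"],  # fall back to id like the ifelse()
            "sites": set(),
        })
        if r.get("Site"):
            node["sites"].add(r["Site"])
    return nodes

rows = [
    {"Protein": "P1", "log2FC": 1.2, "adj.pvalue": 0.01, "HgncName": "EGFR", "Site": "Y1110"},
    {"Protein": "P1", "log2FC": 0.8, "adj.pvalue": 0.03, "HgncName": "EGFR", "Site": "Y1172"},
]
result = build_nodes(rows, {"P1"})
print(len(result), sorted(result["P1"]["sites"]))  # one node, both sites retained
```

Whether the first row's logFC should win, or the max-|log2FC| row, is exactly the question the reviewer raises above.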
Edge Target Selection

For non-complex statement types, branch logic was broadened; ensure that for types outside Activation/Inhibition/IncreaseAmount/DecreaseAmount, accessing edge$sub is always valid. Otherwise, missing sub may cause errors or drop edges. Validate against INDRA payload variations.

} else if (edge$type %in% c("Activation", "Inhibition", "IncreaseAmount", "DecreaseAmount")) {
    obj = edge$obj$db_refs$HGNC
    namespaces = names(edge$obj$db_refs)
} else {
    obj = edge$sub$db_refs$HGNC
    namespaces = names(edge$sub$db_refs)
}

# Filter out edges with no HGNC ID or not in the dataset
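The defensive access the reviewer asks for can be sketched as follows (a Python stand-in for the parsed INDRA JSON; the field names type/obj/sub/db_refs come from the excerpt above, and the `.get()` fallbacks are the suggested hardening, not existing package behavior):

```python
def extract_target(stmt):
    """Pick the regulated agent's HGNC id from a parsed INDRA statement dict.

    Hedged sketch: real INDRA payloads vary by statement type, so every level
    is accessed with a fallback instead of assuming `sub` is present.
    """
    if stmt.get("type") in {"Activation", "Inhibition", "IncreaseAmount", "DecreaseAmount"}:
        agent = stmt.get("obj") or {}
    else:
        agent = stmt.get("sub") or {}  # may be absent for some statement types
    refs = agent.get("db_refs") or {}
    return refs.get("HGNC"), sorted(refs)

print(extract_target({"type": "Phosphorylation",
                      "sub": {"db_refs": {"HGNC": "3236", "UP": "P00533"}}}))
print(extract_target({"type": "Complex"}))  # no sub: degrades to (None, []) instead of erroring
```

An edge whose result is `(None, ...)` would then be skipped by the HGNC filter rather than raising an error.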
PTM Site Parsing

Site extraction uses a simple pattern "_[A-Z][0-9]" and gsub with lookahead; multi-site formats or different delimiters may not parse as intended. Verify it handles cases like multiple sites, lowercase residues, and proteins containing underscores unrelated to sites.

input$Site = ifelse(grepl("_[A-Z][0-9]", input$Protein),
                    gsub(".*?_(?=[A-Z][0-9])", "", input$Protein, perl = TRUE),
                    NA_character_)
if ("GlobalProtein" %in% colnames(input)) {
    input$Protein = input$GlobalProtein
}
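To illustrate those parsing caveats, the following Python sketch reproduces the R pattern (`re.sub` behaves like `gsub(..., perl = TRUE)` here) on a few hypothetical protein IDs; note how the multi-site case silently drops the first site:

```python
import re

def parse_site(protein_id):
    """Python mirror of the R logic above: grepl('_[A-Z][0-9]', x) guarding
    gsub('.*?_(?=[A-Z][0-9])', '', x, perl = TRUE). Returns None for R's NA."""
    if re.search(r"_[A-Z][0-9]", protein_id):
        return re.sub(r".*?_(?=[A-Z][0-9])", "", protein_id)
    return None

print(parse_site("EGFR_Y1110"))        # single site parses cleanly: Y1110
print(parse_site("GAB1_HUMAN"))        # mnemonic suffix is not a site: None
print(parse_site("MAP2K1_S218_S222"))  # multi-site: the first site is consumed too, leaving S222
```

The last case shows why the reviewer suggests anchoring the pattern or switching to a prefix-stripping replacement.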

@github-actions Bot (Contributor) commented Sep 3, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Safely select target logFC

Guard against empty matches and all-NA log2FC to avoid indexing errors. Provide a
safe default (e.g., 0) or skip probability calculation when no valid log2FC exists.

R/getPathwaysFromIndra.R [122-124]

-logFC = annotated_df[which(annotated_df$HgncId == edgeToMetadataMapping[[key]]$target_id),]
-logFC = logFC$log2FC[which.max(abs(logFC$log2FC))]
+logFC_row <- annotated_df[annotated_df$HgncId == edgeToMetadataMapping[[key]]$target_id, ]
+if (nrow(logFC_row) == 0 || all(is.na(logFC_row$log2FC))) {
+    logFC <- 0
+} else {
+    idx <- which.max(abs(logFC_row$log2FC))
+    if (length(idx) == 0 || is.na(idx)) {
+        logFC <- 0
+    } else {
+        logFC <- logFC_row$log2FC[idx]
+    }
+}
Suggestion importance[1-10]: 7


Why: Adds defensive checks for empty matches and all-NA log2FC, preventing indexing errors and making probability calc robust; aligns with the new selection line at 123.

Impact: Medium
Escape tooltip content safely

Escape backslashes and newlines in ptm_overlap to avoid invalid JS strings and
broken HTML. Apply a more comprehensive sanitization before embedding in the
JSON-like string.

R/visualizeNetworksWithHTML.R [342-355]

-tooltip_text <- gsub("'", "\\\\'", row$ptm_overlap)
-# Create edge data with styling information and PTM overlap tooltip
+tooltip_text <- row$ptm_overlap
+tooltip_text <- gsub("\\\\", "\\\\\\\\", tooltip_text)   # escape backslashes
+tooltip_text <- gsub("'", "\\\\'", tooltip_text)         # escape single quotes
+tooltip_text <- gsub("\n", "\\\\n", tooltip_text)        # escape newlines
+tooltip_text <- gsub("\r", "", tooltip_text)             # remove CR
 edge_data <- paste0("{ data: { source: '", row$source, 
                     "', target: '", row$target, 
                     "', id: '", edge_key,
                     "', interaction: '", row$interaction,
                     "', edge_type: '", row$edge_type,
                     "', category: '", row$category,
                     "', color: '", style$color,
                     "', line_style: '", style$style,
                     "', arrow_shape: '", style$arrow,
                     "', width: ", style$width,
                     ", tooltip: '", tooltip_text, "' } }")
Suggestion importance[1-10]: 6


Why: Further sanitizes tooltip strings (backslashes/newlines) to avoid malformed JS; complements existing single-quote escaping and reduces risk of runtime issues.

Impact: Low
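An alternative to chaining `gsub()` escapes, sketched here in Python for brevity, is to serialize the tooltip as a JSON string literal (which is also a valid JavaScript string literal), handling quotes, backslashes, and newlines in one step:

```python
import json

def js_string_literal(text):
    """Emit a double-quoted JavaScript string literal via JSON encoding; this
    sidesteps hand-rolled escaping of quotes, backslashes, and newlines."""
    return json.dumps(text)

tooltip = "Overlapping PTM sites: Y1110, S'218\nsecond line"
print("tooltip: " + js_string_literal(tooltip))
```

In R, `jsonlite::toJSON(text, auto_unbox = TRUE)` would play the same role when building the `edge_data` string.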
General
Robust PTM site parsing

Ensure Site extraction only runs when Protein is character and contains an
underscore-delimited site; otherwise grepl/gsub can error. Normalize Protein to
character before parsing and keep Site aligned if GlobalProtein overwrites Protein.

R/utils_getSubnetworkFromIndra.R [148-153]

-input$Site = ifelse(grepl("_[A-Z][0-9]", input$Protein),
-                    gsub(".*?_(?=[A-Z][0-9])", "", input$Protein, perl = TRUE),
-                    NA_character_)
+input$Protein <- as.character(input$Protein)
 if ("GlobalProtein" %in% colnames(input)) {
-    input$Protein = input$GlobalProtein
+    input$GlobalProtein <- as.character(input$GlobalProtein)
+}
+has_site <- !is.na(input$Protein) & grepl("_[A-Z][0-9]+$", input$Protein)
+input$Site <- ifelse(has_site,
+                     gsub(".*?_(?=[A-Z][0-9]+$)", "", input$Protein, perl = TRUE),
+                     NA_character_)
+if ("GlobalProtein" %in% colnames(input)) {
+    # Preserve parsed Site while replacing Protein with global protein id
+    input$Protein <- input$GlobalProtein
 }
Suggestion importance[1-10]: 6


Why: Normalizes types and tightens regex, and preserves Site when replacing Protein; matches the PTM handling block and improves resilience with modest impact.

Impact: Low

@codecov-commenter commented Sep 3, 2025

Codecov Report

❌ Patch coverage is 81.48148% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.30%. Comparing base (6330e22) to head (5c5d1db).

Files with missing lines Patch % Lines
R/visualizeNetworksWithHTML.R 78.94% 20 Missing ⚠️
R/utils_getSubnetworkFromIndra.R 86.66% 4 Missing ⚠️
R/annotateProteinInfoFromIndra.R 90.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel      #56      +/-   ##
==========================================
+ Coverage   57.36%   64.30%   +6.94%     
==========================================
  Files           8        7       -1     
  Lines        1222     1216       -6     
==========================================
+ Hits          701      782      +81     
+ Misses        521      434      -87     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@coderabbitai Bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
R/annotateProteinInfoFromIndra.R (1)

70-79: Update Protein column reference to use protein_ids consistently.

In the Uniprot_Mnemonic path, the code still references df$Protein when setting UniprotId values, but should use the index based on the original protein_ids source.

Fix the inconsistent column reference:

         if (proteinIdType == "Uniprot_Mnemonic") {
                 mnemonicProteins <- protein_ids
                 if (length(mnemonicProteins) > 0) {
                         uniprotMapping <- .callGetUniprotIdsFromUniprotMnemonicIdsApi(as.list(mnemonicProteins))
                         for (mnemonicId in names(uniprotMapping)) {
                                 if (!is.null(uniprotMapping[[mnemonicId]])) {
-                                        df$UniprotId[df$Protein == mnemonicId] <- uniprotMapping[[mnemonicId]]
+                                        df$UniprotId[protein_ids == mnemonicId] <- uniprotMapping[[mnemonicId]]
                                 }
                         }
                 }
         }
R/getPathwaysFromIndra.R (7)

29-33: Harden normal fit: handle NA/Inf and small-N failures.

fitdistr will error if log2FC has NA/Inf or too few finite values. Add filtering and a safe fallback.

Apply:

-    log2fc_values <- annotated_df$log2FC
-    fit <- fitdistr(log2fc_values, "normal")
-    para <- fit$estimate
+    log2fc_values <- annotated_df$log2FC
+    log2fc_values <- log2fc_values[is.finite(log2fc_values)]
+    if (length(log2fc_values) >= 2) {
+        fit <- fitdistr(log2fc_values, "normal")
+        para <- fit$estimate
+    } else {
+        # Fallback to standard normal when insufficient data
+        para <- c(mean = 0, sd = 1)
+    }

28-28: Ensure consistent types for HGNC IDs.

Later comparisons rely on character equality. Coerce once to avoid subtle mismatches.

Apply:

     annotated_df$Protein <- as.character(annotated_df$Protein)
+    annotated_df$HgncId  <- as.character(annotated_df$HgncId)

53-57: Build URL via query params and handle HTTP errors.

Avoid manual string paste and add basic HTTP error handling.

Apply:

-    url = paste('https://db.indra.bio/statements/from_agents?subject=',
-                source_id, namespace, sep = "")
-    response <- GET(url)
-    z = content(response)
+    response <- GET(
+        "https://db.indra.bio/statements/from_agents",
+        query = list(subject = paste0(source_id, namespace))
+    )
+    if (response$status_code >= 300) {
+        stop(sprintf("INDRA request failed (HTTP %s)", response$status_code))
+    }
+    z <- content(response)

66-77: Subject matching in Complex edges: avoid identical() type pitfalls.

Coerce both sides to character; identical() will fail on numeric vs character.

Apply:

-                if (identical(edge$members[[1]]$db_refs[[id_field]], source_id)) {
+                if (as.character(edge$members[[1]]$db_refs[[id_field]]) == source_id) {
                     obj = edge$members[[2]]$db_refs$HGNC
                     namespaces = names(edge$members[[2]]$db_refs)
                 } else {
                     obj = edge$members[[1]]$db_refs$HGNC
                     namespaces = names(edge$members[[1]]$db_refs)
                 }

86-91: HGNC filter: be explicit about types and missing values.

Make membership checks robust to NA/NULL and type coercion.

Apply:

-        if (!("HGNC" %in% namespaces)) {
+        if (is.null(namespaces) || !("HGNC" %in% namespaces)) {
             next
-        } else if (!(obj %in% annotated_df$HgncId)) {
+        } else if (is.null(obj) || is.na(obj) || !(as.character(obj) %in% annotated_df$HgncId)) {
             next
         }

121-132: Probability calc: NA-safe logFC, avoid log10(0), clamp to [0,1].

Prevents runtime errors and keeps probabilities in range.

Apply:

-        prob_logFC = 0
-        logFC = annotated_df[which(annotated_df$HgncId == edgeToMetadataMapping[[key]]$target_id),]
-        logFC = logFC$log2FC[which.max(abs(logFC$log2FC))]
+        prob_logFC <- 0
+        logFC_rows <- annotated_df[annotated_df$HgncId == edgeToMetadataMapping[[key]]$target_id, , drop = FALSE]
+        vals <- logFC_rows$log2FC
+        vals <- vals[is.finite(vals)]
+        logFC <- if (length(vals)) vals[which.max(abs(vals))] else 0
         if (logFC > para[1]) {
             prob_logFC = 1 - pnorm(logFC, mean = para[1], sd = para[2])
         } else {
             prob_logFC = pnorm(logFC, mean = para[1], sd = para[2])
         }
-        evidence_prob = 10^(m*log10(edgeToMetadataMapping[[key]]$data$evidence_count)+b)
-        edgeToMetadataMapping[[key]]$data$total_prob = 1 - ((1 - prob_logFC) * (1 - evidence_prob))
+        cnt <- max(1, as.numeric(edgeToMetadataMapping[[key]]$data$evidence_count))
+        evidence_prob <- 10^(m * log10(cnt) + b)
+        evidence_prob <- max(0, min(1, evidence_prob))
+        total_prob <- 1 - ((1 - prob_logFC) * (1 - evidence_prob))
+        edgeToMetadataMapping[[key]]$data$total_prob <- max(0, min(1, total_prob))
-        edgeToMetadataMapping[[key]]$data$logFC = logFC
+        edgeToMetadataMapping[[key]]$data$logFC <- logFC

160-175: Align added node schema with .constructNodesDataFrame

  • Replace pvalue = 0 with adj.pvalue = 0 when appending the main target node:
-                logFC = 0,
-                pvalue = 0,
+                logFC = 0,
+                adj.pvalue = 0,
  • Optionally, verify that main_target uses the same identifier namespace (UniProt) as nodes$id, or map it before binding to avoid orphaned nodes.
🧹 Nitpick comments (7)
R/utils_getSubnetworkFromIndra.R (1)

289-314: Clean up commented-out code.

There's a large block of commented-out code that should be removed if it's no longer needed. This impacts code readability and maintenance.

Remove the commented-out code block from lines 294-313 since the new implementation is working correctly.

R/visualizeNetworksWithHTML.R (1)

566-567: Simplify the tooltip display condition.

The condition checking for "No overlapping PTM sites found" is unnecessary since the new implementation returns an empty string for no overlaps.

Simplify the condition:

-        if (tooltipText && tooltipText.trim() !== '' && tooltipText.trim() !== 'No overlapping PTM sites found') {
+        if (tooltipText && tooltipText.trim() !== '') {
R/annotateProteinInfoFromIndra.R (1)

58-58: Address the TODO comment about PTM dataset handling.

The TODO comment indicates that PTM dataset handling needs improvement. This should be addressed or tracked in an issue.

Would you like me to help create a more robust PTM dataset handling implementation or open an issue to track this technical debt?

R/getPathwaysFromIndra.R (1)

135-158: Nit: use explicit FUN.VALUE types in vapply.

Improve type safety by using character(1)/numeric(1).

Apply:

-        }, ""),
+        }, character(1)),
...
-        }, ""),
+        }, character(1)),
...
-        }, 1),
+        }, numeric(1)),
...
-        }, 1),
+        }, numeric(1)),
...
-        }, 1),
+        }, numeric(1)),
...
-        }, ""),
+        }, character(1)),
vignettes/PTM-Analysis.Rmd (3)

31-33: Verify install branch name.

The PR branch is “feature-ptm-analysis”, but the install command uses “feature-ptm”. Update if that’s unintentional.

Apply:

-devtools::install_github("Vitek-Lab/MSstatsBioNet@feature-ptm", build_vignettes = TRUE)
+devtools::install_github("Vitek-Lab/MSstatsBioNet@feature-ptm-analysis", build_vignettes = TRUE)

72-78: Set expectations conservatively.

“GAB1 is rated as the most relevant edge” may not always hold with user data/filters. Phrase as an example or remove.

Apply:

-Here, we will get pathways from INDRA and rank them based on relevance.  We will see here that GAB1 is rated as the most relevant edge to track.
+Here, we will get pathways from INDRA and rank them based on relevance. Results will vary by dataset and filters.

24-33: Keep install chunks non-evaluating.

Good use of eval = FALSE. Consider also prefacing with a note that installs should be run in a clean R session, not during vignette build.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f41191a and 93d7401.

⛔ Files ignored due to path filters (1)
  • inst/extdata/2_egf_data.csv is excluded by !**/*.csv
📒 Files selected for processing (5)
  • R/annotateProteinInfoFromIndra.R (1 hunks)
  • R/getPathwaysFromIndra.R (3 hunks)
  • R/utils_getSubnetworkFromIndra.R (9 hunks)
  • R/visualizeNetworksWithHTML.R (19 hunks)
  • vignettes/PTM-Analysis.Rmd (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pkgdown
  • GitHub Check: Run pr agent on every pull request
  • GitHub Check: test-coverage
  • GitHub Check: build
🔇 Additional comments (2)
R/utils_getSubnetworkFromIndra.R (1)

148-153: Fix the regex pattern for PTM site extraction.

The current regex pattern ".*?_(?=[A-Z][0-9])" will fail to correctly extract PTM sites in several cases:

  1. It doesn't properly handle multi-digit positions (e.g., _S123)
  2. The replacement pattern removes everything up to the underscore, but should only remove the protein name prefix

Apply this diff to fix the PTM site extraction:

-    input$Site = ifelse(grepl("_[A-Z][0-9]", input$Protein),
-                        gsub(".*?_(?=[A-Z][0-9])", "", input$Protein, perl = TRUE),
+    input$Site = ifelse(grepl("_[A-Z][0-9]+", input$Protein),
+                        gsub("^[^_]+_", "", input$Protein),
                         NA_character_)

Likely an incorrect or invalid review comment.

R/visualizeNetworksWithHTML.R (1)

960-963: LGTM! Clear PTM visualization documentation.

The addition of PTM site overlap information in the legend and instructions is well-implemented and provides clear guidance to users about the hover functionality.

Comment thread R/utils_getSubnetworkFromIndra.R
Comment on lines +82 to +144
calculatePTMOverlapAggregated <- function(edges, nodes) {
    if (nrow(edges) == 0) return(character(0))

    # Group edges by source-target-interaction to match consolidation logic
    edges$edge_key <- paste(edges$source, edges$target, edges$interaction, sep = "-")
    unique_edges <- unique(edges$edge_key)

    overlap_info <- character(length(unique_edges))
    names(overlap_info) <- unique_edges

    for (edge_key in unique_edges) {
        # Get all edges with this source-target-interaction combination
        matching_edges <- edges[edges$edge_key == edge_key, ]
        all_overlap_sites <- c()

        # Process each matching edge to find PTM overlaps
        for (i in 1:nrow(matching_edges)) {
            edge <- matching_edges[i, ]

            # Check if edge has target and site information
            if (!is.na(edge$target) && "site" %in% names(edge) && !is.na(edge$site)) {
                # Find matching nodes with the same target ID
                target_nodes <- nodes[nodes$id == edge$target, ]

                if (nrow(target_nodes) > 0 && "Site" %in% names(target_nodes)) {
                    edge_sites <- trimws(unlist(strsplit(as.character(edge$site), "[,;|]")))

                    # Check each target node row for site matches
                    for (j in 1:nrow(target_nodes)) {
                        if (!is.na(target_nodes$Site[j])) {
                            node_sites <- trimws(unlist(strsplit(as.character(target_nodes$Site[j]), "[,;|]")))

                            # Find overlapping sites for this edge-node combination
                            overlap_sites <- intersect(edge_sites, node_sites)
                            overlap_sites <- overlap_sites[overlap_sites != "" & !is.na(overlap_sites)]

                            # Add to the aggregate list
                            all_overlap_sites <- c(all_overlap_sites, overlap_sites)
                        }
                    }
                }
            }
        }

        # Remove duplicates and create tooltip text for this consolidated edge
        unique_overlap_sites <- unique(all_overlap_sites)
        unique_overlap_sites <- unique_overlap_sites[unique_overlap_sites != "" & !is.na(unique_overlap_sites)]

        # CHANGED: Only create tooltip text if there are actual overlapping sites
        if (length(unique_overlap_sites) > 0) {
            if (length(unique_overlap_sites) == 1) {
                overlap_info[edge_key] <- paste0("Overlapping PTM site: ", unique_overlap_sites[1])
            } else {
                overlap_info[edge_key] <- paste0("Overlapping PTM sites: ", paste(unique_overlap_sites, collapse = ", "))
            }
        } else {
            # CHANGED: Return empty string instead of "No overlapping PTM sites found"
            overlap_info[edge_key] <- ""
        }
    }

    return(overlap_info)
}

🛠️ Refactor suggestion

Improve PTM overlap calculation performance and robustness.

The PTM overlap calculation has several issues:

  1. The triple nested loop (edges × target_nodes × sites) could be inefficient for large datasets
  2. Site splitting using multiple delimiters [,;|] should be documented
  3. No validation that the Site column exists in nodes before accessing it

Consider these improvements:

 calculatePTMOverlapAggregated <- function(edges, nodes) {
     if (nrow(edges) == 0) return(character(0))
+    
+    # Validate required columns
+    if (!"Site" %in% names(nodes)) {
+        return(character(nrow(edges)))  # Return empty overlaps if no Site info
+    }
     
     # Group edges by source-target-interaction to match consolidation logic
     edges$edge_key <- paste(edges$source, edges$target, edges$interaction, sep = "-")

Also consider pre-processing the node sites into a lookup table to improve performance.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
R/visualizeNetworksWithHTML.R around lines 82-144: the PTM overlap routine is
slow and fragile due to triple nested loops, lack of Site column validation, and
undocumented delimiter use; fix by (1) early-validate that nodes has a "Site"
column and gracefully handle missing/NA values; (2) pre-process nodes into a
lookup list mapping node id -> unique vector of trimmed sites (split once using
documented delimiter pattern "[,;|]"), removing empty/NA entries; (3) compute
overlaps per consolidated edge by splitting the edge site string once and
intersecting with the precomputed node-site vectors (use vectorized set
operations rather than looping per target node row), aggregating unique overlaps
and producing the same tooltip string logic (empty string when none); and (4)
add a short comment noting the delimiter semantics and why the lookup improves
performance.
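The lookup-table approach described above can be sketched as follows. This is an illustrative JavaScript sketch, not the package's R code; the node and edge field names (`id`, `site`, `target`) are assumptions for the example.

```javascript
// Documented delimiter semantics: sites may be separated by comma, semicolon, or pipe.
const SITE_DELIMS = /[,;|]/;

// Pre-process nodes once: id -> Set of trimmed, non-empty sites.
// Splitting each node's site string a single time is what makes the
// per-edge overlap check cheap compared to the triple nested loop.
function buildSiteLookup(nodes) {
  const lookup = new Map();
  for (const node of nodes) {
    const sites = (node.site || '')
      .split(SITE_DELIMS)
      .map(s => s.trim())
      .filter(s => s.length > 0);
    lookup.set(node.id, new Set(sites));
  }
  return lookup;
}

// Per consolidated edge: split the edge's site string once and intersect
// with the precomputed target-node sites; empty string when no overlap.
function ptmOverlap(edge, lookup) {
  const targetSites = lookup.get(edge.target);
  if (!targetSites) return '';
  const edgeSites = (edge.site || '').split(SITE_DELIMS).map(s => s.trim());
  const overlap = [...new Set(edgeSites.filter(s => s && targetSites.has(s)))];
  return overlap.join(', ');
}
```

The same shape applies in R with a named list of site vectors and `intersect()`.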

Comment on lines +342 to +348

# Create edge data with styling information and PTM overlap tooltip

⚠️ Potential issue

Escape quotes more robustly for JavaScript safety.

The current quote escaping only handles single quotes but should also handle double quotes and other special characters that could break JavaScript.

Use a more robust escaping approach:

-        # Escape quotes in tooltip text for JavaScript safety
-        tooltip_text <- gsub("'", "\\\\'", row$ptm_overlap)
+        # Escape special characters for JavaScript safety
+        tooltip_text <- gsub("(['\"\\\\])", "\\\\\\1", row$ptm_overlap)
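The suggested pattern can be made concrete with a small sketch; `escapeForJsString` is a hypothetical helper name, shown in JavaScript for illustration of what the regex produces.

```javascript
// Escape backslash, single quote, and double quote by prefixing each with
// a backslash, so the result is safe inside a JS string literal.
function escapeForJsString(text) {
  return text.replace(/(['"\\])/g, '\\$1');
}
```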

Comment on lines +542 to +577
tooltip.style.cssText = `
position: absolute;
background-color: rgba(0, 0, 0, 0.9);
color: white;
padding: 8px 12px;
border-radius: 4px;
font-size: 12px;
font-family: Arial, sans-serif;
white-space: nowrap;
pointer-events: none;
z-index: 9999;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.3);
display: none;
max-width: 300px;
word-wrap: break-word;
white-space: pre-wrap;
`;
document.body.appendChild(tooltip);


🛠️ Refactor suggestion

Add null checks for tooltip element creation.

The tooltip creation code assumes document.body exists when the script runs, which might not always be the case if the script executes before DOM is ready.

Wrap tooltip creation in a DOM ready check:

+    // Ensure DOM is ready before creating tooltip
+    if (document.body) {
     var tooltip = document.createElement('div');
     tooltip.style.cssText = \`...\`;
     document.body.appendChild(tooltip);
+    } else {
+        console.error('Document body not ready for tooltip creation');
+    }
🤖 Prompt for AI Agents
In R/visualizeNetworksWithHTML.R around lines 542 to 560, the code creates and
appends a tooltip element assuming document.body exists; change this to wait for
DOM readiness by checking document.readyState and if not 'loading' create and
append the tooltip immediately, otherwise attach a DOMContentLoaded listener to
create/append the tooltip when the DOM is ready; ensure the tooltip variable is
created in the same scope so event handlers can access it and guard against
duplicate creation if the listener is invoked multiple times.
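The readiness pattern described in the prompt can be sketched as below; `doc` is injected in place of the global `document` so the logic is testable outside a browser. `onDomReady` is a hypothetical name for the example.

```javascript
function onDomReady(doc, createTooltip) {
  if (doc.readyState !== 'loading') {
    // DOM already parsed: create the tooltip immediately.
    createTooltip();
  } else {
    // Fire once when parsing finishes; `once` guards duplicate creation
    // if the listener could otherwise be invoked more than one time.
    doc.addEventListener('DOMContentLoaded', createTooltip, { once: true });
  }
}
```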

Comment thread on vignettes/PTM-Analysis.Rmd (outdated)

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
R/utils_getSubnetworkFromIndra.R (2)

59-66: Add HTTP timeouts and error handling for INDRA API calls.

Network calls lack timeouts and status checks; failures can hang or silently proceed with bad responses.

-    res <- POST(
+    res <- POST(
         indraCogexUrl,
         body = groundings,
         add_headers("Content-Type" = "application/json"),
-        encode = "raw"
+        encode = "raw",
+        httr::timeout(60)
     )
-    res <- content(res)
+    if (httr::http_error(res)) {
+        stop(sprintf("INDRA API request failed (HTTP %s): %s",
+                     httr::status_code(res), httr::content(res, as = "text")))
+    }
+    res <- content(res)

110-123: Normalize Protein and derive Site before exemptions/filters; avoid factor coercion.

As written, exempt_proteins are captured before Site/GlobalProtein normalization, so forced proteins may lose Site and dedup may keep stale rows. Also, assigning GlobalProtein can coerce Protein back to factor.

 .filterGetSubnetworkFromIndraInput <- function(input, pvalueCutoff, logfc_cutoff, force_include_proteins) {
-    # Extract exempt proteins before any filtering
+    # Normalize Protein and derive Site before exemptions/filters
+    prot_raw = as.character(input$Protein)
+    input$Site = ifelse(grepl("_[A-Za-z][0-9]", prot_raw),
+                        sub(".*?_(?=[A-Za-z][0-9])", "", prot_raw, perl = TRUE),
+                        NA_character_)
+    if ("GlobalProtein" %in% colnames(input)) {
+        input$Protein = as.character(input$GlobalProtein)
+    } else {
+        input$Protein = prot_raw
+    }
+
+    # Extract exempt proteins after normalization
     exempt_proteins <- NULL
     if (!is.null(force_include_proteins)) {
         if (!is.character(force_include_proteins)) {
             stop("force_include_proteins must be a character vector")
         }
         missing_prots <- setdiff(force_include_proteins, input$Protein)
         if (length(missing_prots) > 0) {
             warning("force_include_proteins not found: ", paste(missing_prots, collapse = ", "))
         }
         exempt_proteins <- input[input$Protein %in% force_include_proteins,]
     }
-    
-    # Apply standard filtering
+    # Apply standard filtering
     input <- input[!is.na(input$adj.pvalue),]
@@
-    input$Protein <- as.character(input$Protein)
-    
-    # Handle PTMs in Protein column
-    input$Site = ifelse(grepl("_[A-Z][0-9]", input$Protein),
-                        gsub(".*?_(?=[A-Z][0-9])", "", input$Protein, perl = TRUE),
-                        NA_character_)
-    if ("GlobalProtein" %in% colnames(input)) {
-        input$Protein = input$GlobalProtein
-    }

Also applies to: 145-153
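The lazy-prefix-plus-lookahead substitution above (R: `sub(".*?_(?=[A-Za-z][0-9])", "", prot_raw, perl = TRUE)`) translates to a rough JavaScript sketch; `deriveSite` is a hypothetical helper, the real code is R.

```javascript
// Derive the PTM site from a Protein ID like "P04637_S15": drop the
// shortest prefix ending in "_" that is followed by a letter-digit site
// token. Proteins without such a token yield null (R's NA_character_).
function deriveSite(protein) {
  if (!/_[A-Za-z][0-9]/.test(protein)) return null;
  return protein.replace(/^.*?_(?=[A-Za-z][0-9])/, '');
}
```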

♻️ Duplicate comments (3)
R/utils_getSubnetworkFromIndra.R (3)

45-46: Good: deduplicate HGNC IDs before building groundings.

This addresses earlier feedback about deduplication ordering.


172-177: Guard against missing Protein column in metadata mapping.

Accessing input$Protein without validation can error; add a check.

 .addAdditionalMetadataToIndraEdge <- function(edge, input) {
+    if (!"Protein" %in% colnames(input)) {
+        stop("Input must contain 'Protein' column for edge metadata")
+    }
@@
-    uniprot_ids_source <- unique(matched_rows_source$Protein)
+    uniprot_ids_source <- unique(matched_rows_source$Protein)
@@
-    uniprot_ids_target = unique(matched_rows_target$Protein)
+    uniprot_ids_target = unique(matched_rows_target$Protein)

Also applies to: 180-185


13-14: Fix 400-protein check to only count unique HGNC IDs (including HGNC in force_include_other).

Currently all force_include_other entries are counted, even non-HGNC namespaces and duplicates, which can wrongly reject valid queries.

-    num_proteins = length(unique(input$HgncId)) + 
-        ifelse(!is.null(force_include_other), length(force_include_other), 0)
+    unique_hgnc_ids = unique(input$HgncId)
+    if (!is.null(force_include_other)) {
+        hgnc_extra = vapply(strsplit(force_include_other, ":"), function(parts) {
+            if (length(parts) == 2 && toupper(parts[1]) == "HGNC") parts[2] else NA_character_
+        }, NA_character_)
+        unique_hgnc_ids = union(unique_hgnc_ids, unique(stats::na.omit(hgnc_extra)))
+    }
+    num_proteins = length(unique_hgnc_ids)
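A sketch of the corrected counting logic, transliterated to JavaScript for illustration; `countUniqueHgnc` and its argument names are hypothetical, mirroring the R variables in the diff.

```javascript
// Count unique HGNC IDs, folding in only "HGNC:<id>" entries from the
// force-include list; other namespaces (e.g. CHEBI) and duplicates are
// ignored so valid queries are not wrongly rejected at the 400 cap.
function countUniqueHgnc(hgncIds, forceIncludeOther) {
  const ids = new Set(hgncIds.filter(id => id != null));
  for (const entry of forceIncludeOther || []) {
    const parts = entry.split(':');
    if (parts.length === 2 && parts[0].toUpperCase() === 'HGNC') {
      ids.add(parts[1]);
    }
  }
  return ids.size;
}
```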
🧹 Nitpick comments (1)
R/utils_getSubnetworkFromIndra.R (1)

260-262: Preserve NA for site column.

Using "" as FUN.VALUE can coerce NA to empty string in some flows; prefer NA_character_.

-        site = vapply(keys(res), function(x) {
-            query(res, x)$site
-        }, ""),
+        site = vapply(keys(res), function(x) {
+            query(res, x)$site
+        }, NA_character_),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 93d7401 and c206517.

⛔ Files ignored due to path filters (1)
  • inst/extdata/2_egf_data.csv is excluded by !**/*.csv
📒 Files selected for processing (5)
  • R/annotateProteinInfoFromIndra.R (1 hunks)
  • R/getPathwaysFromIndra.R (3 hunks)
  • R/utils_getSubnetworkFromIndra.R (9 hunks)
  • R/visualizeNetworksWithHTML.R (19 hunks)
  • vignettes/PTM-Analysis.Rmd (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • R/visualizeNetworksWithHTML.R
  • vignettes/PTM-Analysis.Rmd
  • R/annotateProteinInfoFromIndra.R
  • R/getPathwaysFromIndra.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: test-coverage
  • GitHub Check: pkgdown
🔇 Additional comments (2)
R/utils_getSubnetworkFromIndra.R (2)

194-194: Import alignment looks good.

fromJSON is used below; roxygen import is appropriate.


232-233: Import alignment for fromJSON is consistent.

Matches usage below in edges construction.

Comment on lines +204 to +210
json_object <- fromJSON(edge$data$stmt_json)
if (!is.null(json_object$residue) && !is.null(json_object$position)) {
edge$site = paste0(json_object$residue, json_object$position)
key <- paste(key, edge$site, sep = "_")
} else {
edge$site = NA_character_
}

🛠️ Refactor suggestion

Harden stmt_json parsing.

fromJSON can throw on malformed JSON; use tryCatch to avoid dropping the whole run.

-        json_object <- fromJSON(edge$data$stmt_json)
-        if (!is.null(json_object$residue) && !is.null(json_object$position)) {
+        json_object <- tryCatch(fromJSON(edge$data$stmt_json), error = function(e) NULL)
+        if (!is.null(json_object) && !is.null(json_object$residue) && !is.null(json_object$position)) {
             edge$site = paste0(json_object$residue, json_object$position)
             key <- paste(key, edge$site, sep = "_")
         } else {
             edge$site = NA_character_
         }
🤖 Prompt for AI Agents
In R/utils_getSubnetworkFromIndra.R around lines 204-210,
fromJSON(edge$data$stmt_json) can throw on malformed JSON and crash the run;
wrap the fromJSON call in tryCatch to safely handle parsing errors (returning
NULL or an empty list on error), and when an error occurs set edge$site <-
NA_character_ (and optionally log a warning including the edge id or key).
Ensure the subsequent logic that sets key and pastes site only runs when
json_object was successfully parsed and residue/position are present.
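The try/catch guard the prompt describes, sketched in JavaScript; `parseSite` is a hypothetical helper, and `null` stands in for R's `NA_character_`.

```javascript
// Parse a statement's JSON safely: malformed stmt_json yields a null
// site instead of aborting the whole run, matching the tryCatch pattern.
function parseSite(stmtJson) {
  let obj = null;
  try {
    obj = JSON.parse(stmtJson);
  } catch (e) {
    obj = null; // malformed JSON: fall through to the NA branch
  }
  if (obj && obj.residue != null && obj.position != null) {
    return String(obj.residue) + String(obj.position); // e.g. "S15"
  }
  return null;
}
```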

Comment thread on R/utils_getSubnetworkFromIndra.R (outdated)
Comment on lines +289 to +294
nodes = input[, c("Protein", "log2FC", "adj.pvalue", "HgncName", "Site")]
colnames(nodes) = c("id", "logFC", "adj.pvalue", "hgncName", "Site")

# Add attributes from input where available
nodes$logFC <- input$log2FC[match(nodes$id, input$Protein)]
nodes$adj.pvalue <- input$adj.pvalue[match(nodes$id, input$Protein)]
nodes$hgncName <- if ("HgncName" %in% colnames(input) && is.character(input$HgncName)) {
hgnc_value <- input$HgncName[match(nodes$id, input$Protein)]
ifelse(is.na(hgnc_value), nodes$id, hgnc_value)
} else {
nodes$id
}
nodes = nodes[nodes$id %in% c(edges$source, edges$target), ]
nodes$hgncName = ifelse(is.na(nodes$hgncName), nodes$id, nodes$hgncName)
#

🛠️ Refactor suggestion

Make nodes construction resilient to missing columns.

Selecting HgncName/Site without existence checks can error; precreate when absent.

 .constructNodesDataFrame <- function(input, edges) {
+    if (!"HgncName" %in% colnames(input)) input$HgncName <- NA_character_
+    if (!"Site" %in% colnames(input)) input$Site <- NA_character_
     nodes = input[, c("Protein", "log2FC", "adj.pvalue", "HgncName", "Site")]
     colnames(nodes) = c("id", "logFC", "adj.pvalue", "hgncName", "Site")
     
     nodes = nodes[nodes$id %in% c(edges$source, edges$target), ]
     nodes$hgncName = ifelse(is.na(nodes$hgncName), nodes$id, nodes$hgncName)
🤖 Prompt for AI Agents
In R/utils_getSubnetworkFromIndra.R around lines 289 to 294, the code assumes
columns "HgncName" and "Site" exist when subsetting input which can cause
errors; before subsetting, check if those columns are present and if not add
them to input (e.g. create input$HgncName <- NA and input$Site <- NA), then
subset using only the required columns (or use intersect to guard against
missing ones), rename to id/logFC/adj.pvalue/hgncName/Site, filter nodes by
edges, and keep the existing fallback that sets hgncName to id when NA.

@Vitek-Lab Vitek-Lab deleted a comment from coderabbitai Bot Sep 11, 2025
@Vitek-Lab Vitek-Lab deleted a comment from coderabbitai Bot Sep 11, 2025

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
R/annotateProteinInfoFromIndra.R (1)

58-65: Harden GlobalProtein selection and avoid over-stripping with a safer PTM regex.

  • Validate existing GlobalProtein (NA/empty) and fall back to Protein when unusable. This echoes the prior review—adding the guard prevents silent propagation of bad IDs.
  • The current pattern can wrongly truncate tokens like GENE_A0A024RBG1. Anchor to PTM-like tokens at the end (e.g., _S123 or _S123;T45). Also drop NAs before building protein_ids.

Apply:

-        if ("GlobalProtein" %in% colnames(df)) {
-            protein_ids = unique(df$GlobalProtein)
-        } else {
-            df$GlobalProtein = ifelse(grepl("_[A-Z][0-9]", df$Protein),
-                                 gsub("_[A-Z][0-9].*", "", df$Protein, perl = TRUE),
-                                 df$Protein)
-            protein_ids = unique(df$GlobalProtein)
-        }
+        if ("GlobalProtein" %in% colnames(df)) {
+            bad <- is.na(df$GlobalProtein) | df$GlobalProtein == ""
+            if (all(bad)) {
+                warning("GlobalProtein column is empty; deriving from Protein.")
+                df$GlobalProtein <- df$Protein
+            }
+        } else {
+            # Strip trailing PTM site tokens (e.g., _S123 or _S123;T45), keep mnemonic suffixes like _HUMAN
+            df$GlobalProtein <- gsub("_(?:[A-Z]{1,3})[0-9]+(?:;[A-Z]{1,3}[0-9]+)*$", "", df$Protein, perl = TRUE)
+        }
+        protein_ids <- unique(na.omit(df$GlobalProtein))
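The anchored pattern proposed above, transcribed to a JavaScript sketch to make its behavior concrete; `stripPtmSuffix` is a hypothetical name for illustration.

```javascript
// Strip a trailing "_<Letters><digits>" PTM token list (e.g. "_S123" or
// "_S123;T45") while leaving mnemonic suffixes such as "_HUMAN" and
// isoform-like tokens (e.g. "GENE_A0A024RBG1") intact, because the
// pattern is anchored to the end of the string.
const PTM_SUFFIX = /_[A-Z]{1,3}[0-9]+(?:;[A-Z]{1,3}[0-9]+)*$/;

function stripPtmSuffix(protein) {
  return protein.replace(PTM_SUFFIX, '');
}
```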
🧹 Nitpick comments (2)
R/annotateProteinInfoFromIndra.R (2)

72-73: Skip NA/empty mnemonic IDs before calling the API.

Avoids unnecessary calls and edge-case warnings.

-                mnemonicProteins <- protein_ids
+                mnemonicProteins <- protein_ids[!is.na(protein_ids) & nzchar(protein_ids)]

75-79: Vectorize the mnemonic→UniProt assignment.

Removes the R-level loop and simplifies logic.

-                        for (mnemonicId in names(uniprotMapping)) {
-                                if (!is.null(uniprotMapping[[mnemonicId]])) {
-                                        df$UniprotId[df$GlobalProtein == mnemonicId] <- uniprotMapping[[mnemonicId]]
-                                }
-                        }
+                        map <- unlist(uniprotMapping, use.names = TRUE)
+                        hit <- match(df$GlobalProtein, names(map))
+                        df$UniprotId[!is.na(hit)] <- map[hit[!is.na(hit)]]
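The vectorized assignment above amounts to a single-pass lookup; a JavaScript sketch under assumed names (`applyUniprotMapping` and its arguments are hypothetical, mirroring the R columns).

```javascript
// Resolve each GlobalProtein mnemonic through one map lookup, keeping
// the existing UniprotId value when no mapping exists for that entry.
function applyUniprotMapping(globalProteins, uniprotIds, mapping) {
  const map = new Map(Object.entries(mapping));
  return globalProteins.map((gp, i) =>
    map.has(gp) ? map.get(gp) : uniprotIds[i]
  );
}
```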
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c206517 and f790748.

📒 Files selected for processing (1)
  • R/annotateProteinInfoFromIndra.R (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pkgdown
  • GitHub Check: build
  • GitHub Check: test-coverage


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
DESCRIPTION (2)

39-40: Prefer HTTPS for the project homepage URL.

If supported, switch to HTTPS.

-URL: http://msstats.org, https://vitek-lab.github.io/MSstatsBioNet/
+URL: https://msstats.org, https://vitek-lab.github.io/MSstatsBioNet/

21-26: Consider minimum-version pins for key Imports.

If you rely on recent APIs (e.g., tidyr 1.3+ helpers, RCy3 newer functions), add minimal versions in Imports to avoid CI/user mismatches.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f790748 and c8f53d1.

📒 Files selected for processing (4)
  • DESCRIPTION (1 hunks)
  • NAMESPACE (0 hunks)
  • R/getPathwaysFromIndra.R (0 hunks)
  • man/getPathwaysFromIndra.Rd (0 hunks)
💤 Files with no reviewable changes (3)
  • man/getPathwaysFromIndra.Rd
  • NAMESPACE
  • R/getPathwaysFromIndra.R
🔇 Additional comments (2)
DESCRIPTION (2)

22-26: Good cleanup removing MASS from Imports.

Reduces dependency footprint; tidyr trailing-comma fix looks correct.


22-26: Verified — no remaining MASS usage.

rg scan of R/ and vignettes found no fitdistr/MASS::/library(MASS)/require(MASS); NAMESPACE contains no MASS; DESCRIPTION lists Imports/Suggests but not MASS.

@Vitek-Lab Vitek-Lab deleted a comment from coderabbitai Bot Sep 11, 2025

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
vignettes/PTM-Analysis.Rmd (3)

37-41: Fail fast if the example data file is missing.

Use system.file(..., mustWork = TRUE) and avoid nested calls for clarity.

-input = data.table::fread(system.file(
-    "extdata/garrido-2024.csv",
-    package = "MSstatsBioNet"
-))
+fpath <- system.file("extdata", "garrido-2024.csv",
+                     package = "MSstatsBioNet", mustWork = TRUE)
+input <- data.table::fread(fpath)
 head(input)

50-52: Copyedit: standardize entity names (UniProt, HGNC).

Small wording/casing fix.

-In the below example, we convert uniprot IDs to their corresponding Hgnc IDs. We
-can also extract other information, such as hgnc gene name and protein function.
+In the example below, we convert UniProt accessions to their corresponding HGNC IDs. We
+can also extract other information, such as HGNC gene symbol and protein function.

61-64: Copyedit: tighten phrasing.

-subnetwork of proteins from the INDRA database based on differential abundance
-analysis results.  This function may help finding off target subnetworks.  
+subnetwork of proteins from the INDRA database based on differential abundance
+analysis results. This function may help find off-target subnetworks.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8f53d1 and 60196fd.

⛔ Files ignored due to path filters (1)
  • inst/extdata/garrido-2024.csv is excluded by !**/*.csv
📒 Files selected for processing (1)
  • vignettes/PTM-Analysis.Rmd (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: test-coverage
🔇 Additional comments (1)
vignettes/PTM-Analysis.Rmd (1)

53-57: Use parameter proteinIdType and the accepted values "Uniprot" | "Uniprot_Mnemonic"

annotateProteinInfoFromIndra expects the second arg named proteinIdType (see R/annotateProteinInfoFromIndra.R) and the implementation checks for "Uniprot" or "Uniprot_Mnemonic" — update the vignette to:
annotated_df <- annotateProteinInfoFromIndra(input, proteinIdType = "Uniprot")
Also harmonize doc/comments that use "UniProt" vs "Uniprot".

Likely an incorrect or invalid review comment.

@Vitek-Lab Vitek-Lab deleted a comment from coderabbitai Bot Sep 11, 2025
@Vitek-Lab Vitek-Lab deleted a comment from coderabbitai Bot Sep 11, 2025
