Refactor PTM visualization#69
Conversation
📝 WalkthroughWalkthroughAdds PTM-aware rendering to the HTML network exporter: emits compound containers, protein and PTM child nodes, ptm_attachment edges with tooltip data, PTM-specific Cytoscape styles, and JS that repositions PTM nodes around their parent after layout. Changes
Sequence DiagramsequenceDiagram
participant R as Data Processor (R)
participant Config as Cytoscape Config (JSON)
participant Browser as Render Engine (JS)
participant Layout as Layout Engine (Cytoscape)
R->>Config: Emit protein nodes, compound containers, PTM nodes, edges (with ptm metadata)
Config->>Browser: Load nodes/edges and style rules
Browser->>Layout: Run Cytoscape layout
Layout->>Browser: Fire layoutstop event
Browser->>Browser: Compute arc positions for PTMs around parent protein
Browser->>Layout: Update PTM node positions (position override)
Browser->>Browser: Render final network with positioned PTMs
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Failed to generate code suggestions for PR |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## devel #69 +/- ##
==========================================
+ Coverage 58.76% 59.91% +1.14%
==========================================
Files 7 7
Lines 1414 1559 +145
==========================================
+ Hits 831 934 +103
- Misses 583 625 +42 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/visualizeNetworksWithHTML.R`:
- Around line 311-315: Duplicate PTM node/edge IDs are being produced when
constructing IDs from protein + site (variables node_elements, ptm_elements,
emitted_proteins, emitted_compounds); before appending any PTM-related node or
edge, compute the PTM id (e.g., paste protein and site) and check it against a
new deduplication set (e.g., emitted_ptm_ids) and only append if not already
present, adding the id to emitted_ptm_ids after append; apply the same guard
where PTM edges are created (the code paths that push to ptm_elements and
node_elements and where emitted_proteins/emitted_compounds are updated) to
eliminate duplicates across rows and within rows.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@R/visualizeNetworksWithHTML.R`:
- Around line 311-415: The review asks for regression tests covering the new PTM
emission and layout branches in visualizeNetworksWithHTML.R: add testthat cases
that exercise has_ptm_sites / needs_compound logic and deduplication by
emitted_compounds, emitted_ptm_nodes, and emitted_ptm_edges; specifically create
tests for (1) repeated identical sites within one row, (2) identical sites
across multiple rows for the same protein id, (3) rows with multiple distinct
sites, and (4) multiple PTM siblings to verify compound parent assignment and
unique PTM node/edge ids; call the function (the wrapper that returns
node_elements/ptm_elements) with crafted nodes data frames and assert the
returned vector contains the expected compound node id
(paste0(id,'__compound__')), unique ptm node ids (paste0(id,'__ptm__',site)),
and single attachment edges per site, and add these tests to testthat suite so
lines covered in has_ptm_sites, the for-loop PTM emission, and dedupe branches
are exercised.
- Around line 707-709: The selector construction using string interpolation of
parentId is unsafe; instead select candidate nodes and filter by their data
value to avoid selector-special-character issues: replace the
cy.nodes('[parent_protein = "' + parentId + '"]') usage with a safer approach
that first grabs nodes (e.g., cy.nodes() or cy.nodes('[parent_protein]')) and
then .filter(...) comparing node.data('parent_protein') === parentId to produce
siblings, keeping the subsequent idx = siblings.indexOf(ptmNode) and total =
siblings.length logic unchanged.
| node_elements <- c() | ||
| ptm_elements <- c() | ||
| emitted_proteins <- c() | ||
| emitted_compounds <- c() | ||
| emitted_ptm_nodes <- c() | ||
| emitted_ptm_edges <- c() | ||
|
|
||
| # Pre-compute which protein ids have at least one PTM site row, | ||
| # so we know upfront whether a compound wrapper is needed | ||
| has_ptm_sites <- if ("Site" %in% names(nodes)) { | ||
| ids_with_sites <- unique(nodes$id[!is.na(nodes$Site) & trimws(nodes$Site) != ""]) | ||
| ids_with_sites | ||
| } else { | ||
| c() | ||
| } | ||
|
|
||
| for (i in seq_len(nrow(nodes))) { | ||
| row <- nodes[i, ] | ||
| color <- node_colors[i] | ||
| has_site <- "Site" %in% names(nodes) && !is.na(row$Site) && trimws(row$Site) != "" | ||
|
|
||
| display_label <- if (label_column == "hgncName" && !is.na(row$hgncName) && row$hgncName != "") { | ||
| row$hgncName | ||
| } else { | ||
| row['id'] | ||
| row$id | ||
| } | ||
|
|
||
| paste0("{ data: { id: '", row['id'], "', label: '", display_label, "', color: '", row['color'], "' } }") | ||
| }) | ||
| needs_compound <- row$id %in% has_ptm_sites | ||
| compound_id <- paste0(row$id, "__compound__") | ||
|
|
||
| # Emit invisible compound container node once per protein that has PTM children | ||
| if (needs_compound && !(compound_id %in% emitted_compounds)) { | ||
| node_elements <- c(node_elements, | ||
| paste0("{ data: { id: '", escape_js_string(compound_id), | ||
| "', node_type: 'compound' } }") | ||
| ) | ||
| emitted_compounds <- c(emitted_compounds, compound_id) | ||
| } | ||
|
|
||
| # Emit protein node once, assigning it to the compound if one exists | ||
| if (!(row$id %in% emitted_proteins)) { | ||
| parent_field <- if (needs_compound) { | ||
| paste0(", parent: '", escape_js_string(compound_id), "'") | ||
| } else { | ||
| "" | ||
| } | ||
| node_elements <- c(node_elements, | ||
| paste0("{ data: { id: '", escape_js_string(row$id), | ||
| "', label: '", escape_js_string(display_label), | ||
| "', color: '", color, | ||
| "', node_type: 'protein'", | ||
| parent_field, | ||
| " } }") | ||
| ) | ||
| emitted_proteins <- c(emitted_proteins, row$id) | ||
| } | ||
|
|
||
| # Emit one PTM child node + attachment edge per individual site | ||
| if (has_site) { | ||
| sites <- trimws(unlist(strsplit(as.character(row$Site), "[_,;|]"))) | ||
| sites <- unique(sites[sites != ""]) | ||
|
|
||
| for (site in sites) { | ||
| ptm_node_id <- paste0(row$id, "__ptm__", site) | ||
| safe_ptm_id <- escape_js_string(ptm_node_id) | ||
| safe_parent <- escape_js_string(row$id) | ||
| safe_site <- escape_js_string(site) | ||
|
|
||
| # PTM node also belongs to the same compound container | ||
| if (!(ptm_node_id %in% emitted_ptm_nodes)) { | ||
| ptm_elements <- c(ptm_elements, | ||
| paste0("{ data: { id: '", safe_ptm_id, | ||
| "', label: '", safe_site, | ||
| "', color: '", color, | ||
| "', parent_protein: '", safe_parent, | ||
| "', parent: '", escape_js_string(compound_id), "'", | ||
| ", node_type: 'ptm' } }") | ||
| ) | ||
| emitted_ptm_nodes <- c(emitted_ptm_nodes, ptm_node_id) | ||
| } | ||
|
|
||
| ptm_edge_id_raw <- paste0(row$id, "__ptm_edge__", site) | ||
| if (!(ptm_edge_id_raw %in% emitted_ptm_edges)) { | ||
| ptm_edge_id <- escape_js_string(ptm_edge_id_raw) | ||
| ptm_elements <- c(ptm_elements, | ||
| paste0("{ data: { id: '", ptm_edge_id, | ||
| "', source: '", safe_parent, | ||
| "', target: '", safe_ptm_id, | ||
| "', edge_type: 'ptm_attachment',", | ||
| " category: 'ptm_attachment',", | ||
| " interaction: '',", | ||
| " color: '", color, "',", | ||
| " line_style: 'dotted',", | ||
| " arrow_shape: 'none',", | ||
| " width: 1.5,", | ||
| " tooltip: '' } }") | ||
| ) | ||
| emitted_ptm_edges <- c(emitted_ptm_edges, ptm_edge_id_raw) | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return(c(node_elements, ptm_elements)) | ||
| } |
There was a problem hiding this comment.
Please add regression tests for the new PTM emission/layout paths.
This PR adds non-trivial branching here (compound wrapping, per-site dedup, and post-layout PTM positioning), and this file still has uncovered changed lines in the patch report. Add targeted tests for repeated sites (within/across rows), multi-site rows, and multi-PTM sibling placement.
I can draft testthat cases for these scenarios if you want.
Also applies to: 693-728
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/visualizeNetworksWithHTML.R` around lines 311 - 415, The review asks for
regression tests covering the new PTM emission and layout branches in
visualizeNetworksWithHTML.R: add testthat cases that exercise has_ptm_sites /
needs_compound logic and deduplication by emitted_compounds, emitted_ptm_nodes,
and emitted_ptm_edges; specifically create tests for (1) repeated identical
sites within one row, (2) identical sites across multiple rows for the same
protein id, (3) rows with multiple distinct sites, and (4) multiple PTM siblings
to verify compound parent assignment and unique PTM node/edge ids; call the
function (the wrapper that returns node_elements/ptm_elements) with crafted
nodes data frames and assert the returned vector contains the expected compound
node id (paste0(id,'__compound__')), unique ptm node ids
(paste0(id,'__ptm__',site)), and single attachment edges per site, and add these
tests to testthat suite so lines covered in has_ptm_sites, the for-loop PTM
emission, and dedupe branches are exercised.
| var siblings = cy.nodes('[parent_protein = \"' + parentId + '\"]'); | ||
| var idx = siblings.indexOf(ptmNode); | ||
| var total = siblings.length; |
There was a problem hiding this comment.
Avoid interpolating raw parentId into the selector string.
If an ID contains selector-significant characters, sibling lookup can fail or mis-select. Prefer filtering by data value instead of string-building selectors.
Proposed fix
- var siblings = cy.nodes('[parent_protein = \"' + parentId + '\"]');
+ var siblings = ptmNodes.filter(function(n) {
+ return n.data('parent_protein') === parentId;
+ });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@R/visualizeNetworksWithHTML.R` around lines 707 - 709, The selector
construction using string interpolation of parentId is unsafe; instead select
candidate nodes and filter by their data value to avoid
selector-special-character issues: replace the cy.nodes('[parent_protein = "' +
parentId + '"]') usage with a safer approach that first grabs nodes (e.g.,
cy.nodes() or cy.nodes('[parent_protein]')) and then .filter(...) comparing
node.data('parent_protein') === parentId to produce siblings, keeping the
subsequent idx = siblings.indexOf(ptmNode) and total = siblings.length logic
unchanged.
Motivation and Context
The PR refactors PTM (post-translational modification) visualization in the network visualization module to represent PTM sites as discrete visual elements in Cytoscape graphs. The goal is clearer depiction of PTM sites as child nodes attached to protein parents (with invisible compound containers), plus edge-level PTM overlap tooltips, and safe embedding of data into generated JavaScript.
Detailed Changes
Node generation / structure
PTM overlap detection and edge consolidation
Edge generation and styling
Cytoscape / JS safety and runtime behavior
Miscellaneous
Unit Tests Added or Modified
Coding Guideline Violations / Risks