Refactor PTM visualization #69

Merged
tonywu1999 merged 4 commits into devel from refactor-ptm
Feb 25, 2026

@tonywu1999 tonywu1999 commented Feb 25, 2026

Motivation and Context

The PR refactors PTM (post-translational modification) visualization in the network visualization module to represent PTM sites as discrete visual elements in Cytoscape graphs. The goal is clearer depiction of PTM sites as child nodes attached to protein parents (with invisible compound containers), plus edge-level PTM overlap tooltips, and safe embedding of data into generated JavaScript.

Detailed Changes

  • Node generation / structure

    • createNodeElements now:
      • emits invisible compound container nodes for proteins that have PTM Site entries (compound id: "<protein id>__compound__")
      • emits protein nodes optionally assigned to the compound via parent field
      • emits per-PTM-site child nodes (id: "<protein id>__ptm__<site>") with parent_protein and parent (compound) fields and node_type = 'ptm'
      • emits PTM attachment edges (id: "<protein id>__ptm_edge__<site>") with edge_type = 'ptm_attachment' and category = 'ptm_attachment'
    • pre-computes which protein ids have PTM Site rows to avoid redundant compound emission
  • PTM overlap detection and edge consolidation

    • Added .calculatePTMOverlapAggregated(edges, nodes) to aggregate overlapping PTM site names per consolidated edge_key (source-target-interaction)
    • consolidateEdges(edges, nodes = NULL) now accepts nodes, invokes PTM overlap aggregation when nodes provided, and propagates ptm_overlap text into consolidated edge rows
  • Edge generation and styling

    • createEdgeElements now incorporates consolidated ptm_overlap into an escaped tooltip field and includes styling via getEdgeStyle
    • Edge payloads include edge_type, category, tooltip, color, line_style, arrow_shape, width
  • Cytoscape / JS safety and runtime behavior

    • Added escape_js_string() helper to escape backslashes, single quotes, CR/LF for safe embedding into single-quoted JavaScript literals
    • JavaScript generation (via generateCytoscapeConfig path) updated to expect new node/edge payload fields (parent, parent_protein, compound ids, ptm overlap tooltip). A PTM repositioning routine runs after layoutstop and arranges PTM child nodes in a bottom arc around their parent nodes.
  • Miscellaneous

    • getRelationshipProperties extended to keep PTM-relevant configs untouched; getEdgeStyle reused for new edge_type values
    • createNodeElements and createEdgeElements ensure escaping of strings when embedding into JS element literals
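
The escaping rule described above (backslashes, single quotes, CR/LF) can be illustrated with a minimal sketch. The function name `escape_js_string_sketch` is hypothetical and this is not the PR's actual `escape_js_string()` body, which is not shown in this conversation; it only assumes the four substitutions listed in the description.

```r
# Hypothetical stand-in for escape_js_string(); assumes only the four
# substitutions named in the PR description.
escape_js_string_sketch <- function(x) {
  x <- as.character(x)
  # Backslashes go first so the backslashes added below are not doubled again.
  x <- gsub("\\", "\\\\", x, fixed = TRUE)
  x <- gsub("'", "\\'", x, fixed = TRUE)
  x <- gsub("\r", "\\r", x, fixed = TRUE)
  x <- gsub("\n", "\\n", x, fixed = TRUE)
  x
}

# Embedding a label that contains a single quote into a JS element literal
element <- paste0("{ data: { id: '", escape_js_string_sketch("O'Brien"), "' } }")
cat(element, "\n")
```

Replacing backslashes before quotes matters: in the opposite order, the backslash inserted in front of a quote would itself be doubled and the quote would end the JavaScript literal early.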

Unit Tests Added or Modified

  • No new tests targeting PTM-specific functionality were added.
  • Existing tests (tests/testthat/test-visualizeNetworksWithHTML.R) cover:
    • mapLogFCToColor, getRelationshipProperties, consolidateEdges (general consolidation behavior), getEdgeStyle, createNodeElements (basic node emission without PTM coverage), createEdgeElements (basic edge emission/assert fields), generateCytoscapeConfig, and style/layout conversion helpers.
  • Missing tests (not present in current test suite):
    • Compound node creation and correct parent assignment for protein nodes
    • Emission of PTM child nodes and PTM attachment edges (IDs, parent_protein, node_type, category)
    • .calculatePTMOverlapAggregated aggregation correctness for various delimiter formats and multi-row node Site values
    • Propagation of ptm_overlap into consolidated edges and escaping in tooltip text
    • escape_js_string correctness across backslashes, single quotes, CR/LF, and empty/null inputs
    • Post-layout JavaScript PTM repositioning logic and resulting coordinates / non-overlap behavior

Coding Guideline Violations / Risks

  • Test coverage gap (significant): Complex PTM rendering logic (hierarchical compound nodes, PTM child emission, PTM-attachment edges, overlap aggregation, and JS escape/tooltip behavior) lacks direct unit tests. Codecov indicates patch coverage ~74.66% with 37 changed lines untested and identifies R/visualizeNetworksWithHTML.R as missing coverage lines.
  • Maintainability risk: Non-trivial layout/reposition algorithm (JS added to run on layoutstop) and tooltip wiring are not validated by automated tests, increasing risk of regressions.
  • Documentation gap (minor): Internal aggregation and PTM-edge semantics would benefit from clearer docstrings or comments describing expected Site delimiters and edge_key matching assumptions.

coderabbitai Bot commented Feb 25, 2026

📝 Walkthrough

Walkthrough

Adds PTM-aware rendering to the HTML network exporter: emits compound containers, protein and PTM child nodes, ptm_attachment edges with tooltip data, PTM-specific Cytoscape styles, and JS that repositions PTM nodes around their parent after layout.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Core PTM generation & payloads | R/visualizeNetworksWithHTML.R | Pre-computes PTM sites, emits invisible compound containers, per-protein nodes (with optional parent compound linkage), PTM child nodes and ptm_attachment edges; includes parent_protein and compound_id in node data and ptm_overlap in edge data. |
| Cytoscape styles & config | R/visualizeNetworksWithHTML.R | Adds node_type = 'ptm' styling, invisible compound styling, and a new 'ptm_attachment' edge style (dotted); expands generateCytoscapeConfig payloads to include PTM-related fields. |
| JavaScript generation & utilities | R/visualizeNetworksWithHTML.R | Introduces escape_js_string helper, embeds PTM tooltip text safely, emits JS to reposition PTM nodes post-layout in a bottom-arc distribution around parent proteins. |
| Layout/positioning logic | R/visualizeNetworksWithHTML.R | Post-layout reposition routine executed after layoutstop to distribute multiple PTMs around parent protein; computes angles and updates positions via Cytoscape JS. |

Sequence Diagram

sequenceDiagram
    participant R as Data Processor (R)
    participant Config as Cytoscape Config (JSON)
    participant Browser as Render Engine (JS)
    participant Layout as Layout Engine (Cytoscape)

    R->>Config: Emit protein nodes, compound containers, PTM nodes, edges (with ptm metadata)
    Config->>Browser: Load nodes/edges and style rules
    Browser->>Layout: Run Cytoscape layout
    Layout->>Browser: Fire layoutstop event
    Browser->>Browser: Compute arc positions for PTMs around parent protein
    Browser->>Layout: Update PTM node positions (position override)
    Browser->>Browser: Render final network with positioned PTMs
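
The "compute arc positions" step in the diagram can be sketched numerically. The PR's JavaScript routine is not shown in this conversation, so the radius, the 30 to 150 degree span, and the function name `ptm_arc_positions` below are all assumptions made for illustration. The sketch relies on Cytoscape's screen-style coordinates, where y grows downward, so a positive sine places a PTM below its parent.

```r
# Hypothetical sketch: spread `total` PTM nodes evenly on an arc beneath a
# parent protein at (parent_x, parent_y). Radius and angular span are assumed.
ptm_arc_positions <- function(parent_x, parent_y, total,
                              radius = 40, start_deg = 30, end_deg = 150) {
  stopifnot(total >= 1)
  angles_deg <- if (total == 1) {
    (start_deg + end_deg) / 2            # a single PTM sits directly below
  } else {
    seq(start_deg, end_deg, length.out = total)
  }
  angles <- angles_deg * pi / 180
  data.frame(
    idx = seq_len(total) - 1,            # zero-based sibling index, as in JS
    x = parent_x + radius * cos(angles),
    y = parent_y + radius * sin(angles)  # y grows downward on screen
  )
}

print(ptm_arc_positions(100, 100, total = 3))
```

With three siblings the middle node lands straight below the parent and the outer two mirror each other horizontally.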

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

Review effort 3/5

Poem

🐇 In code I hop where PTMs hide,

I tuck them round their protein side,
Invisible tubs and dotted ties,
Arcs of sparkle beneath the skies,
A tiny network, neatly styled!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description is entirely missing—only an empty template was provided with no content filled in for Motivation, Changes, Testing, or checklist items. | Provide a complete description covering the motivation for PTM visualization refactoring, detailed list of changes made, testing performed, and mark the pre-review checklist items as complete. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: refactoring of PTM visualization logic within the R file, though it lacks specific detail about the scope of improvements. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@github-actions

Failed to generate code suggestions for PR

codecov-commenter commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 72.54902% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.91%. Comparing base (7ff5f09) to head (c5d4a50).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| R/visualizeNetworksWithHTML.R | 72.54% | 42 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##            devel      #69      +/-   ##
==========================================
+ Coverage   58.76%   59.91%   +1.14%     
==========================================
  Files           7        7              
  Lines        1414     1559     +145     
==========================================
+ Hits          831      934     +103     
- Misses        583      625      +42     

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/visualizeNetworksWithHTML.R`:
- Around line 311-315: Duplicate PTM node/edge IDs are being produced when
constructing IDs from protein + site (variables node_elements, ptm_elements,
emitted_proteins, emitted_compounds); before appending any PTM-related node or
edge, compute the PTM id (e.g., paste protein and site) and check it against a
new deduplication set (e.g., emitted_ptm_ids) and only append if not already
present, adding the id to emitted_ptm_ids after append; apply the same guard
where PTM edges are created (the code paths that push to ptm_elements and
node_elements and where emitted_proteins/emitted_compounds are updated) to
eliminate duplicates across rows and within rows.
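
The deduplication guard requested here can be sketched in isolation. The helper name `collect_ptm_ids` is hypothetical; the `emitted_ptm_ids` set name comes from the review comment, and the `__ptm__` id scheme and `[_,;|]` delimiter set are taken from the diff quoted later in this conversation. This shows only the guard pattern, not the full element emission.

```r
# Hypothetical helper: collect unique PTM node ids across all rows,
# guarding every append with a membership check on emitted_ptm_ids.
collect_ptm_ids <- function(nodes) {
  emitted_ptm_ids <- character(0)
  for (i in seq_len(nrow(nodes))) {
    site_value <- nodes$Site[i]
    if (is.na(site_value) || trimws(site_value) == "") next
    # Split multi-site strings and drop repeats within the same row
    sites <- trimws(unlist(strsplit(as.character(site_value), "[_,;|]")))
    sites <- unique(sites[sites != ""])
    for (site in sites) {
      ptm_id <- paste0(nodes$id[i], "__ptm__", site)
      # Skip ids already emitted by an earlier row
      if (!(ptm_id %in% emitted_ptm_ids)) {
        emitted_ptm_ids <- c(emitted_ptm_ids, ptm_id)
      }
    }
  }
  emitted_ptm_ids
}

nodes <- data.frame(
  id = c("P1", "P1", "P2"),
  Site = c("S15_T20", "S15;Y7", NA),
  stringsAsFactors = FALSE
)
print(collect_ptm_ids(nodes))
```

Here `S15` appears in two rows for `P1` but yields a single id, which is the behavior the guard is meant to enforce for both PTM nodes and their attachment edges.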

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ff5f09 and 4ff6f5d.

📒 Files selected for processing (1)
  • R/visualizeNetworksWithHTML.R

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/visualizeNetworksWithHTML.R`:
- Around line 311-415: The review asks for regression tests covering the new PTM
emission and layout branches in visualizeNetworksWithHTML.R: add testthat cases
that exercise has_ptm_sites / needs_compound logic and deduplication by
emitted_compounds, emitted_ptm_nodes, and emitted_ptm_edges; specifically create
tests for (1) repeated identical sites within one row, (2) identical sites
across multiple rows for the same protein id, (3) rows with multiple distinct
sites, and (4) multiple PTM siblings to verify compound parent assignment and
unique PTM node/edge ids; call the function (the wrapper that returns
node_elements/ptm_elements) with crafted nodes data frames and assert the
returned vector contains the expected compound node id
(paste0(id,'__compound__')), unique ptm node ids (paste0(id,'__ptm__',site)),
and single attachment edges per site, and add these tests to testthat suite so
lines covered in has_ptm_sites, the for-loop PTM emission, and dedupe branches
are exercised.
- Around line 707-709: The selector construction using string interpolation of
parentId is unsafe; instead select candidate nodes and filter by their data
value to avoid selector-special-character issues: replace the
cy.nodes('[parent_protein = "' + parentId + '"]') usage with a safer approach
that first grabs nodes (e.g., cy.nodes() or cy.nodes('[parent_protein]')) and
then .filter(...) comparing node.data('parent_protein') === parentId to produce
siblings, keeping the subsequent idx = siblings.indexOf(ptmNode) and total =
siblings.length logic unchanged.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ff6f5d and c5d4a50.

📒 Files selected for processing (1)
  • R/visualizeNetworksWithHTML.R

Comment on lines +311 to 415
  node_elements <- c()
  ptm_elements <- c()
  emitted_proteins <- c()
  emitted_compounds <- c()
  emitted_ptm_nodes <- c()
  emitted_ptm_edges <- c()

  # Pre-compute which protein ids have at least one PTM site row,
  # so we know upfront whether a compound wrapper is needed
  has_ptm_sites <- if ("Site" %in% names(nodes)) {
    ids_with_sites <- unique(nodes$id[!is.na(nodes$Site) & trimws(nodes$Site) != ""])
    ids_with_sites
  } else {
    c()
  }

  for (i in seq_len(nrow(nodes))) {
    row <- nodes[i, ]
    color <- node_colors[i]
    has_site <- "Site" %in% names(nodes) && !is.na(row$Site) && trimws(row$Site) != ""

    display_label <- if (label_column == "hgncName" && !is.na(row$hgncName) && row$hgncName != "") {
      row$hgncName
    } else {
      row$id
    }

    needs_compound <- row$id %in% has_ptm_sites
    compound_id <- paste0(row$id, "__compound__")

    # Emit invisible compound container node once per protein that has PTM children
    if (needs_compound && !(compound_id %in% emitted_compounds)) {
      node_elements <- c(node_elements,
        paste0("{ data: { id: '", escape_js_string(compound_id),
               "', node_type: 'compound' } }")
      )
      emitted_compounds <- c(emitted_compounds, compound_id)
    }

    # Emit protein node once, assigning it to the compound if one exists
    if (!(row$id %in% emitted_proteins)) {
      parent_field <- if (needs_compound) {
        paste0(", parent: '", escape_js_string(compound_id), "'")
      } else {
        ""
      }
      node_elements <- c(node_elements,
        paste0("{ data: { id: '", escape_js_string(row$id),
               "', label: '", escape_js_string(display_label),
               "', color: '", color,
               "', node_type: 'protein'",
               parent_field,
               " } }")
      )
      emitted_proteins <- c(emitted_proteins, row$id)
    }

    # Emit one PTM child node + attachment edge per individual site
    if (has_site) {
      sites <- trimws(unlist(strsplit(as.character(row$Site), "[_,;|]")))
      sites <- unique(sites[sites != ""])

      for (site in sites) {
        ptm_node_id <- paste0(row$id, "__ptm__", site)
        safe_ptm_id <- escape_js_string(ptm_node_id)
        safe_parent <- escape_js_string(row$id)
        safe_site <- escape_js_string(site)

        # PTM node also belongs to the same compound container
        if (!(ptm_node_id %in% emitted_ptm_nodes)) {
          ptm_elements <- c(ptm_elements,
            paste0("{ data: { id: '", safe_ptm_id,
                   "', label: '", safe_site,
                   "', color: '", color,
                   "', parent_protein: '", safe_parent,
                   "', parent: '", escape_js_string(compound_id), "'",
                   ", node_type: 'ptm' } }")
          )
          emitted_ptm_nodes <- c(emitted_ptm_nodes, ptm_node_id)
        }

        ptm_edge_id_raw <- paste0(row$id, "__ptm_edge__", site)
        if (!(ptm_edge_id_raw %in% emitted_ptm_edges)) {
          ptm_edge_id <- escape_js_string(ptm_edge_id_raw)
          ptm_elements <- c(ptm_elements,
            paste0("{ data: { id: '", ptm_edge_id,
                   "', source: '", safe_parent,
                   "', target: '", safe_ptm_id,
                   "', edge_type: 'ptm_attachment',",
                   " category: 'ptm_attachment',",
                   " interaction: '',",
                   " color: '", color, "',",
                   " line_style: 'dotted',",
                   " arrow_shape: 'none',",
                   " width: 1.5,",
                   " tooltip: '' } }")
          )
          emitted_ptm_edges <- c(emitted_ptm_edges, ptm_edge_id_raw)
        }
      }
    }
  }

  return(c(node_elements, ptm_elements))
}

⚠️ Potential issue | 🟠 Major

Please add regression tests for the new PTM emission/layout paths.

This PR adds non-trivial branching here (compound wrapping, per-site dedup, and post-layout PTM positioning), and this file still has uncovered changed lines in the patch report. Add targeted tests for repeated sites (within/across rows), multi-site rows, and multi-PTM sibling placement.

I can draft testthat cases for these scenarios if you want.

Also applies to: 693-728
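
One possible shape for such a regression test is sketched below. Because the signature of `createNodeElements` is not shown in this conversation, the sketch does not call it; instead it asserts on a hand-written `sample_elements` vector that mimics the element literals in the diff. The helper `extract_element_ids` is hypothetical, and in the real suite `sample_elements` would be replaced by the function's return value for a crafted `nodes` data frame.

```r
library(testthat)

# Hypothetical test helper: pull the id out of each element literal.
# Assumes ids contain no single quotes, which holds for this sample.
extract_element_ids <- function(elements) {
  sub("^\\{ data: \\{ id: '([^']*)'.*$", "\\1", elements)
}

# Hand-written stand-in for the vector returned by the node emitter
sample_elements <- c(
  "{ data: { id: 'P1__compound__', node_type: 'compound' } }",
  "{ data: { id: 'P1', label: 'P1', color: '#cccccc', node_type: 'protein', parent: 'P1__compound__' } }",
  "{ data: { id: 'P1__ptm__S15', label: 'S15', color: '#cccccc', parent_protein: 'P1', parent: 'P1__compound__', node_type: 'ptm' } }",
  "{ data: { id: 'P1__ptm_edge__S15', source: 'P1', target: 'P1__ptm__S15', edge_type: 'ptm_attachment' } }"
)

test_that("element ids are unique and follow the PTM naming scheme", {
  ids <- extract_element_ids(sample_elements)
  expect_false(any(duplicated(ids)))
  expect_true("P1__compound__" %in% ids)
  expect_true("P1__ptm__S15" %in% ids)
  # Exactly one attachment edge per site
  expect_equal(sum(grepl("__ptm_edge__", ids, fixed = TRUE)), 1)
})
```

The same assertions can be repeated for inputs with a site duplicated within a row, duplicated across rows, and several distinct sites, which are the scenarios listed in this comment.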

Comment on lines +707 to +709
var siblings = cy.nodes('[parent_protein = \"' + parentId + '\"]');
var idx = siblings.indexOf(ptmNode);
var total = siblings.length;

⚠️ Potential issue | 🟡 Minor

Avoid interpolating raw parentId into the selector string.

If an ID contains selector-significant characters, sibling lookup can fail or mis-select. Prefer filtering by data value instead of string-building selectors.

Proposed fix
-            var siblings = cy.nodes('[parent_protein = \"' + parentId + '\"]');
+            var siblings = ptmNodes.filter(function(n) {
+                return n.data('parent_protein') === parentId;
+            });

@tonywu1999 tonywu1999 merged commit 0263a37 into devel Feb 25, 2026
4 checks passed
@tonywu1999 tonywu1999 deleted the refactor-ptm branch February 25, 2026 21:58
@coderabbitai coderabbitai Bot mentioned this pull request Apr 16, 2026