Skip to content

Force drug#54

Merged
tonywu1999 merged 2 commits intodevelfrom
force-drug
Aug 8, 2025
Merged

Force drug#54
tonywu1999 merged 2 commits intodevelfrom
force-drug

Conversation

@tonywu1999
Copy link
Copy Markdown
Contributor

@tonywu1999 tonywu1999 commented Aug 8, 2025

User description

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • Ran styler::style_pkg(transformers = styler::tidyverse_style(indent_by = 4))
  • Ran devtools::document()

PR Type

Enhancement


Description

  • Add external node inclusion via force_include_other

  • Update INDRA API payload and validation

  • Relax edge metadata mapping with namespaces

  • Revise node construction from edges


Diagram Walkthrough

flowchart LR
  Input["Filtered input + optional external IDs"] -- "validate + prepare" --> Validator["Enhanced input validation"]
  Validator -- "build groundings" --> API["Call INDRA Cogex API"]
  API -- "statements" --> EdgesDF["Construct edges DF"]
  EdgesDF -- "derive unique nodes" --> NodesDF["Construct nodes DF from edges"]
  EdgesDF -- "augment with metadata" --> Meta["Add evidence URLs, IDs with namespaces"]
Loading

File Walkthrough

Relevant files
Enhancement
getPathwaysFromIndra.R
Switch to internal edge metadata helper                                   

R/getPathwaysFromIndra.R

  • Use internal .addAdditionalMetadataToIndraEdge
  • Remove unused namespace parameter
+2/-2     
getSubnetworkFromIndra.R
Support adding non-input nodes to network                               

R/getSubnetworkFromIndra.R

  • Add force_include_other parameter to API
  • Propagate to validator and API caller
  • Update documentation comments
+7/-3     
utils_getSubnetworkFromIndra.R
Validation, API call, metadata, and node construction updates

R/utils_getSubnetworkFromIndra.R

  • Extend validator to count external IDs
  • Update INDRA API payload with external groundings
  • Validate namespace:id format for externals
  • Refactor edge metadata to use source/target namespaces
  • Make uniprot ID mapping robust to missing inputs
  • Build nodes from edge-derived IDs with safe attributes
+44/-29 
Documentation
getSubnetworkFromIndra.Rd
Rd docs updated for external nodes parameter                         

man/getSubnetworkFromIndra.Rd

  • Document new force_include_other parameter
  • Update function signature in docs
+6/-1     
Tests
test-getSubnetworkFromIndra.R
Update tests for API signature change                                       

tests/testthat/test-getSubnetworkFromIndra.R

  • Adjust mocks for new API signature
  • Keep existing behavioral expectations
+2/-2     

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Aug 8, 2025

Walkthrough

The updates introduce a new parameter, force_include_other, to the getSubnetworkFromIndra function and its supporting internals. This parameter allows users to specify additional identifiers to include in the INDRA subnetwork query. Related validation, API calls, metadata assignment, node construction, documentation, and tests were updated to support this feature and improve robustness.

Changes

Cohort / File(s) Change Summary
getSubnetworkFromIndra: New Parameter & Logic
R/getSubnetworkFromIndra.R, R/utils_getSubnetworkFromIndra.R, man/getSubnetworkFromIndra.Rd
Added force_include_other parameter to getSubnetworkFromIndra and related internal functions. Updated validation, API call, and documentation to support inclusion of arbitrary identifiers in the subnetwork. Improved edge metadata assignment and node data frame construction for robustness.
Edge Metadata Assignment
R/getPathwaysFromIndra.R, R/utils_getSubnetworkFromIndra.R
Refactored .addAdditionalMetadataToIndraEdge to remove the namespace argument and related logic. Now uses edge object namespaces directly and assigns fallback values for missing mappings.
Test Adaptation
tests/testthat/test-getSubnetworkFromIndra.R
Updated mocks for .callIndraCogexApi to accept two parameters, aligning with the new function signature. No changes to test logic or expectations.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant getSubnetworkFromIndra
    participant .validateGetSubnetworkFromIndraInput
    participant .callIndraCogexApi
    participant INDRA_API

    User->>getSubnetworkFromIndra: Call with input, force_include_other
    getSubnetworkFromIndra->>.validateGetSubnetworkFromIndraInput: Validate input, force_include_other
    getSubnetworkFromIndra->>.callIndraCogexApi: Pass hgncIds, force_include_other
    .callIndraCogexApi->>INDRA_API: Query with hgncIds + force_include_other
    INDRA_API-->>.callIndraCogexApi: Return subnetwork data
    .callIndraCogexApi-->>getSubnetworkFromIndra: Return edges/nodes
    getSubnetworkFromIndra-->>User: Return subnetwork
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit hops through code anew,
With “force_include_other” in its view.
Now networks grow with extra flair,
Embracing nodes from everywhere!
Docs and tests keep pace in stride—
Robustness blooms on every side.
🐇✨

Note

🔌 MCP (Model Context Protocol) integration is now available in Early Access!

Pro users can now connect to remote MCP servers under the Integrations page to get reviews and chat conversations that understand additional development context.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch force-drug

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Aug 8, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The input validation now uses the combined count of input rows plus length of force_include_other to enforce the "< 400 proteins" limit, but the error message still refers specifically to proteins. Also, non-HGNC identifiers in force_include_other are counted toward this protein limit, which may be unintended. Confirm intended behavior and update message/limit accordingly.

num_proteins = nrow(input) + ifelse(!is.null(force_include_other), length(force_include_other), 0)
if (num_proteins >= 400) {
    stop("Invalid Input Error: INDRA query must contain less than 400 proteins.  Consider lowering your p-value cutoff")
}
Robustness

.addAdditionalMetadataToIndraEdge sets evidence_list using edge$source_ns and edge$target_ns. If these fields are missing, the URL will include NA, producing invalid links. Consider defaulting to @HGNC when namespace is absent, or guard with if (!is.null(...)).

edge$evidence_list <- paste(
    "https://db.indra.bio/statements/from_agents?subject=",
    edge$source_id, "@", edge$source_ns, "&object=",
    edge$target_id, "@", edge$target_ns, "&format=html",
    sep = ""
)
Data Mapping

Node construction now derives nodes from edges and maps attributes by matching nodes$id (edge IDs) to input$Protein. If edge IDs are HGNC (or other namespaces) while input$Protein are UniProt, logFC and adj.pvalue will be NA for many nodes. Verify that edges$source/target use the same identifier space as input$Protein, or add mapping using the new source_uniprot_id/target_uniprot_id.

# Get unique nodes from edges
node_ids <- unique(c(edges$source, edges$target))

# Create base nodes dataframe
nodes <- data.frame(
    id = node_ids,
    stringsAsFactors = FALSE
)

# Add attributes from input where available
nodes$logFC <- input$log2FC[match(nodes$id, input$Protein)]
nodes$adj.pvalue <- input$adj.pvalue[match(nodes$id, input$Protein)]
nodes$hgncName <- if ("HgncName" %in% colnames(input) && is.character(input$HgncName)) {
    hgnc_value <- input$HgncName[match(nodes$id, input$Protein)]
    ifelse(is.na(hgnc_value), nodes$id, hgnc_value)
} else {
    nodes$id
}

return(nodes)

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 61.76471% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.23%. Comparing base (dde02ce) to head (2428eab).

Files with missing lines Patch % Lines
R/utils_getSubnetworkFromIndra.R 63.33% 11 Missing ⚠️
R/getPathwaysFromIndra.R 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel      #54      +/-   ##
==========================================
+ Coverage   32.08%   32.23%   +0.14%     
==========================================
  Files           8        8              
  Lines        1206     1213       +7     
==========================================
+ Hits          387      391       +4     
- Misses        819      822       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Aug 8, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Harden identifier parsing

strsplit without fixed=TRUE and unspecified trimming can misparse inputs containing
regex metacharacters or whitespace. Normalize and validate parts strictly, and use
fixed=TRUE to avoid regex pitfalls.

R/utils_getSubnetworkFromIndra.R [40-53]

 .callIndraCogexApi <- function(hgncIds, force_include_other) {
-    indraCogexUrl <-
-        "https://discovery.indra.bio/api/indra_subnetwork_relations"
+    indraCogexUrl <- "https://discovery.indra.bio/api/indra_subnetwork_relations"
 
     groundings <- lapply(hgncIds, function(x) list("HGNC", x))
-    if (!is.null(force_include_other)) {
-        groundings <- c(groundings, lapply(force_include_other, function(x) {
-            parts <- unlist(strsplit(x, ":"))
-            if (length(parts) != 2) {
-                stop(paste0("Invalid identifier format: ", x, ". Expected format is 'namespace:identifier', e.g. 'HGNC:1234' or 'CHEBI:4911'."))
+    if (!is.null(force_include_other) && length(force_include_other) > 0) {
+        extra <- lapply(force_include_other, function(x) {
+            x <- trimws(x)
+            parts <- strsplit(x, ":", fixed = TRUE)[[1]]
+            if (length(parts) != 2 || any(nchar(parts) == 0)) {
+                stop(paste0("Invalid identifier format: ", x, ". Expected 'namespace:identifier', e.g. 'HGNC:1234' or 'CHEBI:4911'."))
             }
             list(parts[1], parts[2])
-        }))
+        })
+        groundings <- c(groundings, extra)
     }
     groundings <- list(nodes = groundings)
     groundings <- jsonlite::toJSON(groundings, auto_unbox = TRUE)
     ...
 }
Suggestion importance[1-10]: 7

__

Why: Improves robustness by trimming, using fixed splitting, and stricter validation when parsing force_include_other. This directly applies to the new parsing code and reduces potential runtime errors; moderate importance.

Medium
Count only HGNC toward cap

The 400-item limit now counts force_include_other blindly, but those entries may not
be proteins. This can prematurely reject valid queries or exceed API constraints.
Restrict the cap to HGNC-only entries and optionally warn if additional non-HGNC
nodes exist.

R/utils_getSubnetworkFromIndra.R [9-16]

 .validateGetSubnetworkFromIndraInput <- function(input, protein_level_data, sources_filter, force_include_other) {
     if (!"HgncId" %in% colnames(input)) {
         stop("Invalid Input Error: Input must contain a column named 'HgncId'.")
     }
-    num_proteins = nrow(input) + ifelse(!is.null(force_include_other), length(force_include_other), 0)
+    # Count only HGNC proteins toward the 400-protein INDRA cap
+    extra_hgnc <- 0L
+    if (!is.null(force_include_other) && length(force_include_other) > 0) {
+        parts <- strsplit(force_include_other, ":", fixed = TRUE)
+        ns <- vapply(parts, function(p) if (length(p) == 2) p[1] else NA_character_, character(1))
+        extra_hgnc <- sum(ns == "HGNC", na.rm = TRUE)
+    }
+    num_proteins <- nrow(input) + extra_hgnc
     if (num_proteins >= 400) {
-        stop("Invalid Input Error: INDRA query must contain less than 400 proteins.  Consider lowering your p-value cutoff")
+        stop("Invalid Input Error: INDRA query must contain less than 400 HGNC proteins. Consider lowering your p-value cutoff.")
     }
     if (nrow(input) == 0) {
         stop("Invalid Input Error: Input must contain at least one protein after filtering.")
     }
-    ...
+    if (!is.null(protein_level_data)) {
+        if (!all(c("Protein", "LogIntensities", "originalRUN") %in% colnames(protein_level_data))) {
+            stop("protein_level_data must contain 'Protein', 'LogIntensities', and 'originalRUN' columns.")
+        }
+    }
+    if (!is.null(sources_filter)) {
+        if (!is.character(sources_filter)) {
+            stop("sources_filter must be a character vector")
+        }
+    }
 }
Suggestion importance[1-10]: 6

__

Why: Correctly notes that the new cap counts all force_include_other, which can include non-HGNC IDs; proposing to count only HGNC is reasonable and reduces false rejections. Impact is moderate and logic aligns with surrounding validation, though it's not a critical bug fix.

Low
General
URL-encode and set defaults

The evidence URL assumes non-NA source_ns/target_ns and raw IDs; missing values or
special characters will yield invalid URLs. Default namespaces and URL-encode all
query parameters before concatenation.

R/utils_getSubnetworkFromIndra.R [152-176]

 .addAdditionalMetadataToIndraEdge <- function(edge, input) {
-    edge$evidence_list <- paste(
+    src_ns <- if (!is.null(edge$source_ns) && !is.na(edge$source_ns)) edge$source_ns else "HGNC"
+    tgt_ns <- if (!is.null(edge$target_ns) && !is.na(edge$target_ns)) edge$target_ns else "HGNC"
+    esc <- utils::URLencode
+    edge$evidence_list <- paste0(
         "https://db.indra.bio/statements/from_agents?subject=",
-        edge$source_id, "@", edge$source_ns, "&object=",
-        edge$target_id, "@", edge$target_ns, "&format=html",
-        sep = ""
+        esc(edge$source_id), "@", esc(src_ns),
+        "&object=", esc(edge$target_id), "@", esc(tgt_ns),
+        "&format=html"
     )
     
-    # Convert back to uniprot IDs
     matched_rows_source <- input[which(input$HgncId == edge$source_id), ]
     if (nrow(matched_rows_source) != 1) {
         edge$source_uniprot_id <- edge$source_name
     } else {
         edge$source_uniprot_id <- matched_rows_source$Protein
     }
     
     matched_rows_target <- input[which(input$HgncId == edge$target_id), ]
     if (nrow(matched_rows_target) != 1) {
         edge$target_uniprot_id <- edge$target_name
     } else {
         edge$target_uniprot_id <- matched_rows_target$Protein
     }
-    
     return(edge)
 }
Suggestion importance[1-10]: 7

__

Why: Encoding query parts and defaulting namespaces makes the evidence URL construction safer against NA and special characters. It aligns with the new URL format and enhances reliability without changing behavior otherwise.

Medium

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
R/utils_getSubnetworkFromIndra.R (2)

13-16: Duplicate-count risk in protein total calculation

num_proteins blindly adds length(force_include_other) to nrow(input).
If any identifier in force_include_other corresponds to an HGNC ID already present in input, the same entity is double-counted and can spuriously trip the <400 limit.

-    num_proteins = nrow(input) + ifelse(!is.null(force_include_other), length(force_include_other), 0)
+    unique_ids <- unique(c(input$HgncId,
+                           sub("^.+:", "", force_include_other %||% character(0))))
+    num_proteins <- length(unique_ids)

Using unique() avoids false “too many proteins” errors.


265-283: Minor: unnecessary temporary hgnc_value

The local assignment isn’t reused outside the one-liner; you can inline it to avoid the extra object:

nodes$hgncName <- if ("HgncName" %in% colnames(input) &&
                      is.character(input$HgncName)) {
    ifelse(is.na(input$HgncName[match(nodes$id, input$Protein)]),
           nodes$id,
           input$HgncName[match(nodes$id, input$Protein)])
} else {
    nodes$id
}
tests/testthat/test-getSubnetworkFromIndra.R (1)

5-6: Mock signature is brittle

Hard-coding two positional arguments ties the tests to the exact arity of .callIndraCogexApi.
Using ... makes the mock future-proof:

local_mocked_bindings(.callIndraCogexApi = function(...) {
    readRDS(system.file("extdata/indraResponse.rds", package = "MSstatsBioNet"))
})

Also applies to: 17-18

R/getPathwaysFromIndra.R (1)

110-112: Stale variable after signature change

namespace is now unused after removing the third argument to .addAdditionalMetadataToIndraEdge.
Consider deleting the earlier calculation (Lines 41–50) or repurposing it, to avoid dead code and reader confusion.

man/getSubnetworkFromIndra.Rd (1)

59-61: Doc tweak – clarify accepted namespaces

Add an explicit note that only namespaces accepted by the INDRA CogEx API are valid; otherwise the call will error.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dde02ce and 2428eab.

📒 Files selected for processing (5)
  • R/getPathwaysFromIndra.R (1 hunks)
  • R/getSubnetworkFromIndra.R (2 hunks)
  • R/utils_getSubnetworkFromIndra.R (4 hunks)
  • man/getSubnetworkFromIndra.Rd (2 hunks)
  • tests/testthat/test-getSubnetworkFromIndra.R (2 hunks)
🔇 Additional comments (2)
R/utils_getSubnetworkFromIndra.R (1)

162-173: edge$*_name may be undefined

When no matching HGNC row exists you fall back to edge$source_name / edge$target_name, but these fields are not guaranteed to be present in every INDRA statement object.
Accessing a missing element yields NULL, breaking later data.frame construction that expects character scalars.

Safest option is to default to the raw HGNC ID:

edge$source_uniprot_id <- matched_rows_source$Protein %||%
                          edge$source_name %||% edge$source_id
R/getSubnetworkFromIndra.R (1)

62-64: Call chain still omits force_include_proteins duplicates

force_include_proteins can re-introduce proteins filtered out earlier, yet you only pass input$HgncId to the API call.
If any forced protein was excluded for missing HgncId, it never reaches INDRA.
Verify that all intended proteins are represented in input$HgncId post-filter or append them explicitly before the API request.

Comment on lines +40 to +53
.callIndraCogexApi <- function(hgncIds, force_include_other) {
indraCogexUrl <-
"https://discovery.indra.bio/api/indra_subnetwork_relations"

groundings <- lapply(hgncIds, function(x) list("HGNC", x))
if (!is.null(force_include_other)) {
groundings <- c(groundings, lapply(force_include_other, function(x) {
parts <- unlist(strsplit(x, ":"))
if (length(parts) != 2) {
stop(paste0("Invalid identifier format: ", x, ". Expected format is 'namespace:identifier', e.g. 'HGNC:1234' or 'CHEBI:4911'."))
}
list(parts[1], parts[2])
}))
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Missing HTTP-error handling & fragile string-split

  1. POST() result is consumed without checking status_code, so 4xx/5xx responses silently become cryptic list structures.
  2. strsplit(x, ":") fails on NA or strings containing additional ‘:’, and uses regex.
    Use fixed = TRUE and trimws() for robustness.
res <- POST(...)

if (httr::http_error(res))
    stop("INDRA API request failed: ", httr::status_code(res), "", httr::http_status(res)$message)
...
parts <- strsplit(trimws(x), ":", fixed = TRUE)[[1]]
🤖 Prompt for AI Agents
In R/utils_getSubnetworkFromIndra.R around lines 40 to 53, the POST request
result is used without checking for HTTP errors, which can cause silent failures
on 4xx/5xx responses. Add a check after the POST call using httr::http_error to
detect errors and stop execution with a clear message including the status code
and status message. Also, improve the string splitting by using strsplit with
fixed = TRUE and apply trimws to the input string before splitting to handle NA
and extra colons robustly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants