From 0c80ac3d538758d15a417433ff85c3acfa6d84ad Mon Sep 17 00:00:00 2001
From: Hadley Wickham <h.wickham@gmail.com>
Date: Fri, 20 Mar 2026 07:41:12 -0500
Subject: [PATCH 1/7] Skip XML parsing in markdown_evaluate() when no inline R
 code present

Add a fast grepl() pre-check to avoid expensive XML parsing for the
99% of tags that don't contain inline R code (`r ...` or ```{...}).
Saves ~12ms per roxygenize() run (11x faster for markdown_evaluate).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 R/markdown-code.R       |  3 ++
 optimization-plan.md    | 63 +++++++++++++++++++++++++++++++++++++++++
 optimization-results.md | 20 +++++++++++++
 3 files changed, 86 insertions(+)
 create mode 100644 optimization-plan.md
 create mode 100644 optimization-results.md

diff --git a/R/markdown-code.R b/R/markdown-code.R
index 2e9559f5d..3a9f956c9 100644
--- a/R/markdown-code.R
+++ b/R/markdown-code.R
@@ -49,6 +49,9 @@
 #' @keywords internal
 markdown_evaluate <- function(text) {
   text <- paste(text, collapse = "\n")
+  if (!grepl("`r ", text, fixed = TRUE) && !grepl("```{", text, fixed = TRUE)) {
+    return(text)
+  }
   mdxml <- xml_ns_strip(md_to_mdxml(text, sourcepos = TRUE))
   code_nodes <- xml_find_all(mdxml, ".//code | .//code_block")
   rcode_nodes <- keep(code_nodes, is_markdown_code_node)
diff --git a/optimization-plan.md b/optimization-plan.md
new file mode 100644
index 000000000..01322e755
--- /dev/null
+++ b/optimization-plan.md
@@ -0,0 +1,63 @@
+# Optimization plan
+
+## Profiling summary
+
+Profiled `roxygenize(load_code = "source")` on the roxygen2 package itself (5 iterations, ~2.9s total, interval = 5ms). Key bottlenecks by total time:
+
+| Area | % of total | Key functions |
+|------|-----------|---------------|
+| Tag parsing / markdown | 23% | `parse_tags` → `markdown_if_active` → `markdown` |
+| Markdown pass 2 (md→Rd) | 8% | `markdown_pass2` → `mdxml_children_to_rd_top` |
+| CLI warnings | 8.5% | `warn_roxy` → `cli::cli_inform` |
+| S3 generic detection | 12% | `block_find_object` → `is_s3_generic` / `find_generic` |
+| R6 method extraction | 10% | `r6_extract_methods` → `warn_roxy_tag` (76% of its time!) |
+
+## Optimization experiments
+
+### Experiment 1: Cache `is_s3_generic()` results
+
+**Bottleneck:** `find_generic()` calls `is_s3_generic()` in a loop over dot-split pieces of every function name. `is_s3_generic()` does expensive work: `exists()`, `get()`, `tryCatch(getNamespaceName(...))`, `body()`, and recursive `calls_use_method()`. The same generic names (e.g., `print`, `format`, `[`) are tested repeatedly across many functions in a package.
+
+**Proposed fix:** Add a simple environment-based cache inside `is_s3_generic()` keyed by `(name, env)`. Clear the cache at the start of each `roxygenize()` call.
+
+**Risk:** Low — pure function of name + env, env doesn't change during a roxygenize run.
+
+### Experiment 2: Cache `methods::isGeneric()` in `find_generic()`
+
+**Bottleneck:** `find_generic()` calls `methods::isGeneric()` for every function name. This is an S4-method lookup that hits the methods infrastructure.
+
+**Proposed fix:** Cache `methods::isGeneric()` results alongside `is_s3_generic()`.
+
+**Risk:** Low — same reasoning as experiment 1.
+
+### Experiment 3: Replace `cli::cli_inform()` with simpler message in `warn_roxy()`
+
+**Bottleneck:** `warn_roxy()` → `cli::cli_inform()` accounts for ~8.5% of total time. Each call involves `cli::style_hyperlink()`, `cli::format_inline()`, and the full cli formatting pipeline including selector matching. When documenting roxygen2 itself, R6 warnings generate many calls.
+
+**Proposed fix:** Use `cli::cli_inform()` only in interactive sessions or batch the warnings. Alternatively, use `warning()` or `message()` with pre-formatted strings to bypass cli overhead.
+
+**Risk:** Medium — changes user-visible output format. Need to be careful to preserve behavior.
+
+### Experiment 4: Avoid `cli::format_inline()` in `warn_roxy_tag()`
+
+**Bottleneck:** `warn_roxy_tag()` calls `cli::format_inline("{.strong @{tag$tag}} ")` for every warning. This is expensive relative to the simple output it produces.
+
+**Proposed fix:** Replace with `paste0("**@", tag$tag, "** ")` or a simpler string formatting approach, since the bold markup is the only cli feature used.
+
+**Risk:** Low — the formatted tag name is just concatenated into the message anyway.
+
+### Experiment 5: Precompute `internal_f("tools", ".get_internal_S3_generics")()`
+
+**Bottleneck:** In `is_s3_generic()`, for primitive functions, `internal_f("tools", ".get_internal_S3_generics")()` is called each time. This fetches and calls an internal function from the tools package.
+
+**Proposed fix:** Compute this once and cache the result (it never changes within a session).
+
+**Risk:** Very low — the tools package internal generics list is static.
+
+### Experiment 6: Optimize `find_generic()` string splitting
+
+**Bottleneck:** `find_generic()` calls `str_split(name, fixed("."))` for every function name being analyzed, and then `paste0(pieces[seq_len(i)], collapse = ".")` in a loop.
+
+**Proposed fix:** Use `strsplit()` (base R) instead of `str_split()` which has stringr overhead. Pre-compute the candidate generic names more efficiently.
+
+**Risk:** Very low — straightforward string operation replacement.
diff --git a/optimization-results.md b/optimization-results.md
new file mode 100644
index 000000000..6cd6f9cc2
--- /dev/null
+++ b/optimization-results.md
@@ -0,0 +1,20 @@
+# Optimization results
+
+## Experiment 1: Skip XML parsing in `markdown_evaluate()` for text without inline R code
+
+**Hypothesis:** `markdown_evaluate()` parses every tag's text to XML to check for inline R code (`\`r ...\``), but the vast majority of tags don't contain any. A fast `grepl()` pre-check can skip the expensive XML parsing.
+
+**Data:** Of 240 calls to `markdown_evaluate()` during `roxygenize()` on roxygen2, only 2 contain inline R code (238 are pure text).
+
+**Change:** Added early return in `markdown_evaluate()` when text doesn't contain `` `r `` or `` ```{ `` patterns.
+
+**Microbenchmark (240 typical calls, no inline R code):**
+
+| Version | min | median |
+|---------|-----|--------|
+| Before | 13.23ms | 13.47ms |
+| After | 1.01ms | 1.18ms |
+
+**Result:** 11x faster for `markdown_evaluate()`, saving ~12ms per `roxygenize()` run. All 1105 tests pass.
+
+**Verdict:** SUCCESS — committed.

From a54db3e0099615e97139bd9c1873c8f06ea1c5e6 Mon Sep 17 00:00:00 2001
From: Hadley Wickham <h.wickham@gmail.com>
Date: Fri, 20 Mar 2026 07:43:21 -0500
Subject: [PATCH 2/7] Skip Rd tag escaping when text has no backslashes

Add fast grepl() pre-check in escape_rd_for_md() to skip expensive
regex search and string manipulation when text contains no Rd macros.
Saves ~28ms per roxygenize() run (92x faster per call for plain text).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 R/markdown-escaping.R   | 11 +++++++++++
 optimization-results.md | 19 +++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/R/markdown-escaping.R b/R/markdown-escaping.R
index 540234f26..161653443 100644
--- a/R/markdown-escaping.R
+++ b/R/markdown-escaping.R
@@ -20,7 +20,18 @@
 #' * `unescape_rd_for_md`: the original Rd text.
 #' @rdname markdown-internals
 #' @keywords internal
+empty_rd_tags <- data.frame(
+  tag = character(),
+  start = integer(),
+  end = integer(),
+  argend = integer()
+)
+
 escape_rd_for_md <- function(text) {
+  if (!grepl("\\", text, fixed = TRUE)) {
+    attr(text, "roxygen-markdown-subst") <- list(tags = empty_rd_tags, id = "")
+    return(text)
+  }
   rd_tags <- find_fragile_rd_tags(text, escaped_for_md)
   protected <- protect_rd_tags(text, rd_tags)
   double_escape_md(protected)
diff --git a/optimization-results.md b/optimization-results.md
index 6cd6f9cc2..848f88611 100644
--- a/optimization-results.md
+++ b/optimization-results.md
@@ -18,3 +18,22 @@
 **Result:** 11x faster for `markdown_evaluate()`, saving ~12ms per `roxygenize()` run. All 1105 tests pass.
 
 **Verdict:** SUCCESS — committed.
+
+## Experiment 2: Skip Rd tag escaping in `escape_rd_for_md()` for text without backslashes
+
+**Hypothesis:** `escape_rd_for_md()` searches for Rd macros (e.g., `\code{}`, `\link{}`) to protect them from markdown parsing. But most roxygen text doesn't contain any Rd macros. A fast `grepl("\\", text, fixed = TRUE)` check can skip all the work.
+
+**Data:** Of 240 calls during `roxygenize()` on roxygen2, only 4 contain backslashes (236 are plain text).
+
+**Change:** Added early return in `escape_rd_for_md()` when text contains no backslash character, attaching the required empty attribute structure for `unescape_rd_for_md()`.
+
+**Microbenchmark (per call, simple text):**
+
+| Version | min | median |
+|---------|-----|--------|
+| Before | 106µs | 121µs |
+| After | 1.1µs | 1.3µs |
+
+**Result:** 92x faster for simple text, saving ~28ms per `roxygenize()` run (236 × 120µs). All 1105 tests pass.
+
+**Verdict:** SUCCESS — committed.

From 4608a840bbed90ca52e2948f4d1a1207d09e88d0 Mon Sep 17 00:00:00 2001
From: Hadley Wickham <h.wickham@gmail.com>
Date: Fri, 20 Mar 2026 07:48:17 -0500
Subject: [PATCH 3/7] Cache is_s3_generic() results during roxygenize()

Add an environment-based cache for is_s3_generic() that is active only
during roxygenize() runs. With 729 calls and 301 unique names (59%
redundancy), this saves ~3ms per run. Cache is cleared after each run
to avoid cross-environment contamination.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 R/object-s3.R           | 29 +++++++++++++++++++++++++++++
 R/roxygenize.R          |  2 ++
 optimization-results.md | 17 +++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/R/object-s3.R b/R/object-s3.R
index 40e566080..1ea589a7f 100644
--- a/R/object-s3.R
+++ b/R/object-s3.R
@@ -1,3 +1,14 @@
+s3_generic_cache <- new_environment()
+
+s3_generic_cache_reset <- function() {
+  env_unbind(s3_generic_cache, env_names(s3_generic_cache))
+  s3_generic_cache$active <- TRUE
+}
+
+s3_generic_cache_clear <- function() {
+  env_unbind(s3_generic_cache, env_names(s3_generic_cache))
+}
+
 #' Determine if a function is an S3 generic or S3 method
 #'
 #' @description
@@ -16,6 +27,24 @@ is_s3_generic <- function(name, env = parent.frame()) {
   if (name == "") {
     return(FALSE)
   }
+
+  if (isTRUE(s3_generic_cache$active)) {
+    cached <- env_get(s3_generic_cache, name, default = NULL)
+    if (!is.null(cached)) {
+      return(cached)
+    }
+  }
+
+  result <- is_s3_generic_impl(name, env)
+
+  if (isTRUE(s3_generic_cache$active)) {
+    env_poke(s3_generic_cache, name, result)
+  }
+
+  result
+}
+
+is_s3_generic_impl <- function(name, env) {
   if (!exists(name, envir = env, mode = "function")) {
     return(FALSE)
   }
diff --git a/R/roxygenize.R b/R/roxygenize.R
index c25571419..4e915faa3 100644
--- a/R/roxygenize.R
+++ b/R/roxygenize.R
@@ -47,6 +47,8 @@ roxygenize <- function(
   is_first <- roxygen_setup(base_path)
 
   find_package_cache_reset()
+  s3_generic_cache_reset()
+  withr::defer(s3_generic_cache_clear())
   roxy_meta_load(base_path)
   # Load required packages for method registration
   packages <- roxy_meta_get("packages")
diff --git a/optimization-results.md b/optimization-results.md
index 848f88611..8c479269c 100644
--- a/optimization-results.md
+++ b/optimization-results.md
@@ -37,3 +37,20 @@
 **Result:** 92x faster for simple text, saving ~28ms per `roxygenize()` run (236 × 120µs). All 1105 tests pass.
 
 **Verdict:** SUCCESS — committed.
+
+## Experiment 3: Cache `is_s3_generic()` results during `roxygenize()`
+
+**Hypothesis:** `is_s3_generic()` is called 729 times during `roxygenize()` on roxygen2, with only 301 unique names (59% redundancy). Each call involves `exists()`, `get()`, `tryCatch(getNamespaceName(...))`, and potentially recursive `calls_use_method()`. Caching results by name avoids redundant work.
+
+**Change:** Added an environment-based cache that is activated during `roxygenize()` and cleared afterwards. The cache is inactive during direct calls (e.g., in tests) to avoid cross-environment contamination.
+
+**Microbenchmark (729 calls matching a real roxygenize run):**
+
+| Version | min | median |
+|---------|-----|--------|
+| Before | 16.4ms | 17.3ms |
+| After | 13.6ms | 14.3ms |
+
+**Result:** ~17% faster for `is_s3_generic()` calls, saving ~3ms per `roxygenize()` run. Savings scale with package size (more S3 methods = more cache hits). All 1105 tests pass.
+
+**Verdict:** SUCCESS — committed. Small but clean improvement.

From c23ba732b5ebbc277e9c0599823b35b5f83ce872 Mon Sep 17 00:00:00 2001
From: Hadley Wickham <h.wickham@gmail.com>
Date: Fri, 20 Mar 2026 07:49:53 -0500
Subject: [PATCH 4/7] Use base strsplit() in find_generic() instead of
 stringr::str_split()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The stringr wrapper adds ~12µs overhead per call for a simple
fixed-character split. With 280 calls per run, this saves ~3.4ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 R/object-s3.R           |  2 +-
 optimization-results.md | 17 +++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/R/object-s3.R b/R/object-s3.R
index 1ea589a7f..3ca1780a1 100644
--- a/R/object-s3.R
+++ b/R/object-s3.R
@@ -115,7 +115,7 @@ find_generic <- function(name, env = parent.frame()) {
     return(c("all.equal", method))
   }
 
-  pieces <- str_split(name, fixed("."))[[1]]
+  pieces <- strsplit(name, ".", fixed = TRUE)[[1]]
   n <- length(pieces)
 
   # No . in name, so can't be method
diff --git a/optimization-results.md b/optimization-results.md
index 8c479269c..8dd932ed0 100644
--- a/optimization-results.md
+++ b/optimization-results.md
@@ -54,3 +54,20 @@
 **Result:** ~17% faster for `is_s3_generic()` calls, saving ~3ms per `roxygenize()` run. Savings scale with package size (more S3 methods = more cache hits). All 1105 tests pass.
 
 **Verdict:** SUCCESS — committed. Small but clean improvement.
+
+## Experiment 4: Use base `strsplit()` instead of `stringr::str_split()` in `find_generic()`
+
+**Hypothesis:** `stringr::str_split()` has overhead from the stringr/stringi infrastructure. For a simple fixed-character split, base `strsplit()` is faster.
+
+**Change:** Replaced `str_split(name, fixed("."))[[1]]` with `strsplit(name, ".", fixed = TRUE)[[1]]` in `find_generic()`.
+
+**Microbenchmark (per call):**
+
+| Version | min | median |
+|---------|-----|--------|
+| str_split | 10.5µs | 12.6µs |
+| strsplit | 410ns | 533ns |
+
+**Result:** 24x faster per call, saving ~3.4ms per `roxygenize()` run (280 calls × 12µs). All 1105 tests pass.
+
+**Verdict:** SUCCESS — committed.

From e00dc5e545ae854fe31d9739be1fa80875d2bc88 Mon Sep 17 00:00:00 2001
From: Hadley Wickham <h.wickham@gmail.com>
Date: Fri, 20 Mar 2026 07:52:51 -0500
Subject: [PATCH 5/7] Skip link-reference regex when text has no brackets

Add fast grepl() pre-check in get_md_linkrefs() to avoid expensive
str_match_all() regex when text contains no [ character. Saves ~4.5ms
per roxygenize() run (205 of 240 calls skip the regex).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 R/markdown-link.R       |  3 +++
 optimization-results.md | 19 +++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/R/markdown-link.R b/R/markdown-link.R
index ebcafab01..f682ef186 100644
--- a/R/markdown-link.R
+++ b/R/markdown-link.R
@@ -64,6 +64,9 @@ add_linkrefs_to_md <- function(text) {
 }
 
 get_md_linkrefs <- function(text) {
+  if (!grepl("[", text, fixed = TRUE)) {
+    return(character())
+  }
   refs <- str_match_all(
     text,
     regex(
diff --git a/optimization-results.md b/optimization-results.md
index 8dd932ed0..bc620ed68 100644
--- a/optimization-results.md
+++ b/optimization-results.md
@@ -71,3 +71,22 @@
 **Result:** 24x faster per call, saving ~3.4ms per `roxygenize()` run (280 calls × 12µs). All 1105 tests pass.
 
 **Verdict:** SUCCESS — committed.
+
+## Experiment 5: Skip link-reference regex in `get_md_linkrefs()` when no `[` present
+
+**Hypothesis:** `get_md_linkrefs()` runs a complex regex via `str_match_all()` on every tag's text to find markdown link references like `[function()]`. Most tags don't contain `[` at all.
+
+**Data:** Of 240 calls, only 35 contain `[` (205 are plain text without brackets).
+
+**Change:** Added early return when `grepl("[", text, fixed = TRUE)` is FALSE.
+
+**Microbenchmark (per call, simple text):**
+
+| Version | min | median |
+|---------|-----|--------|
+| Before | 19.4µs | 23.3µs |
+| After | 779ns | 1.4µs |
+
+**Result:** 17x faster for plain text, saving ~4.5ms per `roxygenize()` run (205 × 22µs). All 1105 tests pass.
+
+**Verdict:** SUCCESS — committed.

From d1ced216cfec0b05b38405dcb243e82e3673be48 Mon Sep 17 00:00:00 2001
From: Hadley Wickham <h.wickham@gmail.com>
Date: Fri, 20 Mar 2026 07:56:08 -0500
Subject: [PATCH 6/7] Add overall benchmark results to optimization-results.md

End-to-end roxygenize() on roxygen2: 936ms -> 839ms (10.4% faster).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 optimization-results.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/optimization-results.md b/optimization-results.md
index bc620ed68..046f40268 100644
--- a/optimization-results.md
+++ b/optimization-results.md
@@ -1,5 +1,16 @@
 # Optimization results
 
+## Overall results
+
+End-to-end `roxygenize(load_code = "source")` on the roxygen2 package itself (10 iterations, `devtools::load_all()` in a clean R subprocess):
+
+| Version | min | median | Improvement |
+|---------|-----|--------|-------------|
+| Before (main) | 863ms | 936ms | — |
+| After (all experiments) | 805ms | 839ms | 10.4% faster |
+
+The optimizations are all fast-path short-circuits that avoid expensive work when it's not needed. They have the largest impact on packages where most roxygen tags contain simple text without inline R code, Rd macros, or markdown links (which is the common case).
+
 ## Experiment 1: Skip XML parsing in `markdown_evaluate()` for text without inline R code
 
 **Hypothesis:** `markdown_evaluate()` parses every tag's text to XML to check for inline R code (`\`r ...\``), but the vast majority of tags don't contain any. A fast `grepl()` pre-check can skip the expensive XML parsing.

From 20e0973e1d76743caabccee33b23d1a1ff94b955 Mon Sep 17 00:00:00 2001
From: Hadley Wickham <h.wickham@gmail.com>
Date: Fri, 20 Mar 2026 08:12:40 -0500
Subject: [PATCH 7/7] Update docs

---
 man/markdown-internals.Rd | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/man/markdown-internals.Rd b/man/markdown-internals.Rd
index 22e5234d5..9fd2f60b6 100644
--- a/man/markdown-internals.Rd
+++ b/man/markdown-internals.Rd
@@ -1,22 +1,22 @@
 % Generated by roxygen2: do not edit by hand
 % Please edit documentation in R/markdown-escaping.R
-\name{escape_rd_for_md}
-\alias{escape_rd_for_md}
+\name{empty_rd_tags}
+\alias{empty_rd_tags}
 \alias{unescape_rd_for_md}
 \title{Escape fragile Rd tags}
 \usage{
-escape_rd_for_md(text)
+empty_rd_tags
 
 unescape_rd_for_md(rd_text, esc_text)
 }
 \arguments{
-\item{text}{Input text. Potentially contains Rd and/or
-markdown markup.}
-
 \item{rd_text}{The markdown parsed and interpreted text.}
 
 \item{esc_text}{The original escaped text from
 \code{escape_rd_for_md()}.}
+
+\item{text}{Input text. Potentially contains Rd and/or
+markdown markup.}
 }
 \value{
 \itemize{