Single-pass compiler: Profile, benchmark, and optimize the new compilation pipeline

## Summary

After the new single-pass compiler is functional ([ENG-9142](https://linear.app/reflex-dev/issue/ENG-9142/single-pass-compiler-introduce-compilerplugin-protocol-compilerhooks), [ENG-9143](https://linear.app/reflex-dev/issue/ENG-9143/single-pass-compiler-implement-core-compilation-plugins-imports-hooks), [ENG-9144](https://linear.app/reflex-dev/issue/ENG-9144/single-pass-compiler-wire-plugin-pipeline-into-app-compile-and-remove)), profile it against the current compiler to identify and fix any performance regressions, then optimize low-hanging fruit in the hot paths.

## Prior Art: Existing CodSpeed Benchmark Suite

There is already a `pytest-codspeed` benchmark suite in `tests/benchmarks/` that covers parts of the compilation pipeline. Understanding what it already tests — and what it doesn't — is essential for scoping this work.

### What exists today

`tests/benchmarks/test_compilation.py` benchmarks three things:

* `test_compile_page` — times `_compile_page(evaluated_page)`, which is the final step of rendering an already-evaluated component tree to JS (calling `_get_all_imports`, `_get_all_dynamic_imports`, `_get_all_custom_code`, `_get_all_hooks`, and `component.render()`)
* `test_compile_stateful` — times `_compile_stateful_components([evaluated_page])`, which runs the `StatefulComponent` memoization and shared component extraction
* `test_get_all_imports` — times `evaluated_page._get_all_imports()` in isolation

`tests/benchmarks/test_evaluate.py` benchmarks:

* `test_evaluate_page` — times calling the page function itself (component tree construction)

`tests/benchmarks/fixtures.py` provides two representative page fixtures:

* `_complicated_page`: a sidebar-heavy layout with accordion navigation and ~50 link items across categories, built with frozen dataclasses and `map()` for component generation. Exercises deeply nested component trees with many children.
* `_stateful_page`: a page with `rx.cond`, `rx.match`, `rx.foreach` (including nested foreach), state var references, and event handlers. Exercises the stateful component path.

Both fixtures are parametrized so each benchmark runs against both page types.

### What the existing suite does NOT cover

The existing benchmarks focus on **individual compilation functions in isolation** — they don't measure the full end-to-end pipeline as orchestrated by `App._compile()`. Specifically, these are not benchmarked today:

1. **Page evaluation + style application** — `compile_unevaluated_page()` calls `into_component()` then `_add_style_recursive()`. The `test_evaluate_page` benchmark only covers the first part.
2. **Full pipeline orchestration** — The overhead of `App._compile()` itself: iterating pages, executor setup, progress tracking, file writing, etc.
3. **Multi-page compilation** — How compile time scales with 5, 10, 20+ pages (especially with shared components across pages).
4. **Import merging/collapsing** — `merge_imports()` and `collapse_imports()` in `reflex/utils/imports.py` are called on every page's accumulated imports but aren't benchmarked independently.
5. **Plugin dispatch overhead** (new) — The async generator machinery and `CompilerHooks._dispatch()` cost per component, which is new to the single-pass architecture.

### Goal: extend the suite, don't replace it

The existing CodSpeed benchmarks should be **preserved and adapted** to benchmark the equivalent new-compiler functions. This gives us direct before/after comparisons on the same fixtures. New benchmarks should be **added** to cover the gaps listed above.

## Tasks

### 1. Adapt existing benchmarks for the new compiler

The current benchmarks call `_compile_page()` and `_compile_stateful_components()` directly. After the new compiler lands:

* `test_compile_page` — Add a parallel benchmark that compiles the same `evaluated_page` through the new plugin pipeline (i.e., run the full `CompileContext.compile()` for a single page). Keep the old benchmark temporarily for A/B comparison.
* `test_compile_stateful` — This may no longer apply if `StatefulComponent` is replaced ([ENG-9145](https://linear.app/reflex-dev/issue/ENG-9145/single-pass-compiler-replace-statefulcomponent-auto-memoization-with)). Replace with a benchmark for the new `MemoizeStatefulPlugin` if applicable.
* `test_get_all_imports` — Add a parallel benchmark that runs `ConsolidateImportsPlugin` on the same page to compare single-pass import collection vs the recursive `_get_all_imports()` walk.

### 2. Add new benchmarks for gaps in coverage

Add these to `tests/benchmarks/`:

* `test_compile_pipeline` — End-to-end: construct a `CompileContext` with both fixture pages plus additional pages sharing the `side_bar()` component, and time `compile_ctx.compile()`. This measures the full orchestrated pipeline including plugin dispatch.
* `test_plugin_dispatch_overhead` — Micro-benchmark: time `CompilerHooks._dispatch("compile_component", comp)` for a single component with the default plugin set. This isolates the async generator dispatch cost.
* `test_style_application` — Time `_add_style_recursive()` (or its plugin replacement `ApplyStylePlugin`) on the `_complicated_page` fixture. This is currently unmeasured.
* `test_import_merging` — Time `merge_imports()` and `collapse_imports()` on a realistic set of import dicts collected from the `_complicated_page` fixture.
* `test_multi_page_scaling` — Time compilation of 1, 5, 10, 20 pages (reusing the same page fixtures) to characterize scaling behavior.
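
Before the real `CompilerHooks` exists, the shape of `test_plugin_dispatch_overhead` can be prototyped with stand-ins. The sketch below is only an approximation under stated assumptions: `CountingPlugin`, `dispatch`, and `time_dispatch` are hypothetical names, and the pre/post `asend` cycle is a guess at how the dispatcher drives each hook, not the actual implementation.

```python
import asyncio
import time


class CountingPlugin:
    """Hypothetical stand-in for a compiler plugin (not the real reflex API)."""

    def __init__(self):
        self.calls = 0

    async def compile_component(self, comp):
        self.calls += 1  # pre-visit work would go here
        yield comp       # hand control back to the dispatcher
        # post-visit work would go here


async def dispatch(plugins, comp):
    """Run one pre/post cycle of every plugin's async generator for one component."""
    gens = [plugin.compile_component(comp) for plugin in plugins]
    for gen in gens:
        await gen.asend(None)  # advance to the yield (pre-visit)
    for gen in reversed(gens):
        try:
            await gen.asend(comp)  # resume past the yield (post-visit)
        except StopAsyncIteration:
            pass  # the generator finished, as expected
    return comp


async def dispatch_tree(plugins, n_components):
    """Dispatch once per component; integers stand in for real components."""
    for comp in range(n_components):
        await dispatch(plugins, comp)


def time_dispatch(n_components=1000, n_plugins=8):
    """Return (seconds, total hook invocations) for one simulated tree walk."""
    plugins = [CountingPlugin() for _ in range(n_plugins)]
    start = time.perf_counter()
    asyncio.run(dispatch_tree(plugins, n_components))
    elapsed = time.perf_counter() - start
    return elapsed, sum(plugin.calls for plugin in plugins)


if __name__ == "__main__":
    elapsed, calls = time_dispatch()
    print(f"{calls} hook invocations, {elapsed / calls * 1e6:.2f} us each")
```

In the real suite the timed body would be wrapped by the CodSpeed `benchmark` fixture rather than `time.perf_counter()`; the manual timer just keeps the sketch free of test dependencies.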

### 3. Profile the new compilation pipeline

Use `cProfile` / `py-spy` / `scalene` to identify hot paths beyond what CodSpeed measures. Known areas to watch:

* **Plugin dispatch overhead**: `CompilerHooks._dispatch()` creates generators for each plugin per component. For a tree with 1000 components and 8 plugins, that's 8000 generator creations. Measure if this is material.
* **Async generator overhead**: Each `compile_component` hook is an async generator driven with `yield` / `asend`. Even 10 μs of overhead per component would add about 10 ms to the compile of a 1000-component tree.
* **`ContextVar.get()` calls**: Plugins call `PageContext.get()` and `CompileContext.get()` frequently. `ContextVar` lookups are fast but not free.
* **Import merging**: `merge_imports()` and `collapse_imports()` in `reflex/utils/imports.py` are called frequently. Profile to see if they're a bottleneck.
* **Component rendering**: `component.render()` is likely the most expensive single operation per component.
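
A first profiling pass needs nothing beyond the standard library. The following is a minimal sketch, assuming a stand-in workload: `build_tree`, `render`, and `compile_page` here are hypothetical placeholders for the real compile entry point, and `profile_top` is an illustrative helper, not part of reflex.

```python
import cProfile
import io
import pstats


def build_tree(depth, fanout):
    """Build a synthetic nested tree; a placeholder for a real component tree."""
    if depth == 0:
        return {"name": "leaf", "children": []}
    children = [build_tree(depth - 1, fanout) for _ in range(fanout)]
    return {"name": f"node{depth}", "children": children}


def render(node):
    """Hypothetical stand-in for per-component rendering work."""
    return {
        "name": node["name"],
        "children": [render(child) for child in node["children"]],
    }


def compile_page(tree):
    """Hypothetical stand-in for the function being profiled."""
    return render(tree)


def profile_top(func, *args, limit=10):
    """Profile func(*args) and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_top(compile_page, build_tree(depth=6, fanout=3)))
```

Sorting by cumulative time surfaces the call chains that dominate a compile; sorting by `pstats.SortKey.TIME` instead highlights functions that are expensive in their own right.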

### 4. Optimize low-hanging fruit

Based on profiling, expected optimizations include:

* **Skip no-op plugins**: If a plugin's `compile_component` is the default (base protocol method), don't dispatch to it. The demo code already checks for this.
* **Batch plugin dispatch**: Instead of creating individual generators per plugin, consider a combined dispatch that reduces generator overhead.
* **Cache component renders**: If a component is immutable ([ENG-9146](https://linear.app/reflex-dev/issue/ENG-9146/single-pass-compiler-make-component-effectively-immutable-for-identity)), cache its `render()` output.
* **Reduce dict/set operations**: The consolidation plugins accumulate into dicts/sets on every component. Consider batched approaches or pre-allocated structures.
* **Minimize `isinstance` checks**: Several plugins check `isinstance(comp, Component)`; consider a pre-computed flag or tag.
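
The "skip no-op plugins" idea can be illustrated by comparing a plugin's hook against the base definition once, up front, instead of on every component. This is a sketch under assumptions: `CompilerPlugin` is modeled as a plain base class rather than the real protocol, and `ImportsPlugin`, `PassivePlugin`, `overrides_hook`, and `active_plugins` are hypothetical names.

```python
class CompilerPlugin:
    """Hypothetical base class whose hooks are no-ops by default."""

    def compile_component(self, comp):
        return comp


class ImportsPlugin(CompilerPlugin):
    """Overrides the hook, so it must be dispatched to."""

    def __init__(self):
        self.seen = []

    def compile_component(self, comp):
        self.seen.append(comp)
        return comp


class PassivePlugin(CompilerPlugin):
    """Inherits the default hook, so dispatching to it would be wasted work."""


def overrides_hook(plugin, hook_name, base=CompilerPlugin):
    """True if the plugin's class defines its own version of the hook."""
    return getattr(type(plugin), hook_name) is not getattr(base, hook_name)


def active_plugins(plugins, hook_name):
    """Filter once per compile so the per-component loop only sees real work."""
    return [plugin for plugin in plugins if overrides_hook(plugin, hook_name)]


if __name__ == "__main__":
    plugins = [ImportsPlugin(), PassivePlugin()]
    for plugin in active_plugins(plugins, "compile_component"):
        print(type(plugin).__name__)
```

Doing the filter once per hook name keeps the check out of the hot loop, which matters more as the number of components grows.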

### 5. Compare against baseline

Run the CodSpeed suite (both old and new benchmarks) with the old compiler and the new compiler, and document:

* Per-benchmark time comparison
* Compilation time comparison on realistic apps
* Memory usage comparison
* Per-page time distribution
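
The per-benchmark comparison can be reduced to a small table of relative changes. The helper below is a sketch only: `compare` is a hypothetical function, the 10% default mirrors the regression threshold in the acceptance criteria, and the numbers in the usage example are placeholders, not measurements.

```python
def compare(baseline, candidate, threshold=0.10):
    """Return (name, old, new, relative_change, regressed) per shared benchmark.

    Timings are assumed to be in the same unit in both mappings.
    """
    rows = []
    for name in sorted(baseline.keys() & candidate.keys()):
        old, new = baseline[name], candidate[name]
        change = (new - old) / old  # positive means the new compiler is slower
        rows.append((name, old, new, change, change > threshold))
    return rows


if __name__ == "__main__":
    # Placeholder values for illustration only; real numbers come from CodSpeed.
    old_timings = {"test_compile_page": 2.0, "test_get_all_imports": 4.0}
    new_timings = {"test_compile_page": 2.5, "test_get_all_imports": 4.0}
    for name, old, new, change, regressed in compare(old_timings, new_timings):
        flag = "REGRESSION" if regressed else "ok"
        print(f"{name}: {old} -> {new} ({change:+.1%}) {flag}")
```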

## Acceptance Criteria

- [ ] Existing CodSpeed benchmarks in `tests/benchmarks/test_compilation.py` have parallel versions for the new compiler pipeline
- [ ] New benchmarks cover: full pipeline, plugin dispatch overhead, style application, import merging, multi-page scaling
- [ ] All benchmarks use the existing fixtures from `tests/benchmarks/fixtures.py` (extend if needed, don't replace)
- [ ] Profiling results are documented (flame graph or equivalent)
- [ ] No performance regression > 10% on existing `test_compile_page` and `test_get_all_imports` benchmarks vs the old compiler
- [ ] At least one measurable optimization is implemented based on profiling
- [ ] Results are documented in the PR for future reference

## Key Files

* `tests/benchmarks/test_compilation.py` — existing CodSpeed benchmarks to extend
* `tests/benchmarks/test_evaluate.py` — existing page evaluation benchmark
* `tests/benchmarks/fixtures.py` — `_complicated_page`, `_stateful_page`, `SideBarState`, `BenchmarkState` fixtures
* `tests/benchmarks/conftest.py` — fixture wiring
* `reflex/compiler/compiler.py` — compilation functions (old and new)
* `reflex/compiler/plugins.py` (new) — plugin implementations
* `reflex/components/component.py` — `render()`, `_get_imports()`, etc.
* `reflex/utils/imports.py` — `merge_imports()`, `collapse_imports()`

## Notes

* CodSpeed runs in CI and provides automatic regression detection on PRs. The new benchmarks will inherit this behavior, giving us ongoing protection against compile-time regressions.
* The benchmark suite should eventually be extended with a "real app" fixture (e.g., a subset of the reflex-web docs site), but the existing synthetic fixtures are a good starting point.
* Performance of the compiler disproportionately affects developer experience (hot reload speed), so even small improvements matter.
* Keep in mind that the compiler is almost entirely CPU-bound Python. The biggest wins will come from doing less work (fewer traversals, more caching) rather than from parallelism.
