Single-pass compiler: Profile, benchmark, and optimize the new compilation pipeline #6215

@masenf

Summary

After the new single-pass compiler is functional (ENG-9142, ENG-9143, ENG-9144), profile it against the current compiler to identify and fix any performance regressions, then optimize low-hanging fruit in the hot paths.

Prior Art: Existing CodSpeed Benchmark Suite

There is already a pytest-codspeed benchmark suite in tests/benchmarks/ that covers parts of the compilation pipeline. Understanding what it already tests — and what it doesn't — is essential for scoping this work.

What exists today

tests/benchmarks/test_compilation.py benchmarks three things:

  • test_compile_page — times _compile_page(evaluated_page), which is the final step of rendering an already-evaluated component tree to JS (calling _get_all_imports, _get_all_dynamic_imports, _get_all_custom_code, _get_all_hooks, and component.render())
  • test_compile_stateful — times _compile_stateful_components([evaluated_page]), which runs the StatefulComponent memoization and shared component extraction
  • test_get_all_imports — times evaluated_page._get_all_imports() in isolation

tests/benchmarks/test_evaluate.py benchmarks:

  • test_evaluate_page — times calling the page function itself (component tree construction)

tests/benchmarks/fixtures.py provides two representative page fixtures:

  • _complicated_page — a sidebar-heavy layout with accordion navigation, ~50 link items across categories, using frozen dataclasses and map() for component generation. Exercises deeply nested component trees with many children.
  • _stateful_page — a page with rx.cond, rx.match, rx.foreach (including nested foreach), state var references, and event handlers. Exercises the stateful component path.

Both fixtures are parametrized so each benchmark runs against both page types.

What the existing suite does NOT cover

The existing benchmarks focus on individual compilation functions in isolation — they don't measure the full end-to-end pipeline as orchestrated by App._compile(). Specifically, these are not benchmarked today:

  1. Page evaluation + style application: compile_unevaluated_page() calls into_component() and then _add_style_recursive(). The test_evaluate_page benchmark covers only the first step.
  2. Full pipeline orchestration — The overhead of App._compile() itself: iterating pages, executor setup, progress tracking, file writing, etc.
  3. Multi-page compilation — How compile time scales with 5, 10, 20+ pages (especially with shared components across pages).
  4. Import merging/collapsing: merge_imports() and collapse_imports() in reflex/utils/imports.py are called on every page's accumulated imports but aren't benchmarked independently.
  5. Plugin dispatch overhead (new) — The async generator machinery and CompilerHooks._dispatch() cost per component, which is new to the single-pass architecture.
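
The cost in item 5 can be roughly estimated before the real plugin API exists. The sketch below assumes, hypothetically, that each compile_component hook is an async generator that yields once before a component's children are handled and is resumed afterwards; compile_component_hook, dispatch_async, and per_hook_cost_us are illustrative names, not Reflex APIs. It compares the mean per-hook cost of creating and driving such generators against a plain function call, defaulting to 1000 components and 8 plugins.

```python
import asyncio
import time


# Hypothetical hook shape (an assumption, not the real protocol): do "pre"
# work, yield to the dispatcher, then do "post" work when resumed.
async def compile_component_hook(comp):
    yield comp  # pre-order step; the dispatcher resumes us later
    # post-order step would run here


def plain_hook(comp):
    # The same trivial work as an ordinary function call, for comparison.
    return comp


async def dispatch_async(components, n_plugins):
    """Create and drive one async generator per plugin per component."""
    for comp in components:
        gens = [compile_component_hook(comp) for _ in range(n_plugins)]
        for gen in gens:
            await gen.asend(None)  # run each hook up to its first yield
        for gen in reversed(gens):
            try:
                await gen.asend(None)  # resume; the hook runs to completion
            except StopAsyncIteration:
                pass


def dispatch_sync(components, n_plugins):
    for comp in components:
        for _ in range(n_plugins):
            plain_hook(comp)


async def _time_async(components, n_plugins):
    # Time inside the running loop so event-loop startup is excluded.
    start = time.perf_counter()
    await dispatch_async(components, n_plugins)
    return time.perf_counter() - start


def per_hook_cost_us(n_components=1000, n_plugins=8):
    """Return (async_us, sync_us): mean microseconds per hook invocation."""
    components = list(range(n_components))
    calls = n_components * n_plugins
    async_s = asyncio.run(_time_async(components, n_plugins))
    start = time.perf_counter()
    dispatch_sync(components, n_plugins)
    sync_s = time.perf_counter() - start
    return async_s / calls * 1e6, sync_s / calls * 1e6


if __name__ == "__main__":
    async_us, sync_us = per_hook_cost_us()
    print(f"async generator: {async_us:.2f} us/hook, plain call: {sync_us:.2f} us/hook")
```

The gap between the two numbers is an upper bound on what a batched or synchronous dispatch could save per hook; it says nothing about the real plugins' own work.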

Goal: extend the suite, don't replace it

The existing CodSpeed benchmarks should be preserved and adapted to benchmark the equivalent new-compiler functions. This gives us direct before/after comparisons on the same fixtures. New benchmarks should be added to cover the gaps listed above.

Tasks

1. Adapt existing benchmarks for the new compiler

The current benchmarks call _compile_page() and _compile_stateful_components() directly. After the new compiler lands:

  • test_compile_page — Add a parallel benchmark that compiles the same evaluated_page through the new plugin pipeline (i.e., run the full CompileContext.compile() for a single page). Keep the old benchmark temporarily for A/B comparison.
  • test_compile_stateful — This may no longer apply if StatefulComponent is replaced (ENG-9145). Replace with a benchmark for the new MemoizeStatefulPlugin if applicable.
  • test_get_all_imports — Add a parallel benchmark that runs ConsolidateImportsPlugin on the same page to compare single-pass import collection vs the recursive _get_all_imports() walk.
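
A parallel benchmark is only meaningful if the old and new paths produce the same result. A minimal local sketch of an A/B helper, assuming both paths can be called as functions of one argument: ab_compare checks output equality, then returns the new/old time ratio using timeit. collect_recursive and collect_single_pass are hypothetical stand-ins for the recursive _get_all_imports() walk and a single-pass collector, not Reflex code.

```python
import timeit
from typing import Any, Callable


def ab_compare(old: Callable[[Any], Any], new: Callable[[Any], Any], arg: Any,
               number: int = 200, repeat: int = 5) -> float:
    """Return the new/old time ratio after checking both produce equal output."""
    # A parallel benchmark is only meaningful if both paths agree on the result.
    if old(arg) != new(arg):
        raise AssertionError("old and new pipelines disagree")
    # The minimum over several repeats is the least noisy estimate for CPU-bound code.
    old_t = min(timeit.repeat(lambda: old(arg), number=number, repeat=repeat))
    new_t = min(timeit.repeat(lambda: new(arg), number=number, repeat=repeat))
    return new_t / old_t


# Hypothetical stand-ins: each node is an (imports, children) tuple.
def collect_recursive(node):
    imports, children = node
    out = set(imports)
    for child in children:
        out |= collect_recursive(child)
    return out


def collect_single_pass(root):
    out, stack = set(), [root]
    while stack:
        imports, children = stack.pop()
        out.update(imports)
        stack.extend(children)
    return out


tree = ({"react"}, [({"@radix-ui/themes"}, []), ({"react"}, [({"next/link"}, [])])])
ratio = ab_compare(collect_recursive, collect_single_pass, tree)
print(f"new/old time ratio: {ratio:.2f}")
```

In CI the same comparison would come from CodSpeed's per-benchmark results; a ratio above 1.10 on the real functions would breach the 10% regression limit in the acceptance criteria.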

2. Add new benchmarks for gaps in coverage

Add these to tests/benchmarks/:

  • test_compile_pipeline — End-to-end: construct a CompileContext with both fixture pages plus additional pages sharing the side_bar() component, and time compile_ctx.compile(). This measures the full orchestrated pipeline including plugin dispatch.
  • test_plugin_dispatch_overhead — Micro-benchmark: time CompilerHooks._dispatch("compile_component", comp) for a single component with the default plugin set. This isolates the async generator dispatch cost.
  • test_style_application — Time _add_style_recursive() (or its plugin replacement ApplyStylePlugin) on the _complicated_page fixture. This is currently unmeasured.
  • test_import_merging — Time merge_imports() and collapse_imports() on a realistic set of import dicts collected from the _complicated_page fixture.
  • test_multi_page_scaling — Time compilation of 1, 5, 10, 20 pages (reusing the same page fixtures) to characterize scaling behavior.
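
For test_multi_page_scaling, the useful number is time per page at each page count rather than total time. A minimal sketch, assuming the pipeline can be wrapped as a callable that compiles n pages; fake_compile_pages is a hypothetical stand-in with fixed work per page.

```python
import time


def measure_scaling(compile_pages, page_counts=(1, 5, 10, 20), repeat=3):
    """Return {n_pages: best observed seconds per page}."""
    results = {}
    for n in page_counts:
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            compile_pages(n)
            best = min(best, time.perf_counter() - start)
        results[n] = best / n
    return results


def fake_compile_pages(n):
    # Hypothetical stand-in: fixed work per page, so scaling should be ~linear.
    for _ in range(n):
        "".join(str(i) for i in range(2_000))


for n, per_page in measure_scaling(fake_compile_pages).items():
    print(f"{n:>3} pages: {per_page * 1e3:.3f} ms/page")
```

Roughly constant ms/page indicates linear scaling. Rising ms/page suggests work that grows with the number of pages, while falling ms/page would be consistent with shared components such as side_bar() being reused across pages.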

3. Profile the new compilation pipeline

Use cProfile / py-spy / scalene to identify hot paths beyond what CodSpeed measures. Known areas to watch:

  • Plugin dispatch overhead: CompilerHooks._dispatch() creates a generator for each plugin per component. For a tree with 1000 components and 8 plugins, that's 8000 generator creations. Measure whether this is material.
  • Async generator overhead: Each compile_component hook is an async generator driven with yield / asend. A per-component overhead of even 10μs would add 10 ms to a 1000-component page, which compounds across pages and on every hot reload.
  • ContextVar.get() calls: Plugins call PageContext.get() and CompileContext.get() frequently. ContextVar lookups are fast but not free.
  • Import merging: merge_imports() and collapse_imports() in reflex/utils/imports.py are called frequently. Profile to see if they're a bottleneck.
  • Component rendering: component.render() is likely the most expensive single operation per component.
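
For the cProfile route, a small wrapper keeps profiling runs consistent. This sketch uses only the standard cProfile and pstats modules; render_tree is a hypothetical stand-in for a recursive component.render() walk, and in practice func would be whatever compiles one page.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit=15, sort="cumulative"):
    """Run func(*args) under cProfile and return the top `limit` rows as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats(sort).print_stats(limit)
    return buf.getvalue()


def render_tree(depth):
    # Hypothetical stand-in for a recursive component.render() walk.
    if depth == 0:
        return "<div/>"
    return "<div>" + render_tree(depth - 1) + render_tree(depth - 1) + "</div>"


print(profile_top(render_tree, 12, limit=5))
```

Sorting by "cumulative" surfaces expensive subtrees such as rendering, while "tottime" is better for spotting per-call costs like ContextVar.get(). For a flame graph, Profile.dump_stats() writes a file that external viewers can load, or py-spy can sample a script directly, e.g. `py-spy record -o profile.svg -- python compile_script.py` (compile_script.py being a hypothetical driver script).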

4. Optimize low-hanging fruit

Based on profiling, expected optimizations include:

  • Skip no-op plugins: If a plugin's compile_component is the default (base protocol method), don't dispatch to it. The demo code already checks for this.
  • Batch plugin dispatch: Instead of creating individual generators per plugin, consider a combined dispatch that reduces generator overhead.
  • Cache component renders: If a component is immutable (ENG-9146), cache its render() output.
  • Reduce dict/set operations: The consolidation plugins accumulate into dicts/sets on every component. Consider batched approaches or pre-allocated structures.
  • Minimize isinstance checks: Several plugins check isinstance(comp, Component) — consider a pre-computed flag or tag.
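
One common way to implement the "skip no-op plugins" check is to compare the hook on each plugin's class against the base-class default, once per compile rather than once per component. This is a sketch under assumptions: CompilerPlugin, StylePlugin, PageOnlyPlugin, and dispatch are hypothetical names, and the demo code's actual check may differ.

```python
class CompilerPlugin:
    """Hypothetical plugin base class; the default hook is a no-op."""

    def compile_component(self, comp):
        return None


class StylePlugin(CompilerPlugin):
    def compile_component(self, comp):
        return f"styled:{comp}"


class PageOnlyPlugin(CompilerPlugin):
    """Does not override compile_component, so it needs no per-component dispatch."""


def active_plugins(plugins, hook_name="compile_component"):
    """Keep only plugins whose class overrides the base-class hook."""
    base_hook = getattr(CompilerPlugin, hook_name)
    # Looking the attribute up on the class (not the instance) yields the plain
    # function, so an inherited default is the identical object as base_hook.
    return [p for p in plugins if getattr(type(p), hook_name) is not base_hook]


def dispatch(plugins, components):
    # Filter once per compile, not once per component.
    hooks = [p.compile_component for p in active_plugins(plugins)]
    return [[hook(comp) for hook in hooks] for comp in components]


print(dispatch([StylePlugin(), PageOnlyPlugin()], ["a", "b"]))  # [['styled:a'], ['styled:b']]
```

Binding the surviving hooks into a list up front also avoids repeated attribute lookups inside the per-component loop.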

5. Compare against baseline

Run the CodSpeed suite (both old and new benchmarks) with the old compiler and the new compiler, and document:

  • Per-benchmark time comparison
  • Compilation time comparison on realistic apps
  • Memory usage comparison
  • Per-page time distribution
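
For the memory comparison, the standard tracemalloc module can report peak Python-level allocation during a call. A minimal sketch; build_pages is a hypothetical stand-in for compiling pages, and the real measurement would wrap the old and new compile entry points.

```python
import tracemalloc


def peak_memory_bytes(func, *args):
    """Return the peak traced Python allocation, in bytes, while running func(*args)."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def build_pages(n):
    # Hypothetical stand-in for compiling n pages into JS strings.
    return ["<div>" * 100 + str(i) for i in range(n)]


small = peak_memory_bytes(build_pages, 10)
large = peak_memory_bytes(build_pages, 10_000)
print(f"10 pages: {small} B peak, 10000 pages: {large} B peak")
```

tracemalloc only sees allocations made through Python's allocator and slows execution noticeably, so memory runs are best kept separate from the timing runs.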

Acceptance Criteria

  • Existing CodSpeed benchmarks in tests/benchmarks/test_compilation.py have parallel versions for the new compiler pipeline
  • New benchmarks cover: full pipeline, plugin dispatch overhead, style application, import merging, multi-page scaling
  • All benchmarks use the existing fixtures from tests/benchmarks/fixtures.py (extend if needed, don't replace)
  • Profiling results are documented (flame graph or equivalent)
  • No performance regression > 10% on existing test_compile_page and test_get_all_imports benchmarks vs the old compiler
  • At least one measurable optimization is implemented based on profiling
  • Results are documented in the PR for future reference

Key Files

  • tests/benchmarks/test_compilation.py — existing CodSpeed benchmarks to extend
  • tests/benchmarks/test_evaluate.py — existing page evaluation benchmark
  • tests/benchmarks/fixtures.py: _complicated_page, _stateful_page, SideBarState, BenchmarkState fixtures
  • tests/benchmarks/conftest.py — fixture wiring
  • reflex/compiler/compiler.py — compilation functions (old and new)
  • reflex/compiler/plugins.py (new) — plugin implementations
  • reflex/components/component.py: render(), _get_imports(), etc.
  • reflex/utils/imports.py: merge_imports(), collapse_imports()

Notes

  • CodSpeed runs in CI and provides automatic regression detection on PRs. The new benchmarks will inherit this behavior, giving us ongoing protection against compile-time regressions.
  • The benchmark should eventually be extended with a "real app" fixture (e.g., a subset of the reflex-web docs site), but the existing synthetic fixtures are a good starting point.
  • Performance of the compiler disproportionately affects developer experience (hot reload speed), so even small improvements matter.
  • Keep in mind that the compiler is almost entirely CPU-bound Python. The biggest wins will come from doing less work (fewer traversals, more caching) rather than from parallelism.
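
The "fewer traversals" point can be made concrete by counting node visits. In this hypothetical sketch (Node, multi_pass, and single_pass are illustrative, not Reflex code), multi_pass walks the tree once per collector, as separate recursive _get_all_* walks would, while single_pass lets every collector see each node during one walk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a component: a tag, its imports, its children."""
    tag: str
    imports: frozenset = frozenset()
    children: List["Node"] = field(default_factory=list)


def walk(node, visit):
    visit(node)
    for child in node.children:
        walk(child, visit)


def multi_pass(root, collectors):
    """One full traversal per collector (imports, hooks, custom code, ...)."""
    visits = 0
    for collect in collectors:
        def visit(n, collect=collect):
            nonlocal visits
            visits += 1
            collect(n)
        walk(root, visit)
    return visits


def single_pass(root, collectors):
    """One traversal; every collector sees each node during the same visit."""
    visits = 0

    def visit(n):
        nonlocal visits
        visits += 1
        for collect in collectors:
            collect(n)

    walk(root, visit)
    return visits


imports, tags = set(), []
collectors = [lambda n: imports.update(n.imports), lambda n: tags.append(n.tag)]
root = Node("Box", frozenset({"react"}),
            [Node("Text"), Node("Link", frozenset({"next/link"}))])
print(multi_pass(root, collectors))   # 6 visits: 3 nodes x 2 collectors
print(single_pass(root, collectors))  # 3 visits: each node once
```

The collectors' own work is unchanged; what shrinks is the traversal overhead (recursive calls and child iteration), which grows with tree size times the number of separate walks.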

Metadata

Assignees

Labels

enhancement

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
