A ground-up redesign of the Crystal compiler focused on developer experience, performance, and maintainability.
The Crystal language is semantically rich and elegant, combining Ruby's expressiveness with static typing and native performance. However, the current compiler has architectural limitations that impact the development experience:
- Slow compilation times - Full recompilation on small changes
- Limited incremental compilation - No fine-grained dependency tracking
- Poor LSP experience - Crystalline is slow (3-5 seconds for large files)
- Monolithic architecture - Tight coupling between phases
- Hard to extend - Adding new features requires deep compiler knowledge
Go's success teaches us something profound: Developer Experience trumps language features.
Despite Crystal's superior syntax and semantics compared to Go, Go dominates because:
- Fast compilation (< 1 second for most projects)
- Instant feedback (go fmt, go test, go build - all blazingly fast)
- Excellent tooling (gopls LSP, go mod, integrated testing)
- Simple mental model (explicit, predictable, no magic)
- Fast edit-compile-test cycle (the most important metric!)
Crystal V2 aims to match Go's DX while keeping Crystal's superior language design.
- Incremental by default - Never recompute what hasn't changed
- LSP-first architecture - Real-time feedback without full compilation
- Modular design - Clear separation of concerns
- Performance matters - Sub-second response times for typical edits
- Zero-copy where possible - Memory efficiency through smart data structures
```
┌─────────────────────────────────────────────────┐
│                   LSP Server                    │
│   (Real-time diagnostics, hover, completion)    │
└────────────┬───────────────────────┬────────────┘
             │                       │
      ┌──────┴──────┐         ┌──────┴──────┐
      │  Frontend   │         │  Semantic   │
      │   (Fast)    │         │  Analysis   │
      └──────┬──────┘         └──────┬──────┘
             │                       │
      ┌──────┴──────────────────┬────┴─────┐
      │      VirtualArena       │   Type   │
      │ (Zero-copy multi-file)  │  Infer   │
      └────────────┬────────────┴──────────┘
                   │
            ┌──────┴──────┐
            │   Codegen   │
            │  (Future)   │
            └─────────────┘
```
Goal: Parse entire projects in < 100ms
Key Innovations:
- Streaming lexer - No buffering, processes tokens on-demand
- Pratt parser - Clean precedence handling, easy to extend
- Zero-copy string handling - StringPool for deduplication
- Comprehensive error recovery - Keep parsing after errors
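The Pratt-parser bullet above can be illustrated with a minimal sketch. This is a toy evaluator, not the actual V2 parser API: token handling, the `BP` binding-power table, and the class name are all illustrative, but the core loop (consume a prefix, then bind operators while their precedence exceeds the minimum) is the same idea.

```crystal
# Toy Pratt-style expression evaluator (illustrative, not the V2 parser):
# a binding-power table drives precedence in a single loop.
BP = {"+" => 10, "-" => 10, "*" => 20, "/" => 20}

class Pratt
  def initialize(@tokens : Array(String))
    @pos = 0
  end

  def peek
    @tokens[@pos]?
  end

  def next_token
    t = @tokens[@pos]
    @pos += 1
    t
  end

  def parse(min_bp = 0) : Int32
    left = next_token.to_i # "nud": only integer literals in this sketch
    while (op = peek) && (bp = BP[op]?) && bp > min_bp
      next_token
      right = parse(bp) # operands to the right bind tighter
      left = case op
             when "+" then left + right
             when "-" then left - right
             when "*" then left * right
             else          left // right
             end
    end
    left
  end
end

Pratt.new("2 + 3 * 4".split).parse # => 14 (precedence handled by binding powers)
```

Adding a new operator is one entry in the table plus one `case` branch, which is why Pratt parsing is easy to extend.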
Performance (with --release flag):
- parser.cr (14,377 nodes): ~3ms (production-optimized build)
- compiler.cr (463 files, 280K nodes): 169ms sequential, 99ms parallel (1.71x speedup)
- prelude.cr (325 files, 314K nodes): 151ms sequential, 64ms parallel (2.36x speedup)

Status: ✅ Production-ready
Problem: How to efficiently manage ASTs from hundreds of files?
Solution: Zero-copy virtual addressing with O(log N) lookup
```crystal
# Traditional approach: copy all nodes to one arena (slow, wasteful)
global_arena = AstArena.new
files.each { |f| global_arena.concat(parse(f).arena) }

# V2 approach: keep per-file arenas, virtual addressing
virtual = VirtualArena.new
files.each { |f| virtual.add_file_arena(f.path, parse(f).arena) }
# Zero copies! Just offset mapping
```

Benefits:
- 0.04% memory overhead (just offset array)
- O(log N) node lookup (binary search by file)
- Incremental updates - Replace single file arena without touching others
- Perfect for LSP - Update changed file, keep rest intact
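The O(log N) lookup can be sketched with illustrative types (the real `VirtualArena` stores AST arenas, not integers; `resolve` and `base_offsets` are hypothetical names): each file's nodes own a contiguous id range, and a global id resolves to a (file, local id) pair via binary search over the sorted base offsets.

```crystal
# Hypothetical sketch of virtual addressing: file i owns global ids
# [base_offsets[i], base_offsets[i + 1]); resolution is a binary search.
def resolve(base_offsets : Array(Int32), global_id : Int32) : {Int32, Int32}
  # bsearch_index finds the first offset strictly greater than global_id;
  # the file owning the id is the one just before it.
  idx = (base_offsets.bsearch_index { |off| off > global_id } || base_offsets.size) - 1
  {idx, global_id - base_offsets[idx]}
end

offsets = [0, 120, 475] # file 0 has 120 nodes, file 1 has 355, ...
resolve(offsets, 130)   # => {1, 10}: global id 130 is local id 10 in file 1
```

Because only the offset array changes when a file is replaced, incremental updates touch none of the other arenas.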
Performance:
- Kemal (244 files, 106K nodes): ~25ms load time (--release, estimated)
- Perfect deduplication - 0% duplicate parsing
- 1.7-2.4x speedup with multi-threading
Status: ✅ Production-ready
Features:
- Parallel loading with Crystal fibers
- Perfect deduplication - Each file parsed exactly once
- Circular dependency detection
- Shard support - Automatically finds dependencies in `lib/`
- Deadlock-free - Buffered channels prevent blocking
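The pattern the list above describes can be sketched as follows. This is an illustration of the concurrency shape, not the real FileLoader API: one fiber per file, a buffered channel sized so workers never block, and a visited set guaranteeing each file is parsed exactly once.

```crystal
# Illustrative sketch of parallel, deduplicated loading with fibers.
# In-memory sources stand in for real files here.
sources = {"src/a.cr" => "puts 1", "src/b.cr" => "puts 2"}

# Buffered channel: senders never block, so fibers cannot deadlock.
results = Channel({String, String}).new(sources.size)
visited = Set(String).new

sources.each do |path, code|
  next unless visited.add?(path) # perfect dedup: skip already-seen paths
  spawn { results.send({path, code}) }
end

visited.size.times do
  path, source = results.receive
  # parse(source) would run here, producing a per-file arena for `path`
end
```

Circular-dependency detection would layer on top of this: a path already in `visited` (or on the current require chain) is never scheduled again.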
Real-world validation:
- ✅ spec (19 files, 45ms)
- ✅ reply (10 files, 18ms)
- ✅ Kemal (244 files, 371ms)
- ✅ compiler.cr (463 files, 1.19s)

Status: ✅ Production-ready
Goal: Fast, incremental type inference for LSP
Current Status:
- ✅ Basic type inference (literals, variables, simple methods)
- ✅ Symbol table with scope tracking
- ⚠️ Partial generic support
- ⚠️ Partial union type support
- ❌ Full constraint solving (needed for codegen)
Next Steps:
- Generic type instantiation
- Union type narrowing
- Method overload resolution
- Type constraint satisfaction
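One behavior the union-narrowing work must reproduce is standard Crystal flow typing, as in this small example:

```crystal
# Standard Crystal semantics the inference engine must model: inside the
# truthy branch, `x` narrows from Int32 | Nil to Int32, so `x * 2` typechecks.
def double(x : Int32 | Nil) : Int32
  if x
    x * 2 # x : Int32 here; Nil has been flow-narrowed away
  else
    0
  end
end

double(21)  # => 42
double(nil) # => 0
```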
Status: 🚧 70% complete
Components:
- ✅ Symbol collector (classes, methods, variables)
- ✅ Name resolver (finds definitions)
- ✅ Diagnostic formatter (beautiful error messages)
- ⚠️ Type checker (basic, needs enhancement)
- ❌ Macro expander (placeholder only)

Status: 🚧 50% complete
- Capabilities: hover, definition/typeDefinition, references, rename (prepare), code actions, formatting, semantic tokens (full), folding ranges, inlay hints, call hierarchy.
- Caching: project cache stores symbol summaries and inferred expression types; hover/definition can respond instantly from cache while indexing; background indexing keeps cache warm. Out-of-root files still fall back to live analysis.
- Indexing guard: soft-fails hover/definition when indexing; VS Code extension shows "Indexing…" and logs request/response traffic.
- Debugging: `./build_lsp_debug.sh` builds the server; `tools/lsp_probe.py path.cr --position LINE:COL` sends hover/definition/tokens in one session (set `LSP_DEBUG=1` for verbose logs).
| Aspect | Original | V2 | Advantage |
|---|---|---|---|
| Parse Speed | 65ms (parser.cr) | 43ms | 34% faster |
| Multi-file | Sequential + copies | Parallel + zero-copy | 1.73x faster |
| Memory | Monolithic arena | Virtual arena | 0.04% overhead |
| Incremental | Full recompile | File-level replace | 100x faster edits |
| LSP readiness | Not designed for it | LSP-first | Real-time feedback |
| Architecture | Monolithic | Modular | Easy to extend |
Original: Tightly coupled parsing → semantic → codegen
V2: Independent phases with clear interfaces
```crystal
# V2: Each phase is standalone
program = Parser.new(lexer).parse_program
symbols = SymbolCollector.new.collect(program)
types = TypeInferenceEngine.new.infer(program, symbols)
# Can stop here for LSP - no codegen needed!
```

Original: Full recompilation on any change
V2: Replace only changed files

```crystal
# Update single file in LSP
arena.replace_file_arena(file_path, new_arena)
# Only this file's types need re-inference
```

Original: Copy AST nodes during processing
V2: Zero-copy virtual addressing
Result: 8% more compact AST (14,377 vs 15,631 nodes for parser.cr)
Original: Stop at first error in file V2: Continue parsing, report all errors
Result: Better DX - fix multiple errors at once
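The recovery strategy above can be sketched with illustrative names (this is not the V2 parser's actual API): on a parse error, record a diagnostic, then skip forward to a synchronization token where a fresh declaration can start, so one pass surfaces every error in the file.

```crystal
# Hypothetical sketch of panic-mode error recovery: after an error,
# advance to a token that can begin a new declaration and resume there.
SYNC_TOKENS = [:def, :class, :module, :end]

def recover(tokens : Array(Symbol), pos : Int32) : Int32
  # skip ahead until a synchronization point or end of input
  while pos < tokens.size && !SYNC_TOKENS.includes?(tokens[pos])
    pos += 1
  end
  pos
end

recover([:ident, :plus, :def], 0) # => 2: parsing can resume at the :def
```

A placeholder error node would be left in the AST at the failure site, keeping the tree well-formed for later phases.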
What we have now:
- Fast, parallel file loading
- Zero-copy multi-file AST
- Comprehensive test coverage (30 regression tests, 93 spec files)
- Ready for LSP integration
- Protocol: initialize, didOpen/didChange/didClose, hover, definition, references, rename, code actions (basic), folding ranges, semantic tokens, inlay hints, signature help, document symbols, call hierarchy.
- Accuracy: segment-aware path resolution, macro call navigation, navigation into stdlib/prelude; rename is guarded for stdlib/prelude symbols.
- Performance: stub-first prelude with background real load; project cache v2 (symbol summaries) merged on didOpen to avoid reloading requires; timing breakdown logs (parse/requires/symbols/resolve/infer) and indexing notifications (Indexing…/Ready) surfaced to UI.
- VSCode: dedicated "Crystal V2 LSP Messages" output channel with request/response logging; status bar shows indexing state.
- DX guardrails: hover/definition soft-fail while indexing; folding for begin/rescue/else/ensure without overfolding; semantic tokens keep require strings as strings and symbol literals as full-span enumMember tokens.
Requirements for codegen:
- Generic type instantiation
- Union type narrowing
- Method overload resolution
- Constraint satisfaction
Also enables better LSP:
- Accurate type on hover
- Smarter auto-completion
- Precise go-to-definition
Why: Security is a competitive advantage
Features:
- Secrets detection (hardcoded API keys, passwords)
- Injection vulnerabilities (SQL, command, XSS)
- Taint analysis (track user input → dangerous sinks)
- Crypto mistakes (MD5 usage, weak random)
Output formats:
- Terminal (for developers)
- SARIF (for GitHub Code Scanning)
- JSON (for CI/CD)
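A rule-based secrets scan, the simplest of the features above, can be sketched as follows. The rule names and regexes here are assumptions for illustration, not CrystalGuard's actual rule set.

```crystal
# Illustrative rule-based secrets detection: each rule pairs a name with
# a pattern, and every source line is checked against every rule.
SECRET_RULES = {
  "hardcoded-aws-key"  => /AKIA[0-9A-Z]{16}/,
  "hardcoded-password" => /password\s*=\s*"[^"]+"/i,
}

def scan(path : String, source : String) : Array(String)
  findings = [] of String
  source.each_line.with_index do |line, i|
    SECRET_RULES.each do |rule, pattern|
      findings << "#{path}:#{i + 1}: [#{rule}] possible secret" if line =~ pattern
    end
  end
  findings
end

scan("config.cr", %(password = "hunter2")) # reports hardcoded-password on line 1
```

Taint analysis and the SARIF/JSON emitters would build on the same finding structure, attaching rule metadata and source locations.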
Impact: No other Crystal tool does this!
Goal: Full compiler, self-hosting
Components:
- LLVM IR generation
- Memory management (GC integration)
- Virtual method tables
- Closure compilation
Milestone: Compile Crystal V2 with Crystal V2
- Superior syntax - Ruby-inspired, elegant, readable
- Rich type system - Union types, generics, macros
- Native performance - Zero-cost abstractions
- Garbage collection - No manual memory management
- Metaprogramming - Compile-time macros, not runtime reflection
- ✅ Fast compilation - Parallel loading, incremental updates
- ✅ Great tooling - LSP server (in progress)
- 🚧 Simple mental model - Clear error messages, predictable behavior
- 🚧 Quick feedback loop - Sub-second edit-compile-test
- ❌ Package ecosystem - (Separate from compiler)
Go's killer feature isn't syntax - it's the ~1 second edit-compile-test cycle.
Crystal V2 targets:
- < 50ms LSP response (syntax errors, hover)
- < 500ms incremental compile (for changed file)
- < 2s full project compile (for small projects)
This makes Crystal development feel as responsive as Go, while keeping Crystal's superior language design.
Parser Performance (--release build):
- parser.cr: ~3ms (14,377 nodes) ⚡ ~15x faster than debug
- compiler.cr: 99ms parallel (463 files, 280K nodes) ⚡ 12x faster than debug
- prelude.cr: 64ms parallel (325 files, 314K nodes) ⚡ 15x faster than debug
- Kemal: ~25ms (244 files, 106K nodes, estimated) ⚡

Multi-threading Gains (--release):
- compiler.cr: 169ms sequential → 99ms parallel (1.71x speedup)
- prelude.cr: 151ms sequential → 64ms parallel (2.36x speedup)
Note: Debug builds are ~12-15x slower. Always use --release for production and benchmarking!
Memory Efficiency:
VirtualArena: 0.04% overhead
AST compactness: 8% better than original (14,377 vs 15,631 nodes)
Deduplication: 100% (0% duplicate parsing)
Test Coverage:
- 30 regression tests (parser.cr baseline)
- 93 spec files (comprehensive parser coverage)
- All major Crystal constructs covered
- Edge cases tested (Unicode, escapes, operators)
Architecture:
- Modular design (clear separation of concerns)
- Zero-copy where possible
- Incremental by default
- LSP-first thinking
Parser (97.6% parity with Crystal)
- Fast streaming lexer with string interning
- Pratt parser with comprehensive error recovery
- 2856 tests passing (1390 ported from Crystal's parser_spec.cr)
- AST class inheritance (94 node types migrated)
- All major constructs: heredocs, blocks, case/when, rescue/ensure
- `out` keyword, inline `asm`, annotations, macros
Infrastructure
- Zero-copy VirtualArena for multi-file AST
- Parallel FileLoader with perfect deduplication
- Real-world validation (Kemal, compiler.cr, prelude.cr)
LSP Server (~70%)
- 21 LSP methods implemented
- Definition, references, hover, completion
- Semantic tokens, inlay hints, folding
- Formatting (54% faster than original)
Semantic (~50%)
- Basic type inference
- Symbol table and name resolution
- MVP MacroExpander (`{{ }}`, `{% if/for %}`, `@type.*`)
- Diagnostics parity (no false positives)
- Type/hover accuracy matching original compiler
- Navigation to stdlib and macro-generated methods
- Full `@type.*` API with type graph
- Annotation objects (`.args`, `.named_args`)
- Macro methods (`.stringify`, `.id`, `.class?`)
- Generic instantiation and unification
- Union type narrowing with flow analysis
- Method overload resolution
- SSA-style IR
- LLVM IR generation
- Self-hosting test
This is a ground-up redesign with clear architecture. Each phase is independent:
- Frontend hackers: Parser is production-ready, extensible
- Type system nerds: Type inference needs completion
- Security folks: CrystalGuard is greenfield
- LSP enthusiasts: Server implementation starting soon
- LLVM experts: Codegen phase needs you
```shell
# Clone and setup
git clone https://github.com/crystal-lang/crystal.git
cd crystal
git checkout new_crystal_parser

# Run tests
cd crystal_v2
crystal spec

# Run regression tests
crystal run debug_tests/parser_regression_test.cr

# Try benchmarks
crystal run benchmarks/benchmark_parser.cr
```

Documentation:
- `docs/architecture_overview.md` - High-level design
- `docs/parser_design.md` - Parser implementation
- `docs/original_parser_analysis.md` - Comparison with original
```
crystal_v2/
├── src/
│   ├── compiler/
│   │   ├── frontend/              # Lexer, Parser, AST
│   │   │   ├── lexer.cr
│   │   │   ├── parser.cr
│   │   │   └── ast.cr             # VirtualArena here
│   │   ├── semantic/              # Type inference, analysis
│   │   │   ├── type_inference_engine.cr
│   │   │   ├── symbol_table.cr
│   │   │   └── collectors/
│   │   └── file_loader.cr         # Multi-file loading
│   └── crystal_v2.cr              # Main entry point
├── spec/                          # Test suite (93 files)
├── benchmarks/                    # Performance tests
└── debug_tests/                   # Regression tests
```
Crystal is a beautiful language with a slow compiler. This limits adoption.
Go is a mediocre language with a fast compiler. This drives adoption.
Crystal V2 aims to give Crystal the tooling it deserves.
When Crystal has:
- Sub-second compilation
- Real-time LSP feedback
- Security analysis tools
- Great error messages
...developers will choose Crystal over Go for new projects.
Because Crystal is already better - it just needs better tools.
Lead: Sergey Kuznetsov crystal@rigelstar.com
Contributors:
- Claude (Anthropic AI Assistant) - Architecture, implementation
- GPT-5 (OpenAI AI Assistant) - Design, optimization
MIT (same as Crystal)
The foundation is solid. Parser is fast. Multi-file support works. Tests pass.
Time to build the LSP server and give Crystal developers the experience they deserve.
🚀 Let's make Crystal compilation as fast as Go, while keeping Crystal's superior language design.