refactor: move leaf expansion methods to Lexer #326

Merged
ldayton merged 8 commits into main from phase6-unify-scanning-parsing on Jan 17, 2026

ldayton (Owner) commented on Jan 17, 2026

Summary

Move self-contained ("leaf") expansion parsing methods from Parser to Lexer, matching bash's architecture where the lexer handles word-level parsing.

Moved to Lexer

  • _read_ansi_c_quote - parses $'...' ANSI-C quoted strings
  • _read_locale_string - parses $"..." locale strings
  • _read_param_expansion - parses $var, ${var}, and complex ${var%pattern} expansions
    • Including helpers: _consume_param_name, _consume_param_operator, _param_subscript_has_close, _read_braced_param, _update_dolbrace_for_op
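The helper split above (name, then operator, then operand, then a dolbrace-state update) can be sketched in Python as follows. This is a simplified illustration, not the Lexer code from this PR. `consume_param_name`, `consume_param_operator`, `update_dolbrace_for_op`, `split_braced_param`, and the `Dolbrace` enum are hypothetical stand-ins, and the state names are only loosely modeled on bash's `DOLBRACE_*` states. Special parameters, `${#var}`, subscripts, quoting, and nested expansions are ignored.

```python
from __future__ import annotations

from enum import Enum

# Two-character operators come first so that "%%" is not read as "%".
# A lone ":" is substring expansion, e.g. ${s:1:2}.
PARAM_OPERATORS = (
    ":-", ":=", ":?", ":+", "%%", "##", "//", "^^", ",,",
    "-", "=", "?", "+", "%", "#", "/", "^", ",", ":", "@",
)


class Dolbrace(Enum):
    PARAM = "param"    # no operator: still reading the parameter name
    WORD = "word"      # operand is an ordinary word, e.g. ${x:-word}
    QUOTE = "quote"    # operand is a pattern, e.g. ${x%pat}, ${x^pat}
    QUOTE2 = "quote2"  # operand is pattern/replacement, e.g. ${x/pat/rep}


def consume_param_name(text: str, pos: int) -> tuple[str, int]:
    """Read a variable name (letters, digits, '_') starting at pos."""
    start = pos
    while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
        pos += 1
    return text[start:pos], pos


def consume_param_operator(text: str, pos: int) -> tuple[str | None, int]:
    """Return the longest operator at pos, or (None, pos) if there is none."""
    for op in PARAM_OPERATORS:
        if text.startswith(op, pos):
            return op, pos + len(op)
    return None, pos


def update_dolbrace_for_op(op: str | None) -> Dolbrace:
    """Pick the state used while reading the operand that follows op."""
    if op is None:
        return Dolbrace.PARAM
    if op[0] in "%#^,":
        return Dolbrace.QUOTE
    if op[0] == "/":
        return Dolbrace.QUOTE2
    return Dolbrace.WORD


def split_braced_param(text: str) -> tuple[str, str | None, str, Dolbrace]:
    """Split '${name<op>operand}' into its pieces (no nesting or quoting)."""
    if not (text.startswith("${") and text.endswith("}")):
        raise ValueError("expected ${...}")
    name, pos = consume_param_name(text, 2)
    op, pos = consume_param_operator(text, pos)
    return name, op, text[pos:-1], update_dolbrace_for_op(op)
```

Ordering the operator tuple so that longer operators come first is what lets a plain `startswith` scan behave like longest-match recognition.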

Infrastructure Added

  • SavedParserState class for nested parsing context
  • LexerSavedState class for lexer state preservation
  • Lexer._parser back-reference for callbacks
  • Lexer._sync_to_parser() / _sync_from_parser() for position synchronization
  • Transpiler updated to handle forward references to Node subclasses
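A minimal sketch of the save/restore idea behind `LexerSavedState`, assuming the lexer's mutable state is just its position plus the `parser_state`, `dolbrace_state`, and `pending_heredocs` fields named in the commit notes. `MiniLexer`, `save_state`, `restore_state`, and `try_read` are hypothetical names, and the real class may capture more. The example shows the typical use: speculative lookahead that rewinds when the input does not match.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class LexerSavedState:
    pos: int
    parser_state: int
    dolbrace_state: int
    pending_heredocs: tuple[str, ...]


@dataclass
class MiniLexer:
    source: str
    pos: int = 0
    parser_state: int = 0
    dolbrace_state: int = 0
    pending_heredocs: list[str] = field(default_factory=list)

    def save_state(self) -> LexerSavedState:
        # Snapshot the heredoc list as a tuple so later appends cannot leak in.
        return LexerSavedState(self.pos, self.parser_state,
                               self.dolbrace_state, tuple(self.pending_heredocs))

    def restore_state(self, saved: LexerSavedState) -> None:
        self.pos = saved.pos
        self.parser_state = saved.parser_state
        self.dolbrace_state = saved.dolbrace_state
        self.pending_heredocs = list(saved.pending_heredocs)

    def try_read(self, literal: str) -> bool:
        """Consume `literal` if it comes next; otherwise rewind and return False."""
        saved = self.save_state()
        for ch in literal:
            if self.pos >= len(self.source) or self.source[self.pos] != ch:
                self.restore_state(saved)
                return False
            self.pos += 1
        return True
```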

Parser Changes

  • _parse_ansi_c_quote, _parse_locale_string, _parse_param_expansion now delegate to Lexer
  • Removed ~650 lines of redundant Parser code
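The delegation pattern can be illustrated with a pared-down Python sketch. The class and method names echo the PR but are simplified, hypothetical stand-ins (no leading underscores, no AST nodes), and only a handful of `$'...'` escapes are decoded; bash itself also handles `\a`, `\e`, `\xHH`, octal, `\uHHHH`, `\cX`, and others. The Parser method shrinks to a thin wrapper around the Lexer.

```python
from __future__ import annotations

# A small subset of the escapes bash recognizes inside $'...'.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", "'": "'", '"': '"'}


class Lexer:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0

    def read_ansi_c_quote(self) -> str | None:
        """Decode $'...' at self.pos; return None if there is none."""
        if not self.source.startswith("$'", self.pos):
            return None
        i = self.pos + 2
        out: list[str] = []
        while i < len(self.source):
            ch = self.source[i]
            if ch == "'":
                self.pos = i + 1  # consume the closing quote
                return "".join(out)
            if ch == "\\" and i + 1 < len(self.source):
                nxt = self.source[i + 1]
                # Unrecognized escapes are kept as-is, backslash included.
                out.append(SIMPLE_ESCAPES.get(nxt, "\\" + nxt))
                i += 2
            else:
                out.append(ch)
                i += 1
        raise SyntaxError("unterminated $'...' string")


class Parser:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0
        self.lexer = Lexer(source)

    def parse_ansi_c_quote(self) -> str | None:
        # Thin wrapper: hand the position to the lexer, let it scan,
        # then adopt the lexer's new position.
        self.lexer.pos = self.pos
        result = self.lexer.read_ansi_c_quote()
        self.pos = self.lexer.pos
        return result
```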

Not Moved (Deferred)

Methods that create sub-parsers remain in Parser:

  • _parse_command_substitution - creates Parser(content) for nested commands
  • _parse_backtick_substitution - creates sub-parser
  • _parse_process_substitution - creates sub-parser
  • _parse_arithmetic_expansion - recursive parse_list() interactions

These require an "EOF token" mechanism (like bash's shell_eof_token) before they can be moved properly; this is planned for a follow-up PR.
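The "EOF token" idea can be sketched as a tokenizer that accepts an optional terminator: an unmatched occurrence of that character ends the scan as if input had run out, so a sub-parse of `$( ... )` stops at its own closing parenthesis. This is a hypothetical, heavily simplified illustration (single-quote handling only, one-character operators), not bash's `shell_eof_token` implementation or the planned follow-up code; `read_tokens` and `OPERATORS` are made-up names.

```python
from __future__ import annotations

OPERATORS = "();|&"


def read_tokens(src: str, pos: int, eof_token: str | None = None) -> tuple[list[str], int]:
    """Tokenize src from pos into words and one-character operators.

    If eof_token is given, an occurrence of it at parenthesis depth 0 ends
    the scan and the returned position is just past it. Otherwise the scan
    runs to the end of src.
    """
    tokens: list[str] = []
    word = ""
    depth = 0
    while pos < len(src):
        ch = src[pos]
        if ch == "'":
            # Single-quoted text is copied verbatim, so a quoted ')' is inert.
            end = src.find("'", pos + 1)
            if end == -1:
                raise SyntaxError("unterminated single quote")
            word += src[pos:end + 1]
            pos = end + 1
        elif ch.isspace() or ch in OPERATORS:
            if word:
                tokens.append(word)
                word = ""
            if ch == eof_token and depth == 0:
                return tokens, pos + 1  # the terminator acts as end of input
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            if not ch.isspace():
                tokens.append(ch)
            pos += 1
        else:
            word += ch
            pos += 1
    if word:
        tokens.append(word)
    if eof_token is not None:
        raise SyntaxError(f"missing closing {eof_token!r}")
    return tokens, pos
```

The depth counter is what lets a nested `( ... )` inside the substitution pass through without ending the sub-parse early.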

Add infrastructure for saving and restoring full parser state when
entering nested constructs like command substitutions. This follows
bash's save_parser_state/restore_parser_state pattern.
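The save/restore pattern described here can be sketched as below. The field names follow this commit message (parser state, dolbrace state, pending heredocs, context depth), but `MiniParser`, the `PST_CMDSUBST` value, and the stand-in nested parse (which just splits words and rejects unbalanced parentheses) are hypothetical. The `try`/`finally` makes sure the outer state comes back even when the nested parse raises.

```python
from __future__ import annotations

from dataclasses import dataclass

PST_CMDSUBST = 0x1  # hypothetical flag value: "inside a command substitution"


@dataclass(frozen=True)
class SavedParserState:
    parser_state: int
    dolbrace_state: int
    pending_heredocs: tuple[str, ...]
    ctx_depth: int


class MiniParser:
    def __init__(self) -> None:
        self.parser_state = 0
        self.dolbrace_state = 0
        self.pending_heredocs: list[str] = []
        self.ctx_depth = 0
        self.seen_inside: list[tuple[int, int]] = []  # (parser_state, ctx_depth) while nested

    def _save_parser_state(self) -> SavedParserState:
        return SavedParserState(self.parser_state, self.dolbrace_state,
                                tuple(self.pending_heredocs), self.ctx_depth)

    def _restore_parser_state(self, saved: SavedParserState) -> None:
        self.parser_state = saved.parser_state
        self.dolbrace_state = saved.dolbrace_state
        self.pending_heredocs = list(saved.pending_heredocs)
        self.ctx_depth = saved.ctx_depth

    def parse_command_substitution(self, body: str) -> list[str]:
        saved = self._save_parser_state()
        try:
            # Give the nested command a clean context of its own.
            self.parser_state |= PST_CMDSUBST
            self.dolbrace_state = 0
            self.pending_heredocs = []
            self.ctx_depth += 1
            self.seen_inside.append((self.parser_state, self.ctx_depth))
            # Stand-in for a real nested parse.
            if body.count("(") != body.count(")"):
                raise SyntaxError("unbalanced parentheses in command substitution")
            return body.split()
        finally:
            self._restore_parser_state(saved)
```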

- Add SavedParserState class capturing parser_state, dolbrace_state,
  pending_heredocs, and context depth
- Add _save_parser_state() and _restore_parser_state() methods
- Refactor _parse_command_substitution to use the new infrastructure

Add infrastructure to support moving word parsing from Parser to Lexer:

- Add `parts` field to Token class for expansion AST nodes
- Add `_parser_state`, `_dolbrace_state`, `_pending_heredocs` to Lexer
- Add LexerSavedState class for save/restore during nested parsing
- Add Lexer._save_state(), _restore_state(), _set_parser_state(), _clear_parser_state(), and _has_parser_state()
- Add LEXER_REFACTOR_PLAN.md documenting the full migration plan
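One way to picture the new `parts` field: a word token keeps its raw text in `value` and carries the expansion nodes found inside it in `parts`. This is a hypothetical sketch; `Node`, `ParamExpansion`, `SIMPLE_PARAM`, and `make_word_token` are illustrative stand-ins, and only unquoted `$name` and `${name}` forms are recognized (quoting is ignored).

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Base class for AST nodes (stand-in)."""


@dataclass
class ParamExpansion(Node):
    param: str


@dataclass
class Token:
    type: str
    value: str   # raw source text of the word
    pos: int     # offset of the token in the input
    parts: list[Node] = field(default_factory=list)  # expansions found in the word


# Unquoted $name or ${name}; anything fancier is out of scope for this sketch.
SIMPLE_PARAM = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def make_word_token(text: str, pos: int) -> Token:
    parts: list[Node] = [
        ParamExpansion(m.group(1) or m.group(2)) for m in SIMPLE_PARAM.finditer(text)
    ]
    return Token("WORD", text, pos, parts)
```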

Part of Approach A: Move all word parsing to Lexer.

- Add Lexer._read_ansi_c_quote() method
- Parser._parse_ansi_c_quote() now delegates to Lexer
- Update transpiler to pre-populate class_names with Node subclasses
  that Lexer will instantiate (fixes forward reference issue)
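The forward-reference fix can be illustrated with Python's standard `ast` module: scan the whole module for classes that inherit from `Node` (directly or through another subclass) before emitting anything, so that a reference from `Lexer` to a class defined further down is already known. `collect_node_subclasses` is a hypothetical helper; the transpiler's actual `class_names` handling is not shown in this PR.

```python
from __future__ import annotations

import ast


def collect_node_subclasses(source: str, root: str = "Node") -> set[str]:
    """Names of classes in `source` that inherit from `root`, directly or
    through other classes in the same module, regardless of definition order."""
    tree = ast.parse(source)
    bases: dict[str, list[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Only plain-name bases (class A(B)) are considered.
            bases[node.name] = [b.id for b in node.bases if isinstance(b, ast.Name)]
    found: set[str] = set()
    changed = True
    while changed:  # repeat until no new subclass is discovered
        changed = False
        for name, parents in bases.items():
            if name not in found and any(p == root or p in found for p in parents):
                found.add(name)
                changed = True
    return found
```

Because `ast.parse` never executes the code, a subclass may appear before its base class in the source and still be collected.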

First expansion method moved as part of Phase 3.

Add Lexer._parser field so Lexer can call back to Parser for
expansion parsing. This enables moving word parsing to Lexer
while reusing existing Parser expansion methods.

Moved the following methods from Parser to Lexer:
- _parse_ansi_c_quote -> Lexer._read_ansi_c_quote
- _parse_locale_string -> Lexer._read_locale_string
- _parse_param_expansion -> Lexer._read_param_expansion
  (including helpers: _consume_param_name, _consume_param_operator,
   _param_subscript_has_close, _read_braced_param)

Added Lexer helpers:
- _sync_to_parser: sync Parser.pos to Lexer.pos before callbacks
- _sync_from_parser: sync Lexer.pos from Parser.pos after callbacks
- _update_dolbrace_for_op: update dolbrace state based on operator
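The `_parser` back-reference and the two sync helpers can be sketched together. Everything here is a hypothetical simplification: the stand-in `parse_command_substitution` consumes up to the first `)` with no nesting or quoting, and `read_word` returns plain strings instead of AST nodes. The point is the order of operations: push the lexer's position to the parser, let the parser consume the construct, then pull the parser's position back.

```python
from __future__ import annotations


class Parser:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0
        self.lexer = Lexer(source)
        self.lexer._parser = self  # back-reference for callbacks

    def parse_command_substitution(self) -> str:
        # Stand-in sub-parse: expects "$(" at self.pos and stops at the
        # first ")" (no nesting, no quoting).
        end = self.source.index(")", self.pos)
        body = self.source[self.pos + 2:end]
        self.pos = end + 1
        return f"CMDSUB({body})"


class Lexer:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0
        self._parser: Parser | None = None

    def _sync_to_parser(self) -> None:
        assert self._parser is not None
        self._parser.pos = self.pos  # parser resumes where the lexer stopped

    def _sync_from_parser(self) -> None:
        assert self._parser is not None
        self.pos = self._parser.pos  # lexer skips what the parser consumed

    def read_word(self) -> list[str]:
        """Read one whitespace-delimited word as literal and expansion parts."""
        parts: list[str] = []
        literal = ""
        while self.pos < len(self.source) and not self.source[self.pos].isspace():
            if self._parser is not None and self.source.startswith("$(", self.pos):
                if literal:
                    parts.append(literal)
                    literal = ""
                self._sync_to_parser()
                parts.append(self._parser.parse_command_substitution())
                self._sync_from_parser()
            else:
                literal += self.source[self.pos]
                self.pos += 1
        if literal:
            parts.append(literal)
        return parts
```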

Parser methods now delegate to Lexer for these expansion types.
Removed ~650 lines of redundant Parser code.

Methods that create sub-parsers (_parse_command_substitution, etc.)
remain in Parser as they need to create Parser instances for nested
command content.
ldayton changed the title from "refactor: add SavedParserState for nested parsing" to "refactor: move leaf expansion methods to Lexer" on Jan 17, 2026
ldayton merged commit 8af32ee into main on Jan 17, 2026
1 check passed
ldayton deleted the phase6-unify-scanning-parsing branch on January 17, 2026 at 10:55
ldayton added a commit that referenced this pull request on Mar 25, 2026
Includes fixes for JS backend: constructor defaults (#321), startswith
pos arg (#324), operator precedence (#333), regex escaping (#322),
template literal backticks (#323), destructuring discard (#326),
isinstance primitives (#325, #327), backtick-heredoc (#352), and
UTF-8 encoding (#334).