refactor: move leaf expansion methods to Lexer #326
Merged
Add infrastructure for saving and restoring full parser state when entering nested constructs like command substitutions. This follows bash's save_parser_state/restore_parser_state pattern.

- Add SavedParserState class capturing parser_state, dolbrace_state, pending_heredocs, and context depth
- Add _save_parser_state() and _restore_parser_state() methods
- Refactor _parse_command_substitution to use the new infrastructure
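A minimal sketch of the save/restore pattern described above, assuming each saved field is a plain Python attribute. The `_ctx_stack` list that stands in for "context depth" and the `_nested_parse` context manager are hypothetical; the real `_parse_command_substitution` may wire this up differently. Putting the restore in `finally` keeps state consistent even when the nested parse raises.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class SavedParserState:
    # Fields mirror the ones listed in the commit message; types are assumptions.
    parser_state: int
    dolbrace_state: int
    pending_heredocs: list
    ctx_depth: int


class Parser:
    def __init__(self) -> None:
        self._parser_state = 0
        self._dolbrace_state = 0
        self._pending_heredocs: list = []
        # Hypothetical context stack; its length is the "context depth".
        self._ctx_stack: list = ["top"]

    def _save_parser_state(self) -> SavedParserState:
        # Copy the heredoc list so the nested parse cannot mutate the snapshot.
        return SavedParserState(
            parser_state=self._parser_state,
            dolbrace_state=self._dolbrace_state,
            pending_heredocs=list(self._pending_heredocs),
            ctx_depth=len(self._ctx_stack),
        )

    def _restore_parser_state(self, saved: SavedParserState) -> None:
        self._parser_state = saved.parser_state
        self._dolbrace_state = saved.dolbrace_state
        self._pending_heredocs = list(saved.pending_heredocs)
        # Drop any contexts the nested construct pushed and did not pop.
        del self._ctx_stack[saved.ctx_depth:]

    @contextmanager
    def _nested_parse(self):
        # Hypothetical helper: restore state even if the nested parse raises.
        saved = self._save_parser_state()
        try:
            yield
        finally:
            self._restore_parser_state(saved)


p = Parser()
with p._nested_parse():
    p._parser_state = 4  # e.g. an "inside command substitution" flag
    p._pending_heredocs.append("EOF")
    p._ctx_stack.append("cmdsub")
print(p._parser_state, p._pending_heredocs, len(p._ctx_stack))  # 0 [] 1
```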
Add infrastructure to support moving word parsing from Parser to Lexer:

- Add `parts` field to Token class for expansion AST nodes
- Add `_parser_state`, `_dolbrace_state`, `_pending_heredocs` to Lexer
- Add LexerSavedState class for save/restore during nested parsing
- Add Lexer._save_state(), _restore_state(), _set/clear/has_parser_state()
- Add LEXER_REFACTOR_PLAN.md documenting the full migration plan

Part of Approach A: Move all word parsing to Lexer.
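One plausible shape for the new Lexer fields, assuming `_parser_state` is a bitmask (as bash's `parser_state` is). The `ParserStateFlag` members are hypothetical, named after bash's `PST_*` flags, and `Token.parts` is shown as a plain list of expansion nodes.

```python
from dataclasses import dataclass, field
from enum import IntFlag


class ParserStateFlag(IntFlag):
    # Hypothetical subset, named after bash's PST_* parser_state bits.
    CASEPAT = 1 << 0
    CMDSUBST = 1 << 1
    EOFTOKEN = 1 << 2


@dataclass
class Token:
    type: str
    value: str
    # Expansion AST nodes found inside this word (the new `parts` field).
    parts: list = field(default_factory=list)


class Lexer:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0
        self._parser_state = 0  # bitmask of ParserStateFlag values

    def _set_parser_state(self, flag: ParserStateFlag) -> None:
        self._parser_state |= int(flag)

    def _clear_parser_state(self, flag: ParserStateFlag) -> None:
        self._parser_state &= ~int(flag)

    def _has_parser_state(self, flag: ParserStateFlag) -> bool:
        return (self._parser_state & int(flag)) != 0


lx = Lexer("echo $(date)")
lx._set_parser_state(ParserStateFlag.CMDSUBST)
lx._set_parser_state(ParserStateFlag.EOFTOKEN)
lx._clear_parser_state(ParserStateFlag.CMDSUBST)
print(lx._has_parser_state(ParserStateFlag.CMDSUBST))  # False
print(lx._has_parser_state(ParserStateFlag.EOFTOKEN))  # True
```

Using a bitmask lets several independent contexts (for example, "inside a case pattern" and "inside a command substitution") be active at once without a separate field for each.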
- Add Lexer._read_ansi_c_quote() method
- Parser._parse_ansi_c_quote() now delegates to Lexer
- Update transpiler to pre-populate class_names with Node subclasses that Lexer will instantiate (fixes forward reference issue)

First expansion method moved as part of Phase 3.
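A simplified sketch of what reading `$'...'` involves: scan to the unescaped closing quote while decoding backslash escapes. This standalone `read_ansi_c_quote` function is illustrative rather than the project's `Lexer._read_ansi_c_quote`; it handles only a subset of bash's escapes (no octal, `\u`, or `\cX`) and keeps unrecognized escapes verbatim, as bash does.

```python
from __future__ import annotations

# Subset of bash's $'...' escapes; octal, \u, and \cX are omitted for brevity.
_SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "e": "\x1b", "E": "\x1b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v",
    "\\": "\\", "'": "'", '"': '"', "?": "?",
}
_HEX = "0123456789abcdefABCDEF"


def read_ansi_c_quote(source: str, pos: int) -> tuple[str, int]:
    """Decode $'...' starting at source[pos] == '$'.

    Returns (decoded value, index just past the closing quote).
    """
    if not source.startswith("$'", pos):
        raise ValueError("expected $' at position %d" % pos)
    i = pos + 2
    out: list[str] = []
    while i < len(source):
        ch = source[i]
        if ch == "'":
            return "".join(out), i + 1
        if ch == "\\" and i + 1 < len(source):
            nxt = source[i + 1]
            if nxt in _SIMPLE_ESCAPES:
                out.append(_SIMPLE_ESCAPES[nxt])
                i += 2
                continue
            if nxt == "x":
                # \xHH: one or two hex digits.
                j = i + 2
                while j < len(source) and j < i + 4 and source[j] in _HEX:
                    j += 1
                if j > i + 2:
                    out.append(chr(int(source[i + 2:j], 16)))
                    i = j
                    continue
            # Unrecognized escape: keep the backslash and the character.
            out.append(ch + nxt)
            i += 2
            continue
        out.append(ch)
        i += 1
    raise SyntaxError("unterminated $'...' string")


value, end = read_ansi_c_quote(r"echo $'a\tb\x41' rest", 5)
print(repr(value), end)  # 'a\tbA' 16
```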
Add Lexer._parser field so Lexer can call back to Parser for expansion parsing. This enables moving word parsing to Lexer while reusing existing Parser expansion methods.
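A minimal sketch of the back-reference, combined with the position-sync helpers added in the later commit. `Parser._parse_command_substitution` here is a naive stand-in (no nesting or quoting), and `Lexer._read_command_substitution` is a hypothetical entry point. The point is the order of operations: push the Lexer's position to the Parser, call back, then pull the updated position back.

```python
class Parser:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0
        # The Lexer keeps a back-reference so it can call into the Parser.
        self.lexer = Lexer(source, parser=self)

    def _parse_command_substitution(self):
        # Stand-in for the real method: no nesting or quoting support.
        assert self.source.startswith("$(", self.pos)
        end = self.source.index(")", self.pos)
        node = ("cmdsub", self.source[self.pos + 2:end])
        self.pos = end + 1
        return node


class Lexer:
    def __init__(self, source: str, parser: "Parser | None" = None) -> None:
        self.source = source
        self.pos = 0
        self._parser = parser

    def _sync_to_parser(self) -> None:
        self._parser.pos = self.pos  # Parser resumes where the Lexer stopped

    def _sync_from_parser(self) -> None:
        self.pos = self._parser.pos  # Lexer skips what the Parser consumed

    def _read_command_substitution(self):
        # Hypothetical Lexer entry point that delegates back to the Parser.
        self._sync_to_parser()
        node = self._parser._parse_command_substitution()
        self._sync_from_parser()
        return node


p = Parser("echo $(date) x")
p.lexer.pos = 5
print(p.lexer._read_command_substitution(), p.lexer.pos)  # ('cmdsub', 'date') 12
```

Forgetting either sync call leaves the two cursors disagreeing, so one side would re-read or skip input.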
Moved the following methods from Parser to Lexer:

- _parse_ansi_c_quote -> Lexer._read_ansi_c_quote
- _parse_locale_string -> Lexer._read_locale_string
- _parse_param_expansion -> Lexer._read_param_expansion (including helpers: _consume_param_name, _consume_param_operator, _param_subscript_has_close, _read_braced_param)

Added Lexer sync helpers:

- _sync_to_parser: sync Parser.pos to Lexer.pos before callbacks
- _sync_from_parser: sync Lexer.pos from Parser.pos after callbacks
- _update_dolbrace_for_op: update dolbrace state based on operator

Parser methods now delegate to Lexer for these expansion types. Removed ~650 lines of redundant Parser code.

Methods that create sub-parsers (_parse_command_substitution, etc.) remain in Parser as they need to create Parser instances for nested command content.
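`_update_dolbrace_for_op` tracks where the lexer is inside `${...}`. The sketch below is a simplified mapping loosely modeled on bash's `DOLBRACE_*` states (bash uses them, among other things, to decide how quotes inside the operand are handled). The function signature, the operator sets, and skipping bash's transient operator state are assumptions, not the project's exact transitions.

```python
from enum import Enum


class DolbraceState(Enum):
    # Names follow bash's DOLBRACE_* states; values are arbitrary.
    PARAM = 1   # still reading the parameter name: ${var
    OP = 2      # transient in bash; unused here since the operator is read whole
    WORD = 3    # ordinary word operand: ${var:-word}
    QUOTE = 4   # pattern operand: ${var%pat}, ${var^pat}
    QUOTE2 = 5  # pattern/replacement operand: ${var/pat/rep}


_PATTERN_OPS = {"%", "%%", "#", "##", "^", "^^", ",", ",,"}
_REPLACE_OPS = {"/", "//", "/#", "/%"}


def update_dolbrace_for_op(op: str, state: DolbraceState) -> DolbraceState:
    """Return the dolbrace state after consuming operator `op`."""
    if state is not DolbraceState.PARAM:
        return state  # only the first operator after the name matters
    if op in _PATTERN_OPS:
        return DolbraceState.QUOTE
    if op in _REPLACE_OPS:
        return DolbraceState.QUOTE2
    return DolbraceState.WORD  # :-, :=, :?, :+, -, =, ?, + and similar


print(update_dolbrace_for_op("%%", DolbraceState.PARAM))  # DolbraceState.QUOTE
```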
Summary
Move self-contained ("leaf") expansion parsing methods from Parser to Lexer, matching bash's architecture where the lexer handles word-level parsing.
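As a rough sketch of the resulting delegation (not the project's code), the Parser method becomes a thin wrapper that hands its position to the Lexer, which does the actual reading. Only `$name`, `$1`, `$?`, and plain `${name}` are handled here; `ParamExpansion`, the character-set constants, and the `braced` parameter of `_consume_param_name` are assumptions, and operator forms such as `${var%pattern}` are left out.

```python
from __future__ import annotations

import string
from dataclasses import dataclass

NAME_START = set(string.ascii_letters + "_")
NAME_CHARS = NAME_START | set(string.digits)
SPECIAL_PARAMS = set("@*#?-$!0123456789")


@dataclass
class ParamExpansion:
    param: str


class Lexer:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0

    def _consume_param_name(self, braced: bool) -> str:
        s, start = self.source, self.pos
        if start >= len(s):
            return ""
        ch = s[start]
        if ch in NAME_START:
            while self.pos < len(s) and s[self.pos] in NAME_CHARS:
                self.pos += 1
        elif braced and ch in string.digits:
            # ${10} is positional parameter 10; unbraced $10 is $1 then "0".
            while self.pos < len(s) and s[self.pos] in string.digits:
                self.pos += 1
        elif ch in SPECIAL_PARAMS:
            self.pos += 1
        return s[start:self.pos]

    def _read_param_expansion(self) -> ParamExpansion | None:
        # Handles $name, $1, $? and plain ${name}; operators are omitted.
        s, start = self.source, self.pos
        if not s.startswith("$", start):
            return None
        self.pos += 1
        braced = s.startswith("{", self.pos)
        if braced:
            self.pos += 1
        name = self._consume_param_name(braced)
        if not name or (braced and not s.startswith("}", self.pos)):
            self.pos = start  # unsupported form: rewind and report no match
            return None
        if braced:
            self.pos += 1  # consume "}"
        return ParamExpansion(name)


class Parser:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0
        self._lexer = Lexer(source)

    def _parse_param_expansion(self) -> ParamExpansion | None:
        # Thin delegating wrapper: hand the position over, then take it back.
        self._lexer.pos = self.pos
        node = self._lexer._read_param_expansion()
        self.pos = self._lexer.pos
        return node


p = Parser("echo ${HOME}/bin")
p.pos = 5
print(p._parse_param_expansion(), p.pos)  # ParamExpansion(param='HOME') 12
```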
Moved to Lexer
- `_read_ansi_c_quote` - parses `$'...'` ANSI-C quoted strings
- `_read_locale_string` - parses `$"..."` locale strings
- `_read_param_expansion` - parses `$var`, `${var}`, and complex `${var%pattern}` expansions
- Helpers: `_consume_param_name`, `_consume_param_operator`, `_param_subscript_has_close`, `_read_braced_param`, `_update_dolbrace_for_op`

Infrastructure Added
- `SavedParserState` class for nested parsing context
- `LexerSavedState` class for lexer state preservation
- `Lexer._parser` back-reference for callbacks
- `Lexer._sync_to_parser()` / `_sync_from_parser()` for position synchronization

Parser Changes
- `_parse_ansi_c_quote`, `_parse_locale_string`, `_parse_param_expansion` now delegate to Lexer

Not Moved (Deferred)
Methods that create sub-parsers remain in Parser:
- `_parse_command_substitution` - creates `Parser(content)` for nested commands
- `_parse_backtick_substitution` - creates a sub-parser
- `_parse_process_substitution` - creates a sub-parser
- `_parse_arithmetic_expansion` - recursive `parse_list()` interactions

These require an "EOF token" mechanism (like bash's `shell_eof_token`) to move properly, planned for a follow-up PR.
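As a rough illustration of the EOF-token idea (a toy, not the planned implementation): a single parser parses a `$(...)` body in place and stops when it reaches the terminator, instead of extracting the text and handing it to `Parser(content)`. The regex tokenizer, the `eof_token` attribute, and the tuple node shape are all hypothetical.

```python
from __future__ import annotations

import re

# Toy tokenizer: "$(", ")", or a run of other non-space characters.
_TOKEN = re.compile(r"\$\(|\)|[^\s$()]+")


class Parser:
    def __init__(self, source: str) -> None:
        self.tokens = _TOKEN.findall(source)
        self.i = 0
        self.eof_token: str | None = None  # plays the role of shell_eof_token

    def parse_list(self) -> list:
        words: list = []
        while self.i < len(self.tokens):
            tok = self.tokens[self.i]
            if tok == self.eof_token:
                return words  # stop here; the caller consumes the terminator
            self.i += 1
            if tok == "$(":
                words.append(("cmdsub", self._parse_nested(")")))
            else:
                words.append(tok)
        if self.eof_token is not None:
            raise SyntaxError(f"expected {self.eof_token!r} before end of input")
        return words

    def _parse_nested(self, terminator: str) -> list:
        # Parse the body in place instead of slicing it into Parser(content).
        saved = self.eof_token
        self.eof_token = terminator
        try:
            body = self.parse_list()
        finally:
            self.eof_token = saved
        self.i += 1  # consume the terminator
        return body


print(Parser("echo $(date $(whoami)) done").parse_list())
# ['echo', ('cmdsub', ['date', ('cmdsub', ['whoami'])]), 'done']
```

Because the terminator is saved and restored around each nested call, an inner `)` ends only the innermost substitution, and an unterminated `$(` surfaces as a syntax error at end of input.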