feat: implement EOF token mechanism for command substitution parsing#327
Merged
feat: implement EOF token mechanism for command substitution parsing#327
Conversation
The EOF token mechanism failed because Parable's Parser is character-based, not token-based. Functions like parse_list(), parse_command() bypass the Lexer entirely, so setting _eof_token on the Lexer has no effect. Key insight: EOF token mechanism is a consequence of tokenizer-parser architecture, not something that can be bolted on. Updated all three plan docs with: - Status: BLOCKED until Parser is token-based - Root cause analysis - Correct order of work (make Parser token-based first) - Two paths forward (move methods first vs make Parser token-based first)
Merged 5 separate planning documents: - ARCHITECTURE_EVOLUTION.md (vision) - LEXER_PLAN.md (original implementation plan) - LEXER_REFACTOR_PLAN.md (word parsing approach) - EOF_TOKEN_PLAN.md (blocked) - PHASE6_PLAN.md (blocked) Single document now covers: - Architecture comparison (bash vs Parable) - Key insight from failed EOF token attempt - Completed work summary - Current state analysis - Two paths forward - Blocked work and prerequisites - Reference material
- Add token-based code example alongside character-based example
- Clarify why leaf methods could move but others need sub-parsers
- Make Path A step 3→4 dependency explicit ("step 3 is still the hard part")
- Improve "Methods Still in Parser" table with clearer explanations
- Rename "Clean Lexer Migration" to "Sub-Parser Elimination"
- Fix misleading comment in code example
Replace scanning-based command substitution parsing with bash's shell_eof_token approach. This parses $(...) content inline using the standard parser with an EOF token delimiter. Key changes: - Add _eof_token and _eof_depth fields to Parser, Lexer, SavedParserState - Track paren depth in Lexer._read_operator() with PST_CASEPAT handling - Simplify _parse_command_substitution() to use parse_list() directly - Fix heredoc body gathering for delimiter followed by ) in cmdsubs - Fix _is_word_boundary() to not treat }case as the case keyword Net reduction of ~400 lines by eliminating redundant scanning code.
Refactor _parse_process_substitution() to use the same EOF token approach as command substitution. The lexer tracks paren depth and returns EOF when hitting ) at depth 0. Includes fallback handling: if parsing fails, scan to find the closing ) and return as literal text (for invalid process subs like <((a)b) which should be treated as literal characters). Net reduction of ~150 lines from process substitution parsing.
The EOF token mechanism works! Update documentation to reflect: - Command and process substitution now use EOF token inline parsing - ~550 lines of scanning code eliminated - Backtick substitution doesn't apply (escape handling requires scanning) - Remaining uses of scanning code documented
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
shell_eof_tokenapproach$(...)content inline using the standard parser with an EOF token delimiter_eof_tokenand_eof_depthfields to Parser, Lexer, and SavedParserStateLexer._read_operator()with PST_CASEPAT handling_parse_command_substitution()to useparse_list()directly)in command substitutions_is_word_boundary()to not treat}caseas thecasekeyword_parse_process_substitution()<((a)b)) to return literal textNet reduction of ~550 lines by eliminating redundant scanning code.