feat: implement EOF token mechanism for command substitution parsing #327

Merged
ldayton merged 6 commits into main from feat/eof-token-cmdsub on Jan 17, 2026

ldayton commented Jan 17, 2026

Summary

  • Replace scanning-based command substitution parsing with bash's shell_eof_token approach
  • Parse $(...) content inline using the standard parser with an EOF token delimiter
  • Add _eof_token and _eof_depth fields to Parser, Lexer, and SavedParserState
  • Track paren depth in Lexer._read_operator() with PST_CASEPAT handling
  • Simplify _parse_command_substitution() to use parse_list() directly
  • Fix heredoc body gathering for delimiter followed by ) in command substitutions
  • Fix _is_word_boundary() to not treat }case as the case keyword
  • Apply same EOF token mechanism to _parse_process_substitution()
  • Add fallback for invalid process subs (e.g., <((a)b)) to return literal text

Net reduction of ~550 lines by eliminating redundant scanning code.
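
To make the summary concrete, here is a minimal sketch of the EOF token idea, not Parable's actual code. `ToyLexer`, `next_token`, and `parse_command_substitution` are hypothetical names. Only the `eof_token` and `eof_depth` fields mirror the `_eof_token` and `_eof_depth` fields named above. The lexer reports a synthetic EOF when it meets `)` at depth 0, so the ordinary parsing loop stops at the end of `$(...)` without a separate scanning pass.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyLexer:
    """Hypothetical tokenizer used only for illustration."""
    src: str
    pos: int = 0
    eof_token: Optional[str] = None  # character that ends the current sub-parse
    eof_depth: int = 0               # unmatched "(" seen since eof_token was set

    def next_token(self) -> str:
        # Skip whitespace between tokens.
        while self.pos < len(self.src) and self.src[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.src):
            return "EOF"
        ch = self.src[self.pos]
        if ch == "(":
            self.eof_depth += 1
        elif ch == ")":
            if ch == self.eof_token and self.eof_depth == 0:
                return "EOF"  # leave ")" unconsumed for the caller
            self.eof_depth -= 1
        if ch in "()":
            self.pos += 1
            return ch
        # Otherwise read a plain word up to whitespace or a parenthesis.
        start = self.pos
        while self.pos < len(self.src):
            c = self.src[self.pos]
            if c.isspace() or c in "()":
                break
            self.pos += 1
        return self.src[start:self.pos]


def parse_command_substitution(lexer: ToyLexer) -> List[str]:
    """Collect the tokens of $(...); lexer.pos must be just past "$("."""
    saved = (lexer.eof_token, lexer.eof_depth)
    lexer.eof_token, lexer.eof_depth = ")", 0
    tokens = []
    while (tok := lexer.next_token()) != "EOF":
        tokens.append(tok)
    lexer.eof_token, lexer.eof_depth = saved  # restore the outer state
    if lexer.pos >= len(lexer.src) or lexer.src[lexer.pos] != ")":
        raise SyntaxError("unterminated command substitution")
    lexer.pos += 1  # consume the closing ")"
    return tokens
```

Saving and restoring the two fields around the sub-parse is what would let nested substitutions reuse the same lexer. The toy collects tokens where a real parser would build an AST.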

The first attempt at the EOF token mechanism failed because Parable's Parser
is character-based, not token-based. Functions like parse_list() and
parse_command() bypass the Lexer entirely, so setting _eof_token on the Lexer
has no effect.

Key insight: the EOF token mechanism is a consequence of a tokenizer-parser
architecture, not something that can be bolted on.
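
A minimal sketch of that failure mode, with hypothetical names throughout (`ToyLexer`, `parse_token_based`, `parse_character_based`): a parser that pulls every word from the lexer stops at the EOF token, while one that reads the source text directly never consults it.

```python
from typing import List, Optional


class ToyLexer:
    """Hypothetical lexer that honours an EOF token."""

    def __init__(self, src: str) -> None:
        self.src = src
        self.pos = 0
        self.eof_token: Optional[str] = None

    def next_word(self) -> Optional[str]:
        while self.pos < len(self.src) and self.src[self.pos] == " ":
            self.pos += 1
        if self.pos >= len(self.src) or self.src[self.pos] == self.eof_token:
            return None  # real or synthetic end of input
        start = self.pos
        while (self.pos < len(self.src)
               and self.src[self.pos] not in (" ", self.eof_token)):
            self.pos += 1
        return self.src[start:self.pos]


def parse_token_based(lexer: ToyLexer) -> List[str]:
    """Asks the lexer for every word, so eof_token ends the parse."""
    words = []
    while (word := lexer.next_word()) is not None:
        words.append(word)
    return words


def parse_character_based(lexer: ToyLexer) -> List[str]:
    """Reads lexer.src directly, so eof_token is never consulted."""
    words = lexer.src[lexer.pos:].split()
    lexer.pos = len(lexer.src)
    return words
```

With `eof_token = ")"` and the input `echo hi) tail`, the token-based function stops before `)`, while the character-based one runs to the end of the input.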

Updated all three plan docs with:
- Status: BLOCKED until Parser is token-based
- Root cause analysis
- Correct order of work (make Parser token-based first)
- Two paths forward (move methods first vs make Parser token-based first)

Merged 5 separate planning documents:
- ARCHITECTURE_EVOLUTION.md (vision)
- LEXER_PLAN.md (original implementation plan)
- LEXER_REFACTOR_PLAN.md (word parsing approach)
- EOF_TOKEN_PLAN.md (blocked)
- PHASE6_PLAN.md (blocked)

Single document now covers:
- Architecture comparison (bash vs Parable)
- Key insight from failed EOF token attempt
- Completed work summary
- Current state analysis
- Two paths forward
- Blocked work and prerequisites
- Reference material

- Add token-based code example alongside character-based example
- Clarify why leaf methods could move but others need sub-parsers
- Make Path A step 3→4 dependency explicit ("step 3 is still the hard part")
- Improve "Methods Still in Parser" table with clearer explanations
- Rename "Clean Lexer Migration" to "Sub-Parser Elimination"
- Fix misleading comment in code example

Replace scanning-based command substitution parsing with bash's
shell_eof_token approach. This parses $(...) content inline using
the standard parser with an EOF token delimiter.

Key changes:
- Add _eof_token and _eof_depth fields to Parser, Lexer, SavedParserState
- Track paren depth in Lexer._read_operator() with PST_CASEPAT handling
- Simplify _parse_command_substitution() to use parse_list() directly
- Fix heredoc body gathering for delimiter followed by ) in cmdsubs
- Fix _is_word_boundary() to not treat }case as the case keyword

Net reduction of ~400 lines by eliminating redundant scanning code.
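
The case-pattern handling matters because a `)` that ends a case pattern must not be mistaken for the `)` that closes `$(...)`. The sketch below is a toy illustration under stated assumptions, not Parable's `Lexer._read_operator()`. It works on a pre-split token list, and the `expecting_pattern` flag is a hypothetical stand-in for the PST_CASEPAT state mentioned above.

```python
from typing import List


def tokens_before_closing_paren(tokens: List[str]) -> List[str]:
    """Return the tokens up to (not including) the ")" that closes $(...)."""
    depth = 0                  # unmatched "(" that are not pattern parens
    case_nesting = 0           # number of open case ... esac constructs
    expecting_pattern = False  # toy analogue of PST_CASEPAT
    body: List[str] = []
    for tok in tokens:
        if tok == "case":
            case_nesting += 1
        elif tok == "esac":
            case_nesting -= 1
            expecting_pattern = False
        elif case_nesting and tok in ("in", ";;"):
            expecting_pattern = True   # a pattern list may follow
        elif tok == "(":
            if not expecting_pattern:  # optional "(" before a pattern is not nesting
                depth += 1
        elif tok == ")":
            if expecting_pattern:
                expecting_pattern = False  # ")" ends the pattern, not the cmdsub
            elif depth == 0:
                return body                # this ")" is the EOF token
            else:
                depth -= 1
        body.append(tok)
    raise SyntaxError("unterminated command substitution")
```

For example, the tokens of `case x in a) echo hi ;; esac) tail` yield everything through `esac`. Without the flag, the `)` after `a` would end the substitution too early.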

Refactor _parse_process_substitution() to use the same EOF token
approach as command substitution. The lexer tracks paren depth and
returns EOF when hitting ) at depth 0.

Includes fallback handling: if parsing fails, scan to find the
closing ) and return as literal text (for invalid process subs
like <((a)b) which should be treated as literal characters).

Net reduction of ~150 lines from process substitution parsing.
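
The fallback described above can be sketched as follows. This assumes a hypothetical `parse_body` callable that raises `SyntaxError` on failure. `toy_parse_body` is only a stand-in for inline parsing: it rejects nested parentheses so that the fallback path can be exercised, which is not a claim about the rule the real parser applies.

```python
from typing import Callable, Tuple


def find_closing_paren(src: str, start: int) -> int:
    """Scan from start and return the index of the matching ")"."""
    depth = 0
    for i in range(start, len(src)):
        if src[i] == "(":
            depth += 1
        elif src[i] == ")":
            if depth == 0:
                return i
            depth -= 1
    raise SyntaxError("unterminated process substitution")


def toy_parse_body(src: str, start: int) -> int:
    """Stand-in for inline parsing; returns the index of the closing ")"."""
    for i in range(start, len(src)):
        if src[i] == "(":
            raise SyntaxError("toy parser rejects nested parentheses")
        if src[i] == ")":
            return i
    raise SyntaxError("unterminated process substitution")


def parse_process_substitution(
    src: str, pos: int, parse_body: Callable[[str, int], int]
) -> Tuple[str, str, int]:
    """src[pos] is "<" or ">" and src[pos + 1] is "(".

    Returns (kind, text, next_pos).
    """
    body_start = pos + 2
    try:
        end = parse_body(src, body_start)
    except SyntaxError:
        # Fallback: scan for the closing ")" and keep the text as a literal.
        end = find_closing_paren(src, body_start)
        return ("literal", src[pos:end + 1], end + 1)
    return ("procsub", src[body_start:end], end + 1)
```

Called on `<((a)b) x`, the sketch returns the literal text `<((a)b)`. Called on `<(sort f) x`, it returns the parsed body `sort f`.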

The EOF token mechanism works! Update documentation to reflect:
- Command and process substitution now use EOF token inline parsing
- ~550 lines of scanning code eliminated
- The mechanism does not apply to backtick substitution (its escape handling still requires scanning)
- Remaining uses of scanning code documented

ldayton merged commit 456b549 into main on Jan 17, 2026
1 check passed
ldayton deleted the feat/eof-token-cmdsub branch on January 17, 2026 12:57
ldayton added a commit that referenced this pull request Mar 25, 2026
Includes fixes for JS backend: constructor defaults (#321), startswith
pos arg (#324), operator precedence (#333), regex escaping (#322),
template literal backticks (#323), destructuring discard (#326),
isinstance primitives (#325, #327), backtick-heredoc (#352), and
UTF-8 encoding (#334).