feat: implement EOF token mechanism for command substitution parsing by ldayton · Pull Request #327 · ldayton/Parable

ldayton · 2026-01-17T12:07:13Z

Summary

Replace scanning-based command substitution parsing with bash's shell_eof_token approach
Parse $(...) content inline using the standard parser with an EOF token delimiter
Add _eof_token and _eof_depth fields to Parser, Lexer, and SavedParserState
Track paren depth in Lexer._read_operator() with PST_CASEPAT handling
Simplify _parse_command_substitution() to use parse_list() directly
Fix heredoc body gathering for delimiter followed by ) in command substitutions
Fix _is_word_boundary() to not treat }case as the case keyword
Apply same EOF token mechanism to _parse_process_substitution()
Add fallback for invalid process subs (e.g., <((a)b)) to return literal text

Net reduction of ~550 lines by eliminating redundant scanning code.

The EOF token mechanism failed because Parable's Parser is character-based, not token-based. Functions like parse_list(), parse_command() bypass the Lexer entirely, so setting _eof_token on the Lexer has no effect. Key insight: EOF token mechanism is a consequence of tokenizer-parser architecture, not something that can be bolted on. Updated all three plan docs with: - Status: BLOCKED until Parser is token-based - Root cause analysis - Correct order of work (make Parser token-based first) - Two paths forward (move methods first vs make Parser token-based first)

Merged 5 separate planning documents: - ARCHITECTURE_EVOLUTION.md (vision) - LEXER_PLAN.md (original implementation plan) - LEXER_REFACTOR_PLAN.md (word parsing approach) - EOF_TOKEN_PLAN.md (blocked) - PHASE6_PLAN.md (blocked) Single document now covers: - Architecture comparison (bash vs Parable) - Key insight from failed EOF token attempt - Completed work summary - Current state analysis - Two paths forward - Blocked work and prerequisites - Reference material

- Add token-based code example alongside character-based example - Clarify why leaf methods could move but others need sub-parsers - Make Path A step 3→4 dependency explicit ("step 3 is still the hard part") - Improve "Methods Still in Parser" table with clearer explanations - Rename "Clean Lexer Migration" to "Sub-Parser Elimination" - Fix misleading comment in code example

Replace scanning-based command substitution parsing with bash's shell_eof_token approach. This parses $(...) content inline using the standard parser with an EOF token delimiter. Key changes: - Add _eof_token and _eof_depth fields to Parser, Lexer, SavedParserState - Track paren depth in Lexer._read_operator() with PST_CASEPAT handling - Simplify _parse_command_substitution() to use parse_list() directly - Fix heredoc body gathering for delimiter followed by ) in cmdsubs - Fix _is_word_boundary() to not treat }case as the case keyword Net reduction of ~400 lines by eliminating redundant scanning code.

Refactor _parse_process_substitution() to use the same EOF token approach as command substitution. The lexer tracks paren depth and returns EOF when hitting ) at depth 0. Includes fallback handling: if parsing fails, scan to find the closing ) and return as literal text (for invalid process subs like <((a)b) which should be treated as literal characters). Net reduction of ~150 lines from process substitution parsing.

The EOF token mechanism works! Update documentation to reflect: - Command and process substitution now use EOF token inline parsing - ~550 lines of scanning code eliminated - Backtick substitution doesn't apply (escape handling requires scanning) - Remaining uses of scanning code documented

Includes fixes for JS backend: constructor defaults (#321), startswith pos arg (#324), operator precedence (#333), regex escaping (#322), template literal backticks (#323), destructuring discard (#326), isinstance primitives (#325, #327), backtick-heredoc (#352), and UTF-8 encoding (#334).

ldayton added 6 commits January 17, 2026 13:25

ldayton merged commit 456b549 into main Jan 17, 2026
1 check passed

ldayton mentioned this pull request Jan 17, 2026

refactor: use token-based terminator detection in parse_command #329

Merged

ldayton deleted the feat/eof-token-cmdsub branch January 17, 2026 12:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: implement EOF token mechanism for command substitution parsing#327

feat: implement EOF token mechanism for command substitution parsing#327
ldayton merged 6 commits intomainfrom
feat/eof-token-cmdsub

ldayton commented Jan 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ldayton commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ldayton commented Jan 17, 2026 •

edited

Loading