refactor: introduce QuoteState and ContextStack for unified parsing state #321

Merged: ldayton merged 6 commits into main from context-stack-refactor on Jan 16, 2026

ldayton commented Jan 16, 2026

Summary

Incrementally introduces bash's architectural patterns for cleaner parsing state management:

  • QuoteState class: Unified quote tracking with stack support for nested contexts. Replaces ~80 scattered in_single/in_double variable instances across ~15 functions.

  • ParseContext & ContextStack: Infrastructure for tracking nested parsing scopes (command substitutions, arithmetic, case patterns, brace expansions). Parser now has self._ctx for context-aware parsing.

  • _is_valid_arithmetic_start() helper: Extracts arithmetic validation logic to distinguish $((...)) from $( ( ... ) ).

  • Standard lookahead helpers: Adds peek_at(), lookahead(), and match_keyword() to Parser for cleaner boundary checks.

  • Style checker enhancement: Adds import statement checks (imports not allowed in self-contained transpiled code).

All 4515 tests pass in both Python and JavaScript.

Adds QuoteState class that encapsulates single/double quote state
tracking with stack support for nested contexts. Refactors ~15
functions to use this unified tracker instead of scattered in_single,
in_double, in_single_quote, in_double_quote variables.

Key improvements:
- Single source of truth for quote state via QuoteState class
- push()/pop() methods for nested contexts (e.g., ${...} inside quotes)
- outer_double() method to peek at parent context
- in_quotes() helper to check any quote state
- process_char() for standard quote character handling
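
The methods listed above could fit together roughly as follows. This is a minimal sketch and not the PR's actual class: the attribute names (`single`, `double`, `_stack`), the return value of `process_char()`, and the `find_unquoted()` helper are assumptions made for illustration, and backslash escapes are ignored.

```python
class QuoteState:
    """Hypothetical sketch of a unified single/double quote tracker."""

    def __init__(self):
        self.single = False
        self.double = False
        self._stack = []  # saved (single, double) pairs of enclosing contexts

    def push(self):
        """Enter a nested context (e.g. ${...}) with fresh quote state."""
        self._stack.append((self.single, self.double))
        self.single = False
        self.double = False

    def pop(self):
        """Leave the nested context and restore the enclosing quote state."""
        if self._stack:
            self.single, self.double = self._stack.pop()

    def in_quotes(self):
        """True if currently inside either kind of quote."""
        return self.single or self.double

    def outer_double(self):
        """True if the immediately enclosing context is double-quoted."""
        if not self._stack:
            return False
        return self._stack[-1][1]

    def process_char(self, ch):
        """Toggle state for ' or "; return True if ch acted as a delimiter."""
        if ch == "'" and not self.double:
            self.single = not self.single
            return True
        if ch == '"' and not self.single:
            self.double = not self.double
            return True
        return False


def find_unquoted(text, target):
    """Hypothetical caller: index of the first unquoted target, or -1."""
    quote = QuoteState()
    for i, ch in enumerate(text):
        if quote.process_char(ch):
            continue
        if ch == target and not quote.in_quotes():
            return i
    return -1
```

A scanning function then needs one `QuoteState` object instead of a pair of local booleans, and nested constructs call `push()` on entry and `pop()` on exit.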

Functions refactored to use QuoteState:
- _strip_line_continuations_comment_aware
- _find_cmdsub_end
- Word._double_ctlesc_smart
- Word._normalize_param_expansion_newlines
- Word._expand_all_ansi_c_quotes
- Word._format_command_substitutions
- Word._normalize_extglob_whitespace
- Parser._is_assignment_word
- Parser._param_subscript_has_close
- Parser._consume_param_name (subscript handling)
- And several nested quote tracking contexts

This is Phase 1 of a larger refactoring toward bash's parser model.

Adds ParseContext and ContextStack classes to provide infrastructure
for managing nested parsing contexts. This replaces scattered state
variables with an explicit stack-based model.

ParseContext tracks:
- Context kind (NORMAL, COMMAND_SUB, ARITHMETIC, CASE_PATTERN, BRACE_EXPANSION)
- Paren/brace/bracket depths
- Quote state (via QuoteState)

ContextStack provides:
- push(kind): Enter a new context
- pop(): Exit current context (never pops base)
- current: Access topmost context
- in_context(kind): Check if context type is on stack
- depth: Current stack depth

The Parser now has self._ctx (ContextStack) for tracking parsing context.
This infrastructure enables incremental migration of scattered state
variables like case_depth, arith_depth, etc. to the context stack model.
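
A rough sketch of the stack model described above, under stated assumptions: the integer constants, the depth attribute names, and the internal `_stack` list are illustrative, the per-context quote state is left out to keep the block self-contained, and the accessors use the `get_current()`/`get_depth()` names that a later commit in this PR introduces.

```python
class ParseContext:
    """Hypothetical sketch of one entry on the context stack."""

    NORMAL = 0
    COMMAND_SUB = 1
    ARITHMETIC = 2
    CASE_PATTERN = 3
    BRACE_EXPANSION = 4

    def __init__(self, kind=0):
        self.kind = kind
        self.paren_depth = 0
        self.brace_depth = 0
        self.bracket_depth = 0
        # The PR also stores quote state here (via QuoteState); omitted.


class ContextStack:
    """Hypothetical sketch: a stack whose base NORMAL context is permanent."""

    def __init__(self):
        self._stack = [ParseContext(ParseContext.NORMAL)]

    def push(self, kind):
        """Enter a new context of the given kind and return it."""
        ctx = ParseContext(kind)
        self._stack.append(ctx)
        return ctx

    def pop(self):
        """Exit the current context; the base context is never popped."""
        if len(self._stack) > 1:
            return self._stack.pop()
        return self._stack[0]

    def get_current(self):
        return self._stack[-1]

    def in_context(self, kind):
        """True if any context on the stack has the given kind."""
        for ctx in self._stack:
            if ctx.kind == kind:
                return True
        return False

    def get_depth(self):
        return len(self._stack)
```

With this shape, a counter such as `case_depth` could become a question asked of the stack, for example `ctx.in_context(ParseContext.CASE_PATTERN)`.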

This is Phase 2 of the architectural refactoring toward bash's parser model.

Extracts the arithmetic validation logic from _find_cmdsub_end into a
standalone helper function _is_valid_arithmetic_start(). This:

- Checks whether $(( at a position starts a valid arithmetic expression
- Scans forward looking for )) at top paren level (excluding nested $())
- Returns True for arithmetic, False for $( ( ... ) ) (cmdsub + subshell)

The helper makes the code more readable and documents the pattern for
distinguishing $((...)) arithmetic from $( ( ... ) ) command substitution.
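
The scan described above might look like the sketch below. It is an illustration of the idea and not the PR's code: the function name drops the leading underscore, the `(source, start)` signature is an assumption, and quoting and backslash escapes are ignored, so parentheses inside quoted strings would confuse it.

```python
def is_valid_arithmetic_start(source, start):
    """Hypothetical sketch: True if source[start:] opens $((...)) arithmetic."""
    if source[start:start + 3] != "$((":
        return False
    pos = start + 3
    depth = 2  # the two parentheses opened by "$(("
    while pos < len(source):
        ch = source[pos]
        if ch == "(":
            depth += 1  # also covers the "(" of a nested $(...)
        elif ch == ")":
            # "))" at the level opened by "$((" closes the arithmetic.
            if depth == 2 and source[pos + 1:pos + 2] == ")":
                return True
            depth -= 1
            if depth < 2:
                # The inner paren closed on its own: $( ( ... ) ) shape.
                return False
        pos += 1
    return False  # ran off the end without finding "))"
```

Under these assumptions, `$((1 + 2))` and `$(( (1+2) * 3 ))` are classified as arithmetic, while `$((echo a) )` is treated as a command substitution containing a subshell.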

This is Phase 3 of the architectural refactoring toward bash's parser model.

Adds three standard lookahead methods to Parser class:

- peek_at(offset): Peek at character at offset from current position,
  returns empty string if out of bounds
- lookahead(n): Return next n characters without consuming
- match_keyword(keyword): Check if current position matches keyword
  with word boundary

These helpers replace ad-hoc patterns like:
  self.pos + 1 < self.length and self.source[self.pos + 1] == "\n"
with cleaner:
  self.peek_at(1) == "\n"

Refactored skip_whitespace and skip_whitespace_and_newlines to
demonstrate the new peek_at helper usage.
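
For illustration, the three helpers could be written along these lines. The class name `SketchParser` is hypothetical, the `source`, `pos`, and `length` attributes are taken from the ad-hoc pattern quoted above, and the word-boundary rule in `match_keyword()` (next character is not alphanumeric or an underscore) is an assumption; the real Parser may define boundaries differently.

```python
class SketchParser:
    """Hypothetical stand-in for the Parser's lookahead helpers."""

    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.length = len(source)

    def peek_at(self, offset):
        """Character at pos + offset, or "" when out of bounds."""
        idx = self.pos + offset
        if idx < 0 or idx >= self.length:
            return ""
        return self.source[idx]

    def lookahead(self, n):
        """Next n characters without consuming (shorter near the end)."""
        return self.source[self.pos:self.pos + n]

    def match_keyword(self, keyword):
        """True if keyword starts at pos and ends at a word boundary."""
        end = self.pos + len(keyword)
        if self.source[self.pos:end] != keyword:
            return False
        if end >= self.length:
            return True
        nxt = self.source[end]
        return not (nxt.isalnum() or nxt == "_")
```

Returning an empty string for out-of-range offsets is what lets a comparison such as `self.peek_at(1) == "\n"` replace the explicit bounds check.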

This is Phase 5 of the architectural refactoring toward bash's parser model.

The transpiler doesn't support @property decorators, so convert:
- QuoteState.depth -> QuoteState.get_depth()
- ContextStack.current -> ContextStack.get_current()
- ContextStack.depth -> ContextStack.get_depth()
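
The shape of that conversion, shown on a stripped-down class. Both class names and the `_stack` attribute are hypothetical and used only to place the two forms side by side.

```python
class StackWithProperty:
    """Before: depth is exposed through a @property decorator."""

    def __init__(self):
        self._stack = ["NORMAL"]

    @property
    def depth(self):
        return len(self._stack)


class StackWithGetter:
    """After: a plain method, which needs no decorator support."""

    def __init__(self):
        self._stack = ["NORMAL"]

    def get_depth(self):
        return len(self._stack)
```

Call sites change accordingly, from `stack.depth` to `stack.get_depth()`.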

The transpiler produces self-contained code, so imports are not allowed.
Adds checks for both 'import x' and 'from x import y' statements.
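
One way such a check could be written is with Python's standard `ast` module, as sketched below. How the project's style checker actually detects imports is not stated here, so the `find_imports()` function and its return format are assumptions for illustration only.

```python
import ast


def find_imports(source):
    """Hypothetical check: (line, kind) for every import statement found."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.append((node.lineno, "import"))
        elif isinstance(node, ast.ImportFrom):
            found.append((node.lineno, "from-import"))
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(found)
```

Walking the syntax tree also catches imports nested inside functions or conditionals, which a line-prefix match on top-level statements would miss.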
ldayton enabled auto-merge (squash) January 16, 2026 23:40
ldayton merged commit 7d34259 into main Jan 16, 2026
1 check passed
ldayton deleted the context-stack-refactor branch January 16, 2026 23:59
ldayton added a commit that referenced this pull request Mar 25, 2026
Includes fixes for JS backend: constructor defaults (#321), startswith
pos arg (#324), operator precedence (#333), regex escaping (#322),
template literal backticks (#323), destructuring discard (#326),
isinstance primitives (#325, #327), backtick-heredoc (#352), and
UTF-8 encoding (#334).