First PowerShell parser. ShellSyntaxTree now ships two IShellParser
implementations — BashParser (unchanged) and the new PwshParser — both
emitting the same ParsedCommand AST a consumer already walks for bash.
Shipped as an alpha prerelease so Netclaw can validate the new parser
and the breaking Clause rename before promotion to a stable 0.2.0.
BREAKING: Clause.IsBashCWrapped renamed to Clause.IsCommandStringWrapped
The v0.1 field Clause.IsBashCWrapped is renamed Clause.IsCommandStringWrapped.
The meaning is unchanged and now shell-neutral — true when the clause is the
result of recursing into a command-string wrapper: bash bash -c "..." /
sh -c "...", or PowerShell pwsh -Command "..." / pwsh -EncodedCommand ....
| Old (v0.1) | New (v0.2.0) |
|---|---|
Clause.IsBashCWrapped |
Clause.IsCommandStringWrapped |
A breaking AST change on a 0.x minor is permitted by SPEC.md Appendix A
when RELEASE_NOTES.md carries the old→new mapping (above) and Netclaw is
updated in lockstep. Consumers: rename every IsBashCWrapped reference;
there is no behavior change beyond the identifier.
BREAKING (source-compatible): BashParserOptions reparented
BashParserOptions is now a sealed record deriving from the new abstract
ShellParserOptions base; HomeDirectory / WorkingDirectory move to the
base. The object-initializer shape is unchanged —
new BashParserOptions { HomeDirectory = ..., WorkingDirectory = ... }
still compiles. Only code that named BashParserOptions as a base type
or reflected over its declared members is affected.
New public surface
PwshParser : IShellParser— the PowerShell parser.ParsethrowsArgumentNullExceptionon null and never throws on a well-formed string, exactly likeBashParser.PwshParserOptions— configuration record forPwshParser(empty in v0.2.0; resolver knobs live onShellParserOptions).ShellParserOptions— the shared, abstract resolver-configuration base.VerbChain.CanonicalVerb(additive) — the alias-resolved canonical verb, non-null only when an alias was rewritten (ls→Get-ChildItem). Null for every bash clause. Consumers gate onCanonicalVerb ?? Tokens[0].VerbChain.IsDynamic(additive) — true when the command name is a dynamic token the parser cannot statically identify (& $exe,& { ... }). Always false for bash clauses; a consumer MUST route a dynamic clause to safe-fail.
PowerShell parser capabilities (SPEC.POWERSHELL.md)
- Parses PowerShell command pipelines into the shared
ParsedCommandAST — per-clause verbs, args, parameters, redirects, and the&&/||/;/|/ newline compound operators. - Recognizes cmdlets (
Verb-Noun), native commands, and the complete built-in alias set; resolves aliases to their canonical cmdlet while preserving the verbatim typed token. - The §6.5 parameter-binding model — switch vs. value-binding decisions
from static tables, colon-form
-Name:value, prefix matching. - Per-cmdlet / per-parameter path-arg extraction (
-Path,-LiteralPath,-Destination, positional rules). Set-Location <dir>; cmdcwd propagation, including through( ... )grouping (PowerShell( )is not a subshell).- Recursion into
pwsh -Command "<inner>",pwsh -c, andpwsh -EncodedCommand <base64>(base64 / UTF-16LE decode, BOM strip), depth-5 capped; inner clauses surface withIsCommandStringWrapped=true. - Marks dynamic-content tokens (
$var, subexpressions, script blocks, splatting, comma-arrays)DynamicSkip; control flow, definitions, and other script-level constructs safe-fail toIsUnparseable=true. - A 64 KiB input cap guards the per-shell-call hot path.
Corpus & validation
- 211 hand-authored PowerShell corpus entries under
Corpus/powershell/, exceeding every SPEC.POWERSHELL.md §13 category minimum. The corpus runner and PII audit are directory-routed by shell. - A real-
pwshvalidation gate (PwshOracleTests) feeds every PowerShell corpus input to[Parser]::ParseInputand enforces the §13 oracle matrix; aPwshAliases-vs-live-Get-Aliascompleteness[Fact]confirms the alias table has no gaps. tools/PwshCorpusTool— the corpus authoring aid (seeTOOLING.md).
Stable promotion of 0.1.5-beta. No code changes from the beta; this release
drops the pre-release suffix now that the newline-as-statement-separator
behavior change (SPEC §4) has been validated against Netclaw's live gate
evaluator. Consumers on 0.1.5-beta can upgrade directly.
See the 0.1.5-beta notes below for the full list of changes in this version.
Newline-as-statement-separator. Public API surface unchanged; the
content of ParsedCommand.Clauses changes for any input that spans
multiple lines. Shipped as a beta prerelease so Netclaw can validate
the AST-shape change against its live gate evaluator before this is
promoted to a stable 0.1.5.
BEHAVIOR CHANGE: a bare newline now separates clauses (SPEC §4)
- A bare newline outside quotes, heredoc bodies, line continuations, and
$()/ backtick substitutions is now a statement separator equivalent to;— the clause after it carriesCompoundOperator.Sequence. Before this releaseBashCommandParseronly split clauses on&&/||/;/|, socmd1\ncmd2parsed to a single clause[cmd1]withcmd2wrongly absorbed as an argument. - The lexer flags the newline-bearing
Whitespacetoken — and the newline after a heredoc terminator — with a new internalIsStatementSeparatorbit;FilterSignificantretains those tokens andSplitIntoSegmentssplits clauses on them. - Consecutive newlines, leading and trailing newlines, and a newline
immediately after a compound operator (
cmd1 &&\ncmd2) all collapse — they never produce an empty clause. - A heredoc followed by a command on the next line now parses to two clauses (previously the heredoc clause and the following command merged into one).
- A control-flow keyword opening a newline-separated clause
(
echo hi\nfor i in 1 2 3) safe-fails toIsUnparseable=true, exactly as it would after;.
Examples that change:
cmd1\ncmd2→ two clauses[cmd1],[cmd2](was one clause[cmd1]withcmd2as an arg).git pull # done\ndotnet build→ two clauses (was one).
Behavior notes
- Public API surface is unchanged (no
PublicApiSnapshotTestsdelta). - SPEC.md updates: §4 grammar (
compound_opincludesNEWLINE, new notes bullet), §5WHITESPACEtokenization, §15 versioning, §16 sequencing note. - Corpus: 11 new entries (139–149) covering newline separation, blank lines, leading/trailing newlines, newline after an operator, newline inside quotes and subshells, line continuation, comment-then-newline, and the control-flow-after-newline safe-fail. Entry 126's note is corrected — newline-as-separator is no longer a pending gap.
- Unit tests: 8 new
BashLexerTestscases + 14 newBashCommandParserTestscases; the stale comment inComment_between_two_statements_preserves_both_clausesis corrected.
Stable promotion of 0.1.4-alpha. No code changes from the alpha; this release
drops the pre-release suffix to signal that the v0.1 public API surface is
considered production-ready for Bash parsing use cases (see SPEC.md §17
acceptance criteria). Consumers on any 0.1.x-alpha can upgrade directly.
See the 0.1.4-alpha notes below for the full list of changes in this version.
Greedy verb-chain extraction. Public API surface (VerbChain, Clause)
unchanged; the content of Clause.Verb.Tokens changes for many inputs.
BEHAVIOR CHANGE: verb-chain length is no longer table-driven (#27)
- The
BashAritystatic lookup table andProbeArity()method have been removed. The parser walks consecutive verb-like Word tokens from the start of each clause, transparently consuming flag-with-value pairs (e.g.git -C /repo), and stops at the first non-verb-like token, the first plain flag, or the first non-Word token. - A token is "verb-like" when its kind is
Word, length 1–64, first character is an ASCII lowercase letter, and remaining characters are in[a-z0-9._-]. The strict allow-list naturally excludes flags, paths (/,\,~), env-var refs ($VAR), URLs (://), globs, numeric tokens, and uppercase user-named identifiers like migration names — without requiring per-case predicate logic. See SPEC §6.1. - For known FILE verbs (
cat,ls,bash,cd,chmod,grep,find, …) the verb chain stops at exactly one token to preserve per-verb positional-arg classification. The flag-with-value consumption still runs sotar -C /pathandcurl -o filestyle values still pick upIsPath=trueviaFlagValueIsPath.
Examples that change:
git push origin main→ verb[git, push, origin, main](was[git, push]).git worktree list(and arbitrary CLI subcommand chains) → fully extracted as[git, worktree, list](was[git, worktree]).freshdesk ticket list --status open→[freshdesk, ticket, list](was[freshdesk]because freshdesk wasn't in the BashArity table).kubectl get pods my-pod→[kubectl, get, pods, my-pod](was[kubectl, get]).aws s3 cp src dst→[aws, s3, cp, src, dst](was[aws, s3]).dotnet ef migrations add InitialCreate→[dotnet, ef, migrations, add](was[dotnet, ef]).InitialCreatestays in args because the predicate rejects uppercase first character.cat README→ still[cat](FileVerb carveout preservesIsPathon bare-name targets).echo hello→[echo, hello](echo is not a FILE verb).
Clause.Verb is now documented as a convenience hint, not a security
contract (SPEC §6.1.1). Consumers needing security-grade verb
identification should pattern-prefix match against the raw token
stream: a command matches an approval pattern P iff the first
len(P.verb_prefix) command tokens equal P.verb_prefix. This punts
depth choice to the consumer and accommodates the parser's deliberate
over-extraction on bare-word args. Auto-proposed patterns should default
to the full extracted verb chain (greedy match): a subsequent variation
re-prompts rather than silently auto-grants.
Behavior notes
- Public API surface is unchanged (no
PublicApiSnapshotTestsdelta). - SPEC.md updates: §3
VerbChain, §4 grammar, §6.1 verb-chain extraction (rewritten end-to-end), new §6.1.1 consumer pattern-matching guidance, §7 flag-with-value note, §12 worked examples, §15 versioning, §16 implementation sequencing. - Corpus: 7 new entries (132–138) pin the issue #27 headline cases;
10 existing entries flipped to the new shape (
04_echo_hello,11_git_push_origin_main,13_git_checkout_dev,17_docker_run_nginx,27_make_install,45_echo_append_log,84_subshell_nested,91_bash_c_simple,96_bash_c_nested_depth_2,100_bash_c_nested_depth_3,130_netclaw_repro_leading_comment_pipeline). - Unit tests: 8 pinned
BashCommandParserTestscases updated to the new expected verb chains.
Bash line comment handling. Public API unchanged.
Fixed
-
Bash line comments are now recognized and skipped (#25).
BashLexertreats#at a word boundary (start of input, or preceded by whitespace, a newline, or any operator) as the start of a comment that runs to the next newline. The comment text is emitted as a new internalBashTokenKind.Commenttoken for source fidelity and is filtered by the parser alongsideWhitespace/Continuation, so it contributes no verb, args, redirects, or flags to any clause. Comment-only input parses toClauses = [],IsUnparseable = false, matching the existing empty-/whitespace-only path. Quoting and escape rules are honored:#inside single or double quotes is literal,#in the interior of an unquoted word (e.g.abc#def) is literal, and\#outside quotes is literal.Before this fix,
# Extract worktree branches\ngit worktree listparsed to a single clause with verb chain[#, Extract]— the comment text leaked into downstream approval prompts and broke approval-state caching in consumers that did asymmetric verb-chain extraction (persistence-time vs. retry-authorization saw different verb sets, causing tool calls to fail after the user had already clicked Approve).
Behavior notes
- Public API surface is unchanged (no
PublicApiSnapshotTestsdelta). - SPEC.md §4 / §5: new "Comment handling" subsection in §5 documents the boundary rules; §4 BNF notes that comments are whitespace-equivalent at the lexer level.
- Corpus: 9 new entries (123–131) pin every case from the issue report, plus the two Netclaw repros (sanitized paths per §14).
- v0.1 still does not treat top-level newlines as statement separators
(SPEC §4 gap, tracked separately in IMPLEMENTATION_PLAN NEXT) — a
comment between two commands on separate lines requires an explicit
;separator to split into two clauses.
Three parser correctness fixes. Public API unchanged.
Fixed
- Single-quoted strings are now literal per SPEC §5 (B2). Previously
echo '$HOME'producedKind=Tildebecause the resolver substituted$HOMEuniformly regardless of quote style. Now the lexer marks single-quotedQuotedStringtokens with the internalIsSingleQuotedflag, and the resolver bypasses tilde /$HOME/$VAR/ glob /filesystem::handling for them.echo '$HOME'staysKind=Literal,Resolved=null;cat '/etc/passwd'still resolves a path. Matches bash semantics. LooksLikePathno longer false-positives on a lone trailing backslash (B3). A double-quoted token like"foo\\"lexes to Valuefoo\; the trailing\is an escape-collapse artifact, not a meaningful path signal. The heuristic now requires a backslash at a non-trailing position. Forward-slash behavior is unchanged —dir/still classifies as a path (trailing/is a meaningful bash directory hint).- Control-flow keyword detection precedes paren-balance (B4).
Previously
case x in a) ;; esacproducedIsUnparseable=truewith reasonunbalanced parens at position Nbecause the)ina)trippedSplitIntoSegmentsbefore the per-clause keyword check could fire. The anomaly pass now scans the token stream for control-flow keywords at verb position (start of input or after&&/||/;/|/() and short-circuits with the helpfulcontrol-flow keyword 'case' is not supported in v0.1reason before downstream checks run. SPEC §11 now pins the full diagnostic precedence order.
Behavior notes
- Public API surface is unchanged (no
PublicApiSnapshotTestsdelta). - SPEC.md §8: new "Step 0: Single-quoted bypass" preamble; LooksLikePath heuristic updated to call out the trailing-backslash carve-out.
- SPEC.md §11: new "Diagnostic precedence" section enumerating the order in which unparseable conditions are checked.
- Corpus entries 104 (
echo 'literal $HOME') and 109 (echo "trailing backslash\\") updated to the corrected outputs. Four new entries (119–122) pin the regression guards: single-quoted absolute paths still resolve,cd dir/still classifies as a path, single-quoted$VARstays literal underrm, andcase x in a) ;; esacnow reports the control-flow keyword reason instead of a paren-balance error.
Bug fix release for v0.1.0-alpha consumers.
Fixed
2>&1fd-dup redirects no longer produce phantom<cwd>/&1file targets. The parser now recognizes POSIX fd-dup / fd-close shorthand (&N,&N-,&-) on redirect targets and carries the raw token verbatim onRedirect.TargetwithRedirect.IsDynamicSkip = true. Existing consumers that already skip redirects withIsDynamicSkip = trueget correct behavior with no code changes. (B1)
Behavior notes
- Public API surface is unchanged.
Redirect.Targetxmldoc and SPEC.md §3 / §4 are clarified to document the fd-dup rule. - The Blazor sample's basename-startswith-
&workaround has been removed; the sample now relies solely onRedirect.IsDynamicSkip.
First publishable cut of ShellSyntaxTree — a focused .NET library that parses bash command strings into a structured AST for security-gate evaluators. Hand-rolled, AOT-trim friendly, no native dependencies.
What's in this release
IShellParserinterface +BashParserimplementation per locked v0.1 contract (SPEC.md §2 / §3)- Bash lexer: words, quoted strings, operators, opaque substitutions
(
$()/ backticks → DynamicSkip), arithmetic ($((...))) and complex parameter expansion (${var//.../...}) → IsUnparseable - Verb tables (BashArity, CwdVerbs, FileVerbs, FlagsWithValue) + per-verb path-arg rules + flag-with-value-aware verb-chain probe
- Path resolver: tilde /
$HOMEexpansion,filesystem::prefix strip, glob detection (covering-dir heuristic preserved), cross-platform forward-slash normalization - cd-in-compound attribution: synthetic
Arg.IsCwdAttributionpropagated to subsequent clauses;cd $VARproduces a DynamicSkip attribution signal - Subshell isolation via attribution stack with monotonic IDs (handles
sibling subshells
(a) && (b)cleanly) bash -c/sh -crecursion (cap at depth 5 → outerParsedCommand.IsUnparseable=true)- 115-entry corpus across all 11 SPEC §13 categories, validated by
CorpusRunnerTestswith a polishedAstAssert.Equalhelper - PII audit
[Fact]enforcing SPEC §14 sanitization patterns
Public API surface (locked per SPEC §2 / §3)
IShellParser, BashParser, BashParserOptions, ParsedCommand,
Clause, VerbChain, Arg, Redirect (records); ArgKind,
RedirectDirection, CompoundOperator (enums). Multi-target
netstandard2.0;net8.0; AOT-friendly (<IsAotCompatible>true</IsAotCompatible>).
Verification
353 tests passing on Linux + Windows. dotnet pack produces
ShellSyntaxTree.0.1.0-alpha.nupkg with embedded README, icon, and
SourceLink metadata.
Known limitations (tracked for v0.1.x)
pushd/popdparse as CwdVerbs but don't propagate cwd attribution (onlycd/chdirdo in v0.1)tarfalls through to the default per-verb rule (no action-flag awareness)docker -v "/host:/container"is a single literal arg withIsPath=false(no colon-split in v0.1)- Single-quoted
'$HOME'is substituted by the resolver (bash semantics: doesn't substitute in single quotes)
Documentation
SPEC.md— locked v0.1 contract (the source of truth for parser behavior)openspec/changes/— change-proposal history with rationale for the eight v0.1 SPEC interpretations resolved during planningREADME.md— quick-start usage