diff --git a/docs/adr/28744-cache-concurrency-validation-and-toolset-parsing.md b/docs/adr/28744-cache-concurrency-validation-and-toolset-parsing.md new file mode 100644 index 0000000000..350f92b77c --- /dev/null +++ b/docs/adr/28744-cache-concurrency-validation-and-toolset-parsing.md @@ -0,0 +1,75 @@ +# ADR-28744: Cache Concurrency Validation and Toolset Parsing in WorkflowData + +**Date**: 2026-04-27 +**Status**: Draft +**Deciders**: pelikhan, copilot-swe-agent + +--- + +## Part 1 — Narrative (Human-Friendly) + +### Context + +`WorkflowData` is the central data carrier for the compiler pipeline. A prior optimization (ADR-28560) cached permissions parsing and allowed-domain computation in `applyDefaults`, recovering a 21% regression in `BenchmarkCompileSimpleWorkflow`. However, `BenchmarkValidation` was still 119.3% above its historical baseline (250 µs vs. 114 µs) because two additional expensive operations remained uncached in the hot `validateWorkflowData` loop: (1) `validateConcurrencyGroupExpression`, which runs the full `ExpressionParser` (regex scan + tokenize + parse pass) on the auto-generated workflow-level concurrency expression, and (2) `ParseGitHubToolsets`, which was called twice per iteration — once inside `ValidatePermissions` and again in `validateToolConfiguration`. Together these accounted for 47% of all allocations per benchmark iteration. + +### Decision + +We will extend the `WorkflowData` eager-caching pattern established in ADR-28560 to cover concurrency group expression validation and GitHub toolset parsing. In `applyDefaults`, after all mutations are complete, the defer block will: (a) pre-validate the concurrency group expression via `validateConcurrencyGroupExpression` and store the result in `CachedConcurrencyGroupExprErr`, using a boolean sentinel `CachedConcurrencyGroupExprSet` to distinguish a valid `nil` result from "not yet computed"; and (b) call `ParseGitHubToolsets` once and store the result in `CachedParsedToolsets`. `ValidatePermissions` will accept an optional variadic `parsedToolsets ...[]string` parameter so callers can pass the pre-parsed slice directly, eliminating the redundant parse inside that function. Both call sites add a fallback code path that live-computes the value when the cache is not populated, preserving compatibility with `WorkflowData` instances created outside `ParseWorkflowFile`. + +### Alternatives Considered + +#### Alternative 1: Lazy nil-check memoization (same pattern as `CachedPermissions`) + +ADR-28560 used a nil check (`if data.CachedPermissions == nil`) as the sentinel. The same approach would avoid a dedicated boolean flag. This was not chosen because the validation result for a _valid_ concurrency expression is `nil` (no error), making a nil check ambiguous: a nil `CachedConcurrencyGroupExprErr` could mean either "not yet computed" or "computed and valid." A separate boolean `CachedConcurrencyGroupExprSet` is necessary to make the distinction explicit and avoid silently skipping validation when the cache is unpopulated. + +#### Alternative 2: Thread pre-computed values through function parameters + +Passing the pre-parsed toolsets and validation result as explicit parameters to `validateWorkflowData` and its callees would avoid adding new cache fields to `WorkflowData`. This was not chosen because it requires cascading signature changes across multiple callers, increases call-site complexity, and departs from the established pattern in this codebase of using `WorkflowData` as the compilation-scoped state carrier for derived values. + +### Consequences + +#### Positive +- `BenchmarkValidation` drops from ~250 µs to ~2,390 ns/op — approximately 3× faster — and falls well below the 114 µs historical baseline. +- Allocations per operation fall from 40 to 10 (75% reduction); memory per operation falls from 2,080 B to 552 B (73% reduction). +- The variadic `parsedToolsets ...[]string` parameter is fully backward-compatible: existing callers of `ValidatePermissions` require no changes. + +#### Negative +- `WorkflowData` gains two more cache fields (`CachedConcurrencyGroupExprErr`, `CachedConcurrencyGroupExprSet`) plus `CachedParsedToolsets`, expanding the implicit contract that `applyDefaults` must run before any downstream validator reads these fields. +- The boolean sentinel pattern (`CachedConcurrencyGroupExprSet`) is inconsistent with the empty-string sentinel used by `CachedAllowedDomainsStr` (ADR-28560), creating two different caching idioms on the same struct. +- `ValidatePermissions` gains a variadic parameter; while backward-compatible, it increases the function's surface area and makes call-site intent less obvious without documentation. + +#### Neutral +- Both new caching sites add a fallback live-compute path for `WorkflowData` instances created without going through `ParseWorkflowFile`. These fallback paths must be kept in sync with the primary computation logic. +- This PR narrows the scope of `validateConcurrencyGroupExpression` as called from `validateToolConfiguration`: it now only runs as a fallback, never as the primary path for `WorkflowData` produced by `ParseWorkflowFile`. + +--- + +## Part 2 — Normative Specification (RFC 2119) + +> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119). + +### Cache Population + +1. The `applyDefaults` defer block **MUST** populate `CachedConcurrencyGroupExprErr` and set `CachedConcurrencyGroupExprSet = true` after `ConcurrencyGroupExpr` has been extracted, when `ConcurrencyGroupExpr` is non-empty. +2. The `applyDefaults` defer block **MUST** populate `CachedParsedToolsets` when `data.ParsedTools != nil && data.ParsedTools.GitHub != nil`. +3. Cache fields **MUST** be populated in the `applyDefaults` defer block, after all mutations to `WorkflowData` are complete, so that downstream validators see stable, final values. + +### Cache Consumption + +1. `validateToolConfiguration` **MUST** use `CachedConcurrencyGroupExprErr` and `CachedConcurrencyGroupExprSet` when they are available, and **MUST NOT** call `validateConcurrencyGroupExpression` directly when `CachedConcurrencyGroupExprSet` is `true`. +2. `validateToolConfiguration` **MUST** fall back to calling `validateConcurrencyGroupExpression` directly when `CachedConcurrencyGroupExprSet` is `false`, to preserve correctness for `WorkflowData` created outside `ParseWorkflowFile`. +3. `validateToolConfiguration` **MUST** use `CachedParsedToolsets` when it is non-nil, and **MUST** fall back to `ParseGitHubToolsets` when `CachedParsedToolsets` is nil. +4. `ValidatePermissions` **MUST** use the first element of the variadic `parsedToolsets` argument when it is non-nil, and **MUST** fall back to calling `ParseGitHubToolsets(githubTool.GetToolsets())` when no pre-parsed slice is provided or the slice is nil. + +### Sentinel Convention + +1. `CachedConcurrencyGroupExprSet` **MUST** be `true` if and only if `validateConcurrencyGroupExpression` has been called and its result stored in `CachedConcurrencyGroupExprErr`. +2. Implementations **MUST NOT** interpret a nil `CachedConcurrencyGroupExprErr` as evidence that validation has been performed; `CachedConcurrencyGroupExprSet` **MUST** be checked first. + +### Conformance + +An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Specifically: (1) `applyDefaults` populates all three cache fields before any downstream validator runs; (2) `validateToolConfiguration` short-circuits to cached values when available and falls back to live computation otherwise; (3) `ValidatePermissions` respects the variadic toolsets parameter; and (4) the boolean sentinel `CachedConcurrencyGroupExprSet` is always checked before reading `CachedConcurrencyGroupExprErr`. + +--- + +*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/25006605654) workflow. The PR author must review, complete, and finalize this document before the PR can merge.* diff --git a/pkg/workflow/compiler_types.go b/pkg/workflow/compiler_types.go index 1c9d7d6a2f..267db0c76d 100644 --- a/pkg/workflow/compiler_types.go +++ b/pkg/workflow/compiler_types.go @@ -392,126 +392,129 @@ type SkipIfCheckFailingConfig struct { AllowPending bool // if true, pending/in-progress checks are not treated as failing (default: treat pending as failing) } type WorkflowData struct { - Name string - WorkflowID string // workflow identifier derived from markdown filename (basename without extension) - TrialMode bool // whether the workflow is running in trial mode - TrialLogicalRepo string // target repository slug for trial mode (owner/repo) - FrontmatterName string // name field from frontmatter (for code scanning alert driver default) - FrontmatterYAML string // raw frontmatter YAML content (rendered as comment in lock file for reference) - FrontmatterHash string // SHA-256 hash of frontmatter (computed before job building, used to derive stable heredoc delimiters) - RawMarkdown string // raw markdown body before include expansion, used for frontmatter hash computation without re-reading the file - Description string // optional description rendered as comment in lock file - Source string // optional source field (owner/repo@ref/path) rendered as comment in lock file - Redirect string // optional redirect field describing a moved workflow location - TrackerID string // optional tracker identifier for created assets (min 8 chars, alphanumeric + hyphens/underscores) - ImportedFiles []string // list of files imported via imports field (rendered as comment in lock file) - ImportedMarkdown string // Only imports WITH inputs (for compile-time substitution) - ImportPaths []string // Import file paths for runtime-import macro generation (imports without inputs) - MainWorkflowMarkdown string // main workflow markdown without imports (for runtime-import) - IncludedFiles []string // list of files included via @include directives (rendered as comment in lock file) - ImportInputs map[string]any // input values from imports with inputs (for github.aw.inputs.* substitution) - On string - Permissions string - Network string // top-level network permissions configuration - Concurrency string // workflow-level concurrency configuration - RunName string - Env string - EnvSources map[string]string // env var name → source ("(main workflow)" or import file path) for lock file header - If string - TimeoutMinutes string - CustomSteps string - PreSteps string // steps to run at the very start of the agent job, before checkout - PreAgentSteps string // steps to run immediately before the agent execution step - PostSteps string // steps to run after AI execution - RunsOn string - RunsOnSlim string // runner override for all framework/generated jobs (activation, safe-outputs, unlock, etc.) - Environment string // environment setting for the main job - Container string // container setting for the main job - Services string // services setting for the main job - Tools map[string]any - ParsedTools *Tools // Structured tools configuration (NEW: parsed from Tools map) - MarkdownContent string - AI string // "claude" or "codex" (for backwards compatibility) - EngineConfig *EngineConfig // Extended engine configuration - AgentFile string // Path to custom agent file (from imports) - AgentImportSpec string // Original import specification for agent file (e.g., "owner/repo/path@ref") - RepositoryImports []string // Repository-only imports (format: "owner/repo@ref") for .github folder merging - StopTime string - SkipIfMatch *SkipIfMatchConfig // skip-if-match configuration with query and max threshold - SkipIfNoMatch *SkipIfNoMatchConfig // skip-if-no-match configuration with query and min threshold - SkipIfCheckFailing *SkipIfCheckFailingConfig // skip-if-check-failing configuration - SkipRoles []string // roles to skip workflow for (e.g., [admin, maintainer, write]) - SkipBots []string // users to skip workflow for (e.g., [user1, user2]) - OnSteps []map[string]any // steps to inject into the pre-activation job from on.steps - OnPermissions *Permissions // additional permissions for the pre-activation job from on.permissions - OnNeeds []string // custom workflow jobs that pre_activation/activation should depend on from on.needs - ManualApproval string // environment name for manual approval from on: section - Command []string // for /command trigger support - multiple command names - CommandEvents []string // events where command should be active (nil = all events) - CommandOtherEvents map[string]any // for merging command with other events - LabelCommand []string // for label-command trigger support - label names that act as commands - LabelCommandEvents []string // events where label-command should be active (nil = all: issues, pull_request, discussion) - LabelCommandOtherEvents map[string]any // for merging label-command with other events - LabelCommandRemoveLabel bool // whether to automatically remove the triggering label (default: true) - AIReaction string // AI reaction type like "eyes", "heart", etc. - ReactionIssues *bool // whether reactions are allowed on issues/issue_comment triggers (default: true) - ReactionPullRequests *bool // whether reactions are allowed on pull_request/pull_request_review_comment triggers (default: true) - ReactionDiscussions *bool // whether reactions are allowed on discussion/discussion_comment triggers (default: true) - StatusComment *bool // whether to post status comments (default: true when ai-reaction is set, false otherwise) - StatusCommentIssues *bool // whether status comments are allowed on issues/issue_comment triggers (default: true) - StatusCommentPullRequests *bool // whether status comments are allowed on pull_request/pull_request_review_comment triggers (default: true) - StatusCommentDiscussions *bool // whether status comments are allowed on discussion/discussion_comment triggers (default: true) - ActivationGitHubToken string // custom github token from on.github-token for reactions/comments - ActivationGitHubApp *GitHubAppConfig // github app config from on.github-app for minting activation tokens - TopLevelGitHubApp *GitHubAppConfig // top-level github-app fallback for all nested github-app token minting operations - LockForAgent bool // whether to lock the issue during agent workflow execution - Jobs map[string]any // custom job configurations with dependencies - Cache string // cache configuration - NeedsTextOutput bool // whether the workflow uses ${{ needs.task.outputs.text }} - NetworkPermissions *NetworkPermissions // parsed network permissions - SandboxConfig *SandboxConfig // parsed sandbox configuration (AWF or SRT) - SafeOutputs *SafeOutputsConfig // output configuration for automatic output routes - MCPScripts *MCPScriptsConfig // mcp-scripts configuration for custom MCP tools - Roles []string // permission levels required to trigger workflow - Bots []string // allow list of bot identifiers that can trigger workflow - RateLimit *RateLimitConfig // rate limiting configuration for workflow triggers - CacheMemoryConfig *CacheMemoryConfig // parsed cache-memory configuration - RepoMemoryConfig *RepoMemoryConfig // parsed repo-memory configuration - Runtimes map[string]any // runtime version overrides from frontmatter - ToolsTimeout string // timeout for tool/MCP operations: numeric string (seconds) or GitHub Actions expression (empty = use engine default) - ToolsStartupTimeout string // timeout for MCP server startup: numeric string (seconds) or GitHub Actions expression (empty = use engine default) - Features map[string]any // feature flags and configuration options from frontmatter (supports bool and string values) - ActionCache *ActionCache // cache for action pin resolutions - ActionResolver *ActionResolver // resolver for action pins - DockerImages []string // container images collected at compile time (pinned refs when pins are cached) - DockerImagePins []GHAWManifestContainer // full container pin info (image, digest, pinned_image) for manifest - ActionResolutionFailures []GHAWManifestResolutionFailure // unresolved action-ref pinning failures for lock manifest auditing - StrictMode bool // strict mode for action pinning - AllowActionRefs bool // if true, unresolved action refs are warnings instead of errors - SecretMasking *SecretMaskingConfig // secret masking configuration - ParsedFrontmatter *FrontmatterConfig // cached parsed frontmatter configuration (for performance optimization) - RawFrontmatter map[string]any // raw parsed frontmatter map (for passing to hash functions without re-parsing) - OTLPEndpoint string // resolved OTLP endpoint (from observability.otlp.endpoint, including imports; set by injectOTLPConfig) - OTLPHeaders string // normalized OTLP headers in key=value,key=value format (from observability.otlp.headers, including imports; set by injectOTLPConfig) - ResolvedMCPServers map[string]any // fully merged mcp-servers from main workflow and all imports (for mcp inspect) - ActionPinWarnings map[string]bool // cache of already-warned action pin failures (key: "repo@version") - ActionMode ActionMode // action mode for workflow compilation (dev, release, script) - HasExplicitGitHubTool bool // true if tools.github was explicitly configured in frontmatter - InlinedImports bool // if true, inline all imports at compile time (from inlined-imports frontmatter field) - CheckoutConfigs []*CheckoutConfig // user-configured checkout settings from frontmatter - CheckoutDisabled bool // true when checkout: false is set in frontmatter - HasDispatchItemNumber bool // true when workflow_dispatch has item_number input (generated by label trigger shorthand) - ConcurrencyJobDiscriminator string // optional discriminator expression appended to job-level concurrency groups (from concurrency.job-discriminator) - IsDetectionRun bool // true when this WorkflowData is used for inline threat detection (not the main agent run) - UpdateCheckDisabled bool // true when check-for-updates: false is set in frontmatter (disables version check step in activation job) - StaleCheckDisabled bool // true when on.stale-check: false is set in frontmatter (disables frontmatter hash check step in activation job) - EngineConfigSteps []map[string]any // steps returned by engine.RenderConfig — prepended before execution steps - ServicePortExpressions string // comma-separated ${{ job.services[''].ports[''] }} expressions for AWF --allow-host-service-ports - RunInstallScripts bool // true when run-install-scripts: true is set (globally or per node runtime); disables --ignore-scripts on generated npm install steps - CachedPermissions *Permissions // cached parsed Permissions object (for performance optimization); populated by applyDefaults after all permission mutations - ConcurrencyGroupExpr string // cached concurrency group expression extracted from Concurrency YAML (for performance optimization); populated by applyDefaults - CachedAllowedDomainsStr string // cached allowed-domains string for sanitization (for performance optimization); computed once and reused across multiple compilation steps - CachedAllowedDomainsComputed bool // true once CachedAllowedDomainsStr has been set; distinguishes "computed empty" from "not yet computed" + Name string + WorkflowID string // workflow identifier derived from markdown filename (basename without extension) + TrialMode bool // whether the workflow is running in trial mode + TrialLogicalRepo string // target repository slug for trial mode (owner/repo) + FrontmatterName string // name field from frontmatter (for code scanning alert driver default) + FrontmatterYAML string // raw frontmatter YAML content (rendered as comment in lock file for reference) + FrontmatterHash string // SHA-256 hash of frontmatter (computed before job building, used to derive stable heredoc delimiters) + RawMarkdown string // raw markdown body before include expansion, used for frontmatter hash computation without re-reading the file + Description string // optional description rendered as comment in lock file + Source string // optional source field (owner/repo@ref/path) rendered as comment in lock file + Redirect string // optional redirect field describing a moved workflow location + TrackerID string // optional tracker identifier for created assets (min 8 chars, alphanumeric + hyphens/underscores) + ImportedFiles []string // list of files imported via imports field (rendered as comment in lock file) + ImportedMarkdown string // Only imports WITH inputs (for compile-time substitution) + ImportPaths []string // Import file paths for runtime-import macro generation (imports without inputs) + MainWorkflowMarkdown string // main workflow markdown without imports (for runtime-import) + IncludedFiles []string // list of files included via @include directives (rendered as comment in lock file) + ImportInputs map[string]any // input values from imports with inputs (for github.aw.inputs.* substitution) + On string + Permissions string + Network string // top-level network permissions configuration + Concurrency string // workflow-level concurrency configuration + RunName string + Env string + EnvSources map[string]string // env var name → source ("(main workflow)" or import file path) for lock file header + If string + TimeoutMinutes string + CustomSteps string + PreSteps string // steps to run at the very start of the agent job, before checkout + PreAgentSteps string // steps to run immediately before the agent execution step + PostSteps string // steps to run after AI execution + RunsOn string + RunsOnSlim string // runner override for all framework/generated jobs (activation, safe-outputs, unlock, etc.) + Environment string // environment setting for the main job + Container string // container setting for the main job + Services string // services setting for the main job + Tools map[string]any + ParsedTools *Tools // Structured tools configuration (NEW: parsed from Tools map) + MarkdownContent string + AI string // "claude" or "codex" (for backwards compatibility) + EngineConfig *EngineConfig // Extended engine configuration + AgentFile string // Path to custom agent file (from imports) + AgentImportSpec string // Original import specification for agent file (e.g., "owner/repo/path@ref") + RepositoryImports []string // Repository-only imports (format: "owner/repo@ref") for .github folder merging + StopTime string + SkipIfMatch *SkipIfMatchConfig // skip-if-match configuration with query and max threshold + SkipIfNoMatch *SkipIfNoMatchConfig // skip-if-no-match configuration with query and min threshold + SkipIfCheckFailing *SkipIfCheckFailingConfig // skip-if-check-failing configuration + SkipRoles []string // roles to skip workflow for (e.g., [admin, maintainer, write]) + SkipBots []string // users to skip workflow for (e.g., [user1, user2]) + OnSteps []map[string]any // steps to inject into the pre-activation job from on.steps + OnPermissions *Permissions // additional permissions for the pre-activation job from on.permissions + OnNeeds []string // custom workflow jobs that pre_activation/activation should depend on from on.needs + ManualApproval string // environment name for manual approval from on: section + Command []string // for /command trigger support - multiple command names + CommandEvents []string // events where command should be active (nil = all events) + CommandOtherEvents map[string]any // for merging command with other events + LabelCommand []string // for label-command trigger support - label names that act as commands + LabelCommandEvents []string // events where label-command should be active (nil = all: issues, pull_request, discussion) + LabelCommandOtherEvents map[string]any // for merging label-command with other events + LabelCommandRemoveLabel bool // whether to automatically remove the triggering label (default: true) + AIReaction string // AI reaction type like "eyes", "heart", etc. + ReactionIssues *bool // whether reactions are allowed on issues/issue_comment triggers (default: true) + ReactionPullRequests *bool // whether reactions are allowed on pull_request/pull_request_review_comment triggers (default: true) + ReactionDiscussions *bool // whether reactions are allowed on discussion/discussion_comment triggers (default: true) + StatusComment *bool // whether to post status comments (default: true when ai-reaction is set, false otherwise) + StatusCommentIssues *bool // whether status comments are allowed on issues/issue_comment triggers (default: true) + StatusCommentPullRequests *bool // whether status comments are allowed on pull_request/pull_request_review_comment triggers (default: true) + StatusCommentDiscussions *bool // whether status comments are allowed on discussion/discussion_comment triggers (default: true) + ActivationGitHubToken string // custom github token from on.github-token for reactions/comments + ActivationGitHubApp *GitHubAppConfig // github app config from on.github-app for minting activation tokens + TopLevelGitHubApp *GitHubAppConfig // top-level github-app fallback for all nested github-app token minting operations + LockForAgent bool // whether to lock the issue during agent workflow execution + Jobs map[string]any // custom job configurations with dependencies + Cache string // cache configuration + NeedsTextOutput bool // whether the workflow uses ${{ needs.task.outputs.text }} + NetworkPermissions *NetworkPermissions // parsed network permissions + SandboxConfig *SandboxConfig // parsed sandbox configuration (AWF or SRT) + SafeOutputs *SafeOutputsConfig // output configuration for automatic output routes + MCPScripts *MCPScriptsConfig // mcp-scripts configuration for custom MCP tools + Roles []string // permission levels required to trigger workflow + Bots []string // allow list of bot identifiers that can trigger workflow + RateLimit *RateLimitConfig // rate limiting configuration for workflow triggers + CacheMemoryConfig *CacheMemoryConfig // parsed cache-memory configuration + RepoMemoryConfig *RepoMemoryConfig // parsed repo-memory configuration + Runtimes map[string]any // runtime version overrides from frontmatter + ToolsTimeout string // timeout for tool/MCP operations: numeric string (seconds) or GitHub Actions expression (empty = use engine default) + ToolsStartupTimeout string // timeout for MCP server startup: numeric string (seconds) or GitHub Actions expression (empty = use engine default) + Features map[string]any // feature flags and configuration options from frontmatter (supports bool and string values) + ActionCache *ActionCache // cache for action pin resolutions + ActionResolver *ActionResolver // resolver for action pins + DockerImages []string // container images collected at compile time (pinned refs when pins are cached) + DockerImagePins []GHAWManifestContainer // full container pin info (image, digest, pinned_image) for manifest + ActionResolutionFailures []GHAWManifestResolutionFailure // unresolved action-ref pinning failures for lock manifest auditing + StrictMode bool // strict mode for action pinning + AllowActionRefs bool // if true, unresolved action refs are warnings instead of errors + SecretMasking *SecretMaskingConfig // secret masking configuration + ParsedFrontmatter *FrontmatterConfig // cached parsed frontmatter configuration (for performance optimization) + RawFrontmatter map[string]any // raw parsed frontmatter map (for passing to hash functions without re-parsing) + OTLPEndpoint string // resolved OTLP endpoint (from observability.otlp.endpoint, including imports; set by injectOTLPConfig) + OTLPHeaders string // normalized OTLP headers in key=value,key=value format (from observability.otlp.headers, including imports; set by injectOTLPConfig) + ResolvedMCPServers map[string]any // fully merged mcp-servers from main workflow and all imports (for mcp inspect) + ActionPinWarnings map[string]bool // cache of already-warned action pin failures (key: "repo@version") + ActionMode ActionMode // action mode for workflow compilation (dev, release, script) + HasExplicitGitHubTool bool // true if tools.github was explicitly configured in frontmatter + InlinedImports bool // if true, inline all imports at compile time (from inlined-imports frontmatter field) + CheckoutConfigs []*CheckoutConfig // user-configured checkout settings from frontmatter + CheckoutDisabled bool // true when checkout: false is set in frontmatter + HasDispatchItemNumber bool // true when workflow_dispatch has item_number input (generated by label trigger shorthand) + ConcurrencyJobDiscriminator string // optional discriminator expression appended to job-level concurrency groups (from concurrency.job-discriminator) + IsDetectionRun bool // true when this WorkflowData is used for inline threat detection (not the main agent run) + UpdateCheckDisabled bool // true when check-for-updates: false is set in frontmatter (disables version check step in activation job) + StaleCheckDisabled bool // true when on.stale-check: false is set in frontmatter (disables frontmatter hash check step in activation job) + EngineConfigSteps []map[string]any // steps returned by engine.RenderConfig — prepended before execution steps + ServicePortExpressions string // comma-separated ${{ job.services[''].ports[''] }} expressions for AWF --allow-host-service-ports + RunInstallScripts bool // true when run-install-scripts: true is set (globally or per node runtime); disables --ignore-scripts on generated npm install steps + CachedPermissions *Permissions // cached parsed Permissions object (for performance optimization); populated by applyDefaults after all permission mutations + ConcurrencyGroupExpr string // cached concurrency group expression extracted from Concurrency YAML (for performance optimization); populated by applyDefaults + CachedConcurrencyGroupExprErr error // cached result of validateConcurrencyGroupExpression(ConcurrencyGroupExpr); nil = valid; populated by applyDefaults + CachedConcurrencyGroupExprSet bool // true once CachedConcurrencyGroupExprErr has been populated; distinguishes "valid (nil)" from "not yet computed" + CachedParsedToolsets []string // cached result of ParseGitHubToolsets for the GitHub tool (for performance optimization); populated by applyDefaults + CachedAllowedDomainsStr string // cached allowed-domains string for sanitization (for performance optimization); computed once and reused across multiple compilation steps + CachedAllowedDomainsComputed bool // true once CachedAllowedDomainsStr has been set; distinguishes "computed empty" from "not yet computed" } // PinContext returns an actionpins.PinContext backed by this WorkflowData. diff --git a/pkg/workflow/compiler_validators.go b/pkg/workflow/compiler_validators.go index 820163b6e4..d01cc7f098 100644 --- a/pkg/workflow/compiler_validators.go +++ b/pkg/workflow/compiler_validators.go @@ -122,8 +122,10 @@ func (c *Compiler) validatePermissions(workflowData *WorkflowData, markdownPath if hasPermissions && !workflowData.HasExplicitGitHubTool { log.Printf("Skipping permission validation: permissions exist but tools.github not explicitly configured") } else { - // Validate permissions using the typed GitHub tool configuration - validationResult := ValidatePermissions(workflowPermissions, workflowData.ParsedTools.GitHub) + // Validate permissions using the typed GitHub tool configuration. + // Pass the cached parsed toolsets from applyDefaults to avoid a redundant + // ParseGitHubToolsets call inside ValidatePermissions. + validationResult := ValidatePermissions(workflowPermissions, workflowData.ParsedTools.GitHub, workflowData.CachedParsedToolsets) if validationResult.HasValidationIssues { // Format the validation message @@ -254,12 +256,21 @@ func (c *Compiler) validateToolConfiguration(workflowData *WorkflowData, markdow // Validate workflow-level concurrency group expression log.Printf("Validating workflow-level concurrency configuration") if workflowData.Concurrency != "" { - // Use the cached concurrency group expression extracted during applyDefaults to avoid - // repeated regex-based extraction on every validateWorkflowData call. - groupExpr := workflowData.ConcurrencyGroupExpr - if groupExpr != "" { - if err := validateConcurrencyGroupExpression(groupExpr); err != nil { - return formatCompilerError(markdownPath, "error", "workflow-level concurrency validation failed: "+err.Error(), err) + // Use the cached validation result from applyDefaults to avoid re-running the + // expensive ExpressionParser (regex + tokenize + parse) on every validateWorkflowData call. + if workflowData.CachedConcurrencyGroupExprSet { + if workflowData.CachedConcurrencyGroupExprErr != nil { + return formatCompilerError(markdownPath, "error", "workflow-level concurrency validation failed: "+workflowData.CachedConcurrencyGroupExprErr.Error(), workflowData.CachedConcurrencyGroupExprErr) + } + } else { + // Fallback: cache not populated (e.g. WorkflowData created without applyDefaults). + // Extract the group expression directly from Concurrency YAML so validation is not + // skipped for WorkflowData constructed outside of ParseWorkflowFile. + groupExpr := extractConcurrencyGroupFromYAML(workflowData.Concurrency) + if groupExpr != "" { + if err := validateConcurrencyGroupExpression(groupExpr); err != nil { + return formatCompilerError(markdownPath, "error", "workflow-level concurrency validation failed: "+err.Error(), err) + } } } } @@ -365,9 +376,15 @@ func (c *Compiler) validateToolConfiguration(workflowData *WorkflowData, markdow // Validate GitHub tools against enabled toolsets log.Printf("Validating GitHub tools against enabled toolsets") if workflowData.ParsedTools != nil && workflowData.ParsedTools.GitHub != nil { - // Extract allowed tools and enabled toolsets from ParsedTools + // Extract allowed tools and reuse the cached parsed toolsets from applyDefaults to + // avoid a redundant ParseGitHubToolsets call on every validateWorkflowData iteration. allowedTools := workflowData.ParsedTools.GitHub.Allowed.ToStringSlice() - enabledToolsets := ParseGitHubToolsets(strings.Join(workflowData.ParsedTools.GitHub.Toolset.ToStringSlice(), ",")) + var enabledToolsets []string + if workflowData.CachedParsedToolsets != nil { + enabledToolsets = workflowData.CachedParsedToolsets + } else { + enabledToolsets = ParseGitHubToolsets(strings.Join(workflowData.ParsedTools.GitHub.Toolset.ToStringSlice(), ",")) + } // Validate that all allowed tools have their toolsets enabled if err := ValidateGitHubToolsAgainstToolsets(allowedTools, enabledToolsets); err != nil { diff --git a/pkg/workflow/permissions_validation.go b/pkg/workflow/permissions_validation.go index a104f80306..23fbb13bca 100644 --- a/pkg/workflow/permissions_validation.go +++ b/pkg/workflow/permissions_validation.go @@ -26,13 +26,16 @@ type PermissionsValidationResult struct { // Parameters: // - permissions: The workflow's declared permissions // - githubTool: The GitHub tool configuration implementing ValidatableTool interface +// - parsedToolsets: optional pre-parsed toolsets slice; when provided it is used directly +// instead of calling ParseGitHubToolsets(githubTool.GetToolsets()). Pass nil or omit to +// let ValidatePermissions derive the toolsets from the tool configuration. // // Returns: // - A validation result indicating any missing permissions and which toolsets require them // // Use ValidatePermissions (this function) for general permission validation against GitHub MCP toolsets. // Use ValidateIncludedPermissions (in imports.go) when validating permissions from included/imported workflow files. -func ValidatePermissions(permissions *Permissions, githubTool ValidatableTool) *PermissionsValidationResult { +func ValidatePermissions(permissions *Permissions, githubTool ValidatableTool, parsedToolsets ...[]string) *PermissionsValidationResult { permissionsValidationLog.Print("Starting permissions validation") result := &PermissionsValidationResult{ @@ -53,15 +56,20 @@ func ValidatePermissions(permissions *Permissions, githubTool ValidatableTool) * return result } - // Extract toolsets from GitHub tool configuration - toolsetsStr := githubTool.GetToolsets() readOnly := githubTool.IsReadOnly() result.ReadOnlyMode = readOnly - permissionsValidationLog.Printf("Validating toolsets: %s, read-only: %v", toolsetsStr, readOnly) + // Use pre-parsed toolsets when provided (avoids redundant ParseGitHubToolsets calls in hot paths). + var toolsets []string + if len(parsedToolsets) > 0 && parsedToolsets[0] != nil { + toolsets = parsedToolsets[0] + permissionsValidationLog.Printf("Validating with pre-parsed toolsets: %v, read-only: %v", toolsets, readOnly) + } else { + toolsetsStr := githubTool.GetToolsets() + permissionsValidationLog.Printf("Validating toolsets: %s, read-only: %v", toolsetsStr, readOnly) + toolsets = ParseGitHubToolsets(toolsetsStr) + } - // Parse toolsets - toolsets := ParseGitHubToolsets(toolsetsStr) if len(toolsets) == 0 { permissionsValidationLog.Print("No toolsets to validate") return result diff --git a/pkg/workflow/tools.go b/pkg/workflow/tools.go index 229e0709e5..e19d6cfa34 100644 --- a/pkg/workflow/tools.go +++ b/pkg/workflow/tools.go @@ -25,10 +25,24 @@ func (c *Compiler) applyDefaults(data *WorkflowData, markdownPath string) error // applyDefaults is the final stage that mutates data.Permissions (setting defaults and // injecting feature-flag permissions), so the values computed here represent the stable, // final state that validateWorkflowData will use. These caches eliminate repeated - // YAML parsing and regex extraction in the hot validateWorkflowData loop. + // YAML parsing, regex extraction, and expression parsing in the hot validateWorkflowData loop. defer func() { data.CachedPermissions = NewPermissionsParser(data.Permissions).ToPermissions() data.ConcurrencyGroupExpr = extractConcurrencyGroupFromYAML(data.Concurrency) + // Pre-validate and cache the concurrency group expression so validateWorkflowData + // can short-circuit without re-running the expensive ExpressionParser on every call. + // CachedConcurrencyGroupExprSet is always true after applyDefaults regardless of whether + // a group expression exists, so callers can distinguish "already computed" from "not yet computed". + if data.ConcurrencyGroupExpr != "" { + data.CachedConcurrencyGroupExprErr = validateConcurrencyGroupExpression(data.ConcurrencyGroupExpr) + } + data.CachedConcurrencyGroupExprSet = true + // Cache the expanded + parsed toolsets for the GitHub tool so both + // ValidatePermissions and validateToolConfiguration reuse one result. + // Use GetToolsets() to stay aligned with the runtime normalization done by GitHubToolConfig. + if data.ParsedTools != nil && data.ParsedTools.GitHub != nil { + data.CachedParsedToolsets = ParseGitHubToolsets(data.ParsedTools.GitHub.GetToolsets()) + } }() // Check if this is a command trigger workflow (by checking if user specified "on.command")