Merged
90 changes: 87 additions & 3 deletions scratchpad/dev.md
@@ -1,7 +1,7 @@
# Developer Instructions

**Version**: 2.8
**Last Updated**: 2026-02-23
**Version**: 2.9
**Last Updated**: 2026-02-24
**Purpose**: Consolidated development guidelines for GitHub Agentic Workflows

This document consolidates specifications from the scratchpad directory into unified developer instructions. It provides architecture patterns, security guidelines, code organization rules, and testing practices.
@@ -197,7 +197,27 @@ Rationale:
- New engines added without affecting existing ones
- Clear boundaries reduce merge conflicts

**3. Test Organization Pattern**
**3. Engine Interface Architecture**

The engine system implements Interface Segregation Principle (ISP) with 7 focused interfaces composed into a single composite interface (`CodingAgentEngine`):

```
CodingAgentEngine (composite)
├── Engine – core identity (GetID, GetDisplayName, IsExperimental)
├── CapabilityProvider – feature flags (SupportsFirewall, SupportsMaxTurns, ...)
├── WorkflowExecutor – GitHub Actions step generation
├── MCPConfigProvider – MCP server configuration rendering
├── LogParser – log metric extraction
└── SecurityProvider – secret names and detection model
```

Comment on lines +206 to +211 (Copilot AI, Feb 24, 2026):

The section says `CodingAgentEngine` is composed from "7 focused interfaces", but the diagram lists only 6 and omits `ModelEnvVarProvider` (present in `pkg/workflow/agentic_engine.go`, where `CodingAgentEngine` embeds `ModelEnvVarProvider`). Also, the `Engine` interface includes `GetDescription()` in code, but it is missing from the "core identity" method list here. Please update the diagram and text to match the current interface set and signatures.

Suggested change (replaces the six component lines of the diagram):

    ├── Engine – core identity (GetID, GetDisplayName, GetDescription, IsExperimental)
    ├── CapabilityProvider – feature flags (SupportsFirewall, SupportsMaxTurns, ...)
    ├── WorkflowExecutor – GitHub Actions step generation
    ├── MCPConfigProvider – MCP server configuration rendering
    ├── LogParser – log metric extraction
    ├── SecurityProvider – secret names and detection model
    └── ModelEnvVarProvider – model environment variable configuration

`BaseEngine` provides default implementations for `CapabilityProvider`, `LogParser`, and `SecurityProvider`. New engines embed `BaseEngine` and override only the methods they need to customize.
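The embedding pattern described above can be sketched as follows. This is a minimal illustration, not the project's code: the interface and method names come from the diagram, but the signatures, the reduced two-interface composite, and `DemoEngine` are assumptions made for the example.

```go
package main

import "fmt"

// Engine covers core identity. Method names follow the diagram;
// the signatures are assumptions.
type Engine interface {
	GetID() string
	GetDisplayName() string
	IsExperimental() bool
}

// CapabilityProvider exposes feature flags (two of them shown here).
type CapabilityProvider interface {
	SupportsFirewall() bool
	SupportsMaxTurns() bool
}

// CodingAgentEngine composes the focused interfaces. Only two are
// included to keep the sketch short.
type CodingAgentEngine interface {
	Engine
	CapabilityProvider
}

// BaseEngine supplies default capability answers.
type BaseEngine struct{}

func (BaseEngine) SupportsFirewall() bool { return false }
func (BaseEngine) SupportsMaxTurns() bool { return false }

// DemoEngine is a hypothetical engine: it embeds BaseEngine, provides
// its own identity, and overrides a single capability.
type DemoEngine struct {
	BaseEngine
}

func (DemoEngine) GetID() string          { return "demo" }
func (DemoEngine) GetDisplayName() string { return "Demo Engine" }
func (DemoEngine) IsExperimental() bool   { return true }

// SupportsMaxTurns shadows the default promoted from BaseEngine.
func (DemoEngine) SupportsMaxTurns() bool { return true }

// Compile-time check that DemoEngine satisfies the composite interface.
var _ CodingAgentEngine = DemoEngine{}

func main() {
	var e CodingAgentEngine = DemoEngine{}
	fmt.Println(e.GetID(), e.SupportsFirewall(), e.SupportsMaxTurns())
}
```

The `var _ CodingAgentEngine = DemoEngine{}` line is a common Go idiom for an interface compliance check: the build fails if a required method is missing.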

**Engine Registry**: `EngineRegistry` provides centralized registration, lookup by ID or prefix, and plugin-support validation. Use it rather than direct struct instantiation.
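A registry of this kind might look roughly like the sketch below. Everything beyond the name `EngineRegistry` is hypothetical: the method names, the error messages, and in particular the reading of "lookup by prefix" as "the requested name starts with a registered engine ID" are assumptions. Plugin-support validation is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Engine is reduced to its ID for this sketch.
type Engine interface {
	GetID() string
}

// simpleEngine is a hypothetical stand-in for a real engine.
type simpleEngine struct{ id string }

func (e simpleEngine) GetID() string { return e.id }

// EngineRegistry maps engine IDs to engines.
type EngineRegistry struct {
	engines map[string]Engine
}

func NewEngineRegistry() *EngineRegistry {
	return &EngineRegistry{engines: make(map[string]Engine)}
}

// Register stores an engine under its own ID.
func (r *EngineRegistry) Register(e Engine) {
	r.engines[e.GetID()] = e
}

// GetByID returns the engine registered under exactly this ID.
func (r *EngineRegistry) GetByID(id string) (Engine, error) {
	e, ok := r.engines[id]
	if !ok {
		return nil, fmt.Errorf("unknown engine: %s", id)
	}
	return e, nil
}

// GetByPrefix returns the engine whose ID is the longest prefix of name.
func (r *EngineRegistry) GetByPrefix(name string) (Engine, error) {
	var best Engine
	for id, e := range r.engines {
		if strings.HasPrefix(name, id) && (best == nil || len(id) > len(best.GetID())) {
			best = e
		}
	}
	if best == nil {
		return nil, fmt.Errorf("no engine matches prefix of: %s", name)
	}
	return best, nil
}

func main() {
	reg := NewEngineRegistry()
	reg.Register(simpleEngine{id: "demo"})
	reg.Register(simpleEngine{id: "demo-fast"})

	e, err := reg.GetByPrefix("demo-fast-v2")
	fmt.Println(e.GetID(), err)
}
```

Routing all lookups through one type keeps ID validation in a single place, which is the benefit the paragraph above attributes to using the registry instead of instantiating engine structs directly.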

**Adding a new engine**: For the full implementation checklist including interface compliance tests, see `scratchpad/adding-new-engines.md`.

**4. Test Organization Pattern**

Pattern: Tests live alongside implementation with descriptive names

@@ -1209,6 +1229,39 @@ func SanitizeLabel(label string) string {
}
```

### JavaScript Content Sanitization Pipeline

The JavaScript sanitization module (`actions/setup/js/sanitize_content_core.cjs`) applies a multi-stage pipeline to all incoming content before it is written to GitHub resources:

```
Input Text
▼ hardenUnicodeText()
├─ Unicode normalization (NFC)
├─ HTML entity decoding ← prevents entity-encoded bypass attacks
├─ Zero-width character removal
├─ Bidirectional control removal
└─ Full-width ASCII conversion
▼ ANSI escape sequence removal
▼ neutralizeTemplateDelimiters() ← T24 defense-in-depth
├─ Jinja2/Liquid: {{ }} → \{\{
├─ ERB: <%= %> → \<%=
├─ JS template literals: ${ } → \$\{
└─ Jekyll/Liquid directives: {% %} → \{%
▼ neutralizeMentions()
▼ Output (safe text)
```
Comment on lines +1234 to +1257 (Copilot AI, Feb 24, 2026):

The sanitization pipeline diagram and order here do not match the actual `sanitizeContentCore()` implementation in `actions/setup/js/sanitize_content_core.cjs`: mentions are neutralized before `neutralizeTemplateDelimiters()`, and there are additional core stages (e.g., `neutralizeCommands`, XML comment removal/tag conversion, URL redaction, truncation, GitHub ref/bot-trigger neutralization, markdown code-region balancing). Also, the `{%` escaping is shown as `\{%`, but the code replaces `{%` with `\{\%`. Please adjust the diagram and accompanying description so the documented order and output match the code.

**HTML Entity Decoding**: Before @mention detection, all entity variants are decoded—named (`&commat;`), decimal (`&#64;`), hexadecimal (`&#x40;`), and double-encoded (`&amp;#64;`). This prevents attackers from using entity-encoded `@` symbols to trigger unwanted user notifications.
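The decode-before-detect idea can be illustrated in Go (the real module is JavaScript, so this is only an analogy). The sketch assumes that repeated unescaping until the text stops changing is an acceptable way to resolve double-encoded entities; `decodeEntities` and `maxPasses` are hypothetical names.

```go
package main

import (
	"fmt"
	"html"
)

// decodeEntities unescapes HTML entities repeatedly until the text
// stops changing, so double-encoded forms such as "&amp;#64;" also
// resolve. maxPasses bounds the work on adversarial input.
func decodeEntities(s string) string {
	const maxPasses = 5
	for i := 0; i < maxPasses; i++ {
		decoded := html.UnescapeString(s)
		if decoded == s {
			break
		}
		s = decoded
	}
	return s
}

func main() {
	for _, in := range []string{"&#64;user", "&#x40;user", "&amp;#64;user"} {
		fmt.Printf("%q -> %q\n", in, decodeEntities(in))
	}
}
```

Once every variant has collapsed to a literal `@`, a single mention detector can run on the decoded text instead of having to recognize each encoding separately.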

**Template Delimiter Neutralization (T24)**: Template syntax delimiters are escaped as a defense-in-depth measure. GitHub's markdown rendering does not evaluate these patterns, but explicit neutralization documents the defense and protects against future integration scenarios where content might reach a template engine. Logs a warning when patterns are detected.

Both defenses are automatic and apply unconditionally to `sanitizeIncomingText()`, `sanitizeContentCore()`, and `sanitizeContent()`.
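As a rough Go analogy for the template delimiter stage, the sketch below escapes the opening delimiters using the replacement strings shown in the diagram. The function name mirrors the diagram, but the implementation is an assumption, and the review comment above reports that the real code emits a different escape for `{%`.

```go
package main

import (
	"fmt"
	"strings"
)

// templateDelimiters escapes opening template delimiters. The pairs are
// copied from the pipeline diagram; the actual module may differ.
var templateDelimiters = strings.NewReplacer(
	"{{", `\{\{`, // Jinja2/Liquid expressions
	"{%", `\{%`, // Jekyll/Liquid directives
	"<%=", `\<%=`, // ERB
	"${", `\$\{`, // JS template literals
)

// neutralizeTemplateDelimiters returns the escaped text and reports
// whether any delimiter was found, so a caller could log a warning.
func neutralizeTemplateDelimiters(s string) (string, bool) {
	out := templateDelimiters.Replace(s)
	return out, out != s
}

func main() {
	out, found := neutralizeTemplateDelimiters("Hello {{ name }} and ${HOME}")
	fmt.Println(out, found)
}
```

Returning a "found" flag alongside the escaped text is one way to support the warning log mentioned above without scanning the input twice.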

### Template Injection Prevention

**Safe Template Evaluation**:
@@ -1477,6 +1530,32 @@ func routeWorkflow(event Event) (string, error) {
}
```

### Activation Output Transformations

The compiler automatically rewrites three specific `needs.activation.outputs.*` expressions to `steps.sanitized.outputs.*` when they appear inside the activation job itself. A GitHub Actions job cannot reference its own outputs via `needs.<job-name>.*`—those references are only valid in downstream jobs.

**Transformed expressions** (within the activation job only):

| From | To |
|------|----|
| `needs.activation.outputs.text` | `steps.sanitized.outputs.text` |
| `needs.activation.outputs.title` | `steps.sanitized.outputs.title` |
| `needs.activation.outputs.body` | `steps.sanitized.outputs.body` |

**Not transformed** (remain as `needs.activation.outputs.*` since they are consumed by later jobs):
`comment_id`, `comment_repo`, `slash_command`, `issue_locked`

**Why this matters for runtime-import**: When a workflow uses `{{#runtime-import}}` to include an external file at runtime (without recompiling), any new references to `needs.activation.outputs.{text|title|body}` introduced by that file will work correctly because the compiler pre-generates all known expressions and applies the transformation before execution.

**Implementation**: `pkg/workflow/expression_extraction.go::transformActivationOutputs()`
Comment on lines +1548 to +1550 (Copilot AI, Feb 24, 2026):

The "Why this matters for runtime-import" explanation appears incorrect: imports without inputs are emitted as `{{#runtime-import ...}}` and their file contents are not parsed for expression extraction/rewrite at compile time (see `pkg/workflow/compiler_yaml.go` runtime-import generation). Since `transformActivationOutputs()` only runs during expression extraction, new `needs.activation.outputs.{text|title|body}` references introduced solely in runtime-imported files won't be rewritten/substituted. Please reword this to reflect that the rewrite applies to expressions the compiler extracts (or recommend using `steps.sanitized.outputs.*` directly in runtime-imported content / enable inlined-imports when relying on rewrites).

Suggested change (replaces the **Why this matters for runtime-import** and **Implementation** paragraphs above):

**Why this matters for runtime-import**: The rewrite only applies to expressions that the compiler extracts at compile time. Imports without inputs are emitted as `{{#runtime-import ...}}`, and their file contents are not parsed for expression extraction or rewrite. New references to `needs.activation.outputs.{text|title|body}` introduced *solely* inside runtime-imported files will therefore **not** be rewritten. When authoring runtime-imported content, either:
- use `steps.sanitized.outputs.{text|title|body}` directly, or
- enable inlined imports for that file so the compiler can see and transform the expressions during extraction.

**Implementation**: `pkg/workflow/expression_extraction.go::transformActivationOutputs()`. This runs during expression extraction over the content known at compile time.

The transformation uses word-boundary checking to prevent partial matches—for example `needs.activation.outputs.text_custom` is not transformed, but `needs.activation.outputs.text` embedded in a larger expression is.
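One way to get this word-boundary behavior is a regular expression with `\b` anchors, sketched below. This is not the code in `expression_extraction.go`; `rewriteActivationOutputs` is a hypothetical name, and the use of a regex at all is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// activationOutput matches only the three rewritten outputs. The
// trailing \b rejects longer identifiers such as "text_custom",
// because "_" counts as a word character.
var activationOutput = regexp.MustCompile(
	`\bneeds\.activation\.outputs\.(text|title|body)\b`)

// rewriteActivationOutputs is a hypothetical stand-in for the
// compiler's transformation step.
func rewriteActivationOutputs(expr string) string {
	return activationOutput.ReplaceAllString(expr, "steps.sanitized.outputs.${1}")
}

func main() {
	inputs := []string{
		"needs.activation.outputs.text || 'fallback'",
		"needs.activation.outputs.text_custom",
		"needs.activation.outputs.comment_id",
	}
	for _, in := range inputs {
		fmt.Printf("%s\n  -> %s\n", in, rewriteActivationOutputs(in))
	}
}
```

Only the first input is rewritten. The second fails the trailing boundary check, and the third names an output that is not in the rewrite set.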

Enable debug logging to trace transformations:
```bash
DEBUG=workflow:expression_extraction gh aw compile workflow.md
```

---

## MCP Integration
@@ -1925,6 +2004,10 @@ These files are loaded automatically by compatible AI tools (e.g., GitHub Copilo
- [GitHub Actions Security](./github-actions-security-best-practices.md) - Security guidelines
- [Code Organization](./code-organization.md) - Detailed file organization patterns
- [Template Injection Prevention](./template-injection-prevention.md) - Template injection defense patterns
- [Adding New Engines](./adding-new-engines.md) - Step-by-step guide for implementing new agentic engines
- [Activation Output Transformations](./activation-output-transformations.md) - Compiler expression transformation details
- [HTML Entity Mention Bypass Fix](./html-entity-mention-bypass-fix.md) - Security fix: entity-encoded @mention bypass
- [Template Syntax Sanitization](./template-syntax-sanitization.md) - T24: template delimiter neutralization

### External References

@@ -1936,6 +2019,7 @@ These files are loaded automatically by compatible AI tools (e.g., GitHub Copilo
---

**Document History**:
- v2.9 (2026-02-24): Added Engine Interface Architecture (ISP 7-interface design, BaseEngine, EngineRegistry), JavaScript Content Sanitization Pipeline with HTML entity bypass fix (T24 template delimiter neutralization), and Activation Output Transformations compiler behavior; added 4 new Related Documentation links
- v2.8 (2026-02-23): Documented PR #17769 features: unassign-from-user safe output, blocked deny-list for assign/unassign, standardized error code registry, templatable integer fields, safe outputs prompt template system, XPIA defense policy, MCP template expression escaping, status-comment decoupling, sandbox.agent migration, agent instruction files in .github/agents/
- v2.6 (2026-02-20): Fixed 8 tone issues across 4 spec files, documented post-processing extraction pattern and CLI flag propagation rule from PR #17316, analyzed 61 files
- v2.5 (2026-02-19): Fixed 6 tone issues in engine review docs, added Engine-Specific MCP Config Delivery section (Gemini pattern), analyzed 61 files