
fix: remove run_id from trending cache key and enforce via compiler validation #28387

Merged
pelikhan merged 4 commits into main from copilot/fix-remove-run-id-from-cache-key
Apr 25, 2026

Contributor

Copilot AI commented Apr 25, 2026

The run_id suffix in cache-memory keys caused every run to write to a unique cache slot, making cross-run cache restoration impossible. As a result, history.jsonl only ever contained the entry written by the current run, and every run reported "This is the first day of tracking".

Changes

Compiler validation (pkg/workflow/cache_validation.go, pkg/workflow/cache.go)

  • Added validateNoCacheKeyRunID — returns a compilation error when a user-supplied cache-memory key contains the ${{ github.run_id }} expression
  • Uses a word-boundary regex (github\.run_id(?:[^_\w]|$)) to avoid false positives on names like github.run_identifier
  • Called from parseCacheMemoryEntry before a custom key is accepted
  • Added cache_key_validation_test.go with table-driven tests for both object and array notation, including false-positive cases
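
A minimal sketch of the check described above, assuming the function takes the raw key string and returns a plain error. The real signature, error type, and message in pkg/workflow/cache_validation.go are not shown in this PR description, so the names and wording below are illustrative only. The regex is the one quoted above.

```go
package main

import (
	"fmt"
	"regexp"
)

// runIDPattern matches "github.run_id" only when it is followed by a
// non-word character or the end of the string, so longer identifiers such
// as "github.run_identifier" do not match. In Go's RE2 syntax \w already
// includes the underscore, so the explicit "_" is redundant but harmless.
var runIDPattern = regexp.MustCompile(`github\.run_id(?:[^_\w]|$)`)

// validateNoCacheKeyRunID is a hypothetical reconstruction: it returns an
// error when key references github.run_id and nil otherwise.
func validateNoCacheKeyRunID(key string) error {
	if runIDPattern.MatchString(key) {
		return fmt.Errorf("cache-memory key %q must not reference github.run_id; the compiler appends it to the save key", key)
	}
	return nil
}

func main() {
	keys := []string{
		"trending-data-${{ github.workflow }}-${{ github.run_id }}",
		"trending-data-${{ github.workflow }}",
		"data-${{ github.run_identifier }}",
	}
	for _, k := range keys {
		fmt.Printf("%-60s rejected=%t\n", k, validateNoCacheKeyRunID(k) != nil)
	}
}
```

Only the first key is rejected; the third shares the `github.run_id` prefix but continues with a word character, so the pattern does not match.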

Workflow fixes — all affected workflows

Removed -${{ github.run_id }} from cache-memory keys in every workflow that had the same bug:

  • shared/trending-charts-simple.md
  • shared/charts-with-trending.md
  • cloclo.md
  • poem-bot.md
  • stale-repo-identifier.md

The compiler appends run_id automatically to the save key and generates a stable restore-keys prefix — users must not add it themselves.
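
To illustrate why a user-added run_id defeats restoration, here is a small simulation. It assumes the compiler builds the save key as `<user key>-<run_id>` and the restore prefix as `<user key>-`; the exact format the compiler emits is not shown in this PR, and `deriveKeys`, `expand`, and `restoresAcrossRuns` are hypothetical helpers written for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

const runIDExpr = "${{ github.run_id }}"

// deriveKeys mimics (hypothetically) what the compiler does with a user
// key: the save key gets a per-run suffix, the restore prefix does not.
func deriveKeys(userKey string) (saveKey, restorePrefix string) {
	return userKey + "-" + runIDExpr, userKey + "-"
}

// expand substitutes a concrete run ID for the expression, standing in for
// GitHub Actions expression evaluation.
func expand(template, runID string) string {
	return strings.ReplaceAll(template, runIDExpr, runID)
}

// restoresAcrossRuns reports whether run "2" can restore, by prefix match,
// the cache entry that run "1" saved.
func restoresAcrossRuns(userKey string) bool {
	saveKey, restorePrefix := deriveKeys(userKey)
	savedByRun1 := expand(saveKey, "1")
	prefixInRun2 := expand(restorePrefix, "2")
	return strings.HasPrefix(savedByRun1, prefixInRun2)
}

func main() {
	fmt.Println(restoresAcrossRuns("trending-data-wf"))              // true: prefix is stable
	fmt.Println(restoresAcrossRuns("trending-data-wf-" + runIDExpr)) // false: prefix changes every run
}
```

With the buggy key, the restore prefix itself contains the current run's ID, so it can never match an entry saved by an earlier run.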

api-consumption-report.md

  • Adds a cache validation bash snippet at the start of Step 3 so the agent explicitly logs whether the cache was restored from a previous run (first-run vs. cache-hit)
  • Adds a Cache restored from previous run field to the Cache Memory Status section in the discussion template

Copilot AI changed the title [WIP] Fix trending cache key by removing run_id for restoration fix: remove run_id from trending cache key to enable cross-run cache restoration Apr 25, 2026
Copilot AI requested a review from pelikhan April 25, 2026 02:53

tools:
  cache-memory:
    key: trending-data-${{ env.GH_AW_WORKFLOW_ID_SANITIZED }}-${{ github.run_id }}
Collaborator


@copilot the compiler should detect the github.run_id expression in the cache-memory key and raise a compilation error

Contributor Author


Done in 9baf3b1. parseCacheMemoryEntry now calls validateNoCacheKeyRunID which uses a word-boundary regex (github\.run_id(?:[^_\w]|$)) so github.run_identifier won't false-positive. The 4 other workflow files that had the same bug (cloclo.md, poem-bot.md, stale-repo-identifier.md, shared/charts-with-trending.md) are also fixed — all 201 workflows recompile cleanly.

Copilot AI and others added 2 commits April 25, 2026 03:04
Contributor Author

Copilot AI commented Apr 25, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/graphql
    • Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw GOMOD GOMODCACHE 5963323/b392/impGO111MODULE -c 5963323/b392/embGOINSECURE GOPROXY 64/bin/go GOSUMDB GOWORK 64/bin/go /opt/hostedtoolcGO111MODULE (http block)
    • Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw GOMOD GOMODCACHE 5963323/b390/impGO111MODULE -c k/gh-aw/gh-aw/pkGOINSECURE GOPROXY 64/bin/go GOSUMDB GOWORK 64/bin/go /opt/hostedtoolcGO111MODULE (http block)
    • Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw GOMOD GOMODCACHE 5963323/b404/impGO111MODULE -c che/go-build/8a/GOINSECURE GOPROXY 64/bin/go GOSUMDB GOWORK 64/bin/go /opt/hostedtoolcGO111MODULE (http block)
  • https://api.github.com/repos/astral-sh/setup-uv/git/ref/tags/eac588ad8def6316056a12d4907a9d4d84ff7a3b
    • Triggering command: /usr/bin/gh gh api /repos/astral-sh/setup-uv/git/ref/tags/eac588ad8def6316056a12d4907a9d4d84ff7a3b --jq [.object.sha, .object.type] | @tsv T2vH/eoTNS_w1OuuGOINSECURE GO111MODULE 64/bin/go GOINSECURE GOMOD GOMODCACHE 5963323/b422/impGO111MODULE -c k/gh-aw/gh-aw/pkGOINSECURE k/gh-aw/gh-aw/pkGOMOD 64/bin/go GOSUMDB GOWORK 64/bin/go /opt/hostedtoolcGOPROXY (http block)
    • Triggering command: /usr/bin/gh gh api /repos/astral-sh/setup-uv/git/ref/tags/eac588ad8def6316056a12d4907a9d4d84ff7a3b --jq [.object.sha, .object.type] | @tsv prettier --check 64/bin/go --ignore-path .prettierignore 64/bin/go go env -json GO111MODULE 64/bin/go GOINSECURE GOMOD GOMODCACHE go (http block)
    • Triggering command: /usr/bin/gh gh api /repos/astral-sh/setup-uv/git/ref/tags/eac588ad8def6316056a12d4907a9d4d84ff7a3b --jq [.object.sha, .object.type] | @tsv "prettier" --cheGOINSECURE git $name) { hasDiscussionsEnabled } } --show-toplevel (http block)
  • https://api.github.com/repos/github/gh-aw
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw --jq .default_branch INVALID,NEW -j DROP GOINSECURE GOMOD GOMODCACHE go env -json GO111MODULE ache/go/1.25.8/x64/bin/go GOINSECURE GOMOD GOMODCACHE go (http block)
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw --jq .default_branch ithub/workflows log r: $owner, name: $name) { hasDiscussionsEnabled } } -n1 --format=format:-C DiscussionsEnabl/home/REDACTED/work/gh-aw/gh-aw/.github/workflows git -C ithub/workflows rev-parse /usr/bin/git l GO111MODULE 64/bin/go git (http block)
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw --jq .default_branch k/gh-aw/gh-aw/.github/workflows show porary-id.lock.yml l :latest kflows/copilot-p/home/REDACTED/work/gh-aw/gh-aw/.github/workflows infocmp -1 k/gh-aw/gh-aw/.gremote.origin.url /bin/sh /home/REDACTED/.dotnet/tools/bash ithub/workflows GOPROXY l bash (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI changed the title fix: remove run_id from trending cache key to enable cross-run cache restoration fix: remove run_id from trending cache key and enforce via compiler validation Apr 25, 2026
Copilot AI requested a review from pelikhan April 25, 2026 03:10
@pelikhan pelikhan marked this pull request as ready for review April 25, 2026 03:13
Copilot AI review requested due to automatic review settings April 25, 2026 03:13
@pelikhan pelikhan merged commit 9997624 into main Apr 25, 2026
19 checks passed
@pelikhan pelikhan deleted the copilot/fix-remove-run-id-from-cache-key branch April 25, 2026 03:13
@github-actions
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 80/100

Excellent test quality

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 2 |
| ✅ Design tests (behavioral contracts) | 2 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 2 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | ⚠️ Yes (126 test lines vs 32 production lines, ratio ≈ 3.9:1) |
| 🚨 Coding-guideline violations | None |

Test Classification Details

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| TestCacheKeyRunIDValidationObject | pkg/workflow/cache_key_validation_test.go | ✅ Design | None (6 sub-cases including happy path, error cases, and false-positive guards) |
| TestCacheKeyRunIDValidationArray | pkg/workflow/cache_key_validation_test.go | ✅ Design | None (covers both happy path and error path for array notation) |

Flagged Tests — Requires Review

No tests were flagged for behavioral issues.

Minor note — Test Inflation (informational, not a blocker):

The test file adds 126 lines against 32 lines of new production code (ratio ≈ 3.9:1, exceeding the 2:1 threshold). This is expected for table-driven tests and does not represent low-quality testing — the additional lines are valuable test-case rows, not redundant boilerplate. The 10-point inflation penalty in the score reflects the rubric's mechanical threshold rather than a real quality concern.


Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 2 tests — unit (//go:build !integration)
  • 🟨 JavaScript (*.test.cjs, *.test.js): 0 tests

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). Both test functions are well-structured table-driven behavioral tests that verify the compiler correctly rejects github.run_id in cache keys, with explicit error-path coverage and false-positive guards for similar-looking variable names.
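
For readers unfamiliar with the pattern, the following is a rough sketch of what table-driven cases over both notations could look like. It is not the contents of cache_key_validation_test.go: `keysFrom` and `validate` are hypothetical stand-ins for the compiler's parsing and validation, the config is modelled as already-decoded YAML (`map[string]any` for object notation, `[]any` for array notation), and the cases run from `main` rather than through the `testing` package so the example is runnable on its own.

```go
package main

import (
	"fmt"
	"regexp"
)

var runIDPattern = regexp.MustCompile(`github\.run_id(?:[^_\w]|$)`)

// keysFrom collects "key" values from object notation (a single map) or
// array notation (a list of maps).
func keysFrom(cfg any) []string {
	var keys []string
	switch v := cfg.(type) {
	case map[string]any:
		if k, ok := v["key"].(string); ok {
			keys = append(keys, k)
		}
	case []any:
		for _, item := range v {
			keys = append(keys, keysFrom(item)...)
		}
	}
	return keys
}

// validate returns an error for the first key that references github.run_id.
func validate(cfg any) error {
	for _, k := range keysFrom(cfg) {
		if runIDPattern.MatchString(k) {
			return fmt.Errorf("cache-memory key %q references github.run_id", k)
		}
	}
	return nil
}

func main() {
	cases := []struct {
		name    string
		cfg     any
		wantErr bool
	}{
		{"object, stable key", map[string]any{"key": "data-${{ github.workflow }}"}, false},
		{"object, run_id", map[string]any{"key": "data-${{ github.run_id }}"}, true},
		{"object, run_identifier", map[string]any{"key": "data-${{ github.run_identifier }}"}, false},
		{"array, one bad entry", []any{
			map[string]any{"key": "a"},
			map[string]any{"key": "b-${{ github.run_id }}"},
		}, true},
		{"array, all stable", []any{map[string]any{"key": "a"}, map[string]any{"key": "b"}}, false},
	}
	for _, tc := range cases {
		gotErr := validate(tc.cfg) != nil
		status := "ok"
		if gotErr != tc.wantErr {
			status = "FAIL"
		}
		fmt.Printf("%-4s %s (error=%t)\n", status, tc.name, gotErr)
	}
}
```

Each row is one behavioral expectation, which is why the test-to-production line ratio grows quickly without adding redundant logic.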


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.


References: §24921258553

🧪 Test quality analysis by Test Quality Sentinel · ● 384.4K ·

Contributor

@github-actions github-actions Bot left a comment


✅ Test Quality Sentinel: 80/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). Both table-driven tests verify behavioral contracts with full error-path and false-positive coverage.

@github-actions
Contributor

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (158 new lines in pkg/workflow/) but was merged without a linked Architecture Decision Record (ADR).

AI has analyzed the PR diff and generated a draft ADR to help document this decision retroactively:

📄 Draft ADR: docs/adr/28387-enforce-stable-cache-memory-keys-via-compiler-validation.md

Note: Because this PR was already merged when the gate ran, the draft ADR could not be pushed to the PR branch automatically. Please commit the file below to main directly (or via a follow-up PR).


📋 Draft ADR content — copy and save to docs/adr/28387-enforce-stable-cache-memory-keys-via-compiler-validation.md
# ADR-28387: Enforce Stable Cache-Memory Keys via Compiler Validation

**Date**: 2026-04-25
**Status**: Draft
**Deciders**: pelikhan, Copilot

---

## Part 1 — Narrative (Human-Friendly)

### Context

The gh-aw workflow compiler appends `-${{ github.run_id }}` automatically to
the *save* key for `cache-memory` entries while generating a stable
`restore-keys` prefix. This design allows cache slots to be written per-run
but restored across runs. Several production workflows mistakenly included
`${{ github.run_id }}` in their user-supplied `key` field as well (e.g.,
`trending-data-${{ env.GH_AW_WORKFLOW_ID_SANITIZED }}-${{ github.run_id }}`),
causing every run to write to a unique cache slot with no overlap. The
practical effect was that `history.jsonl` accumulated only one entry per run,
and every run reported "This is the first day of tracking" — a silent
correctness failure with no error at compile or run time.

### Decision

We will enforce at compile time that user-supplied `cache-memory` keys must
not reference `github.run_id`. The `validateNoCacheKeyRunID` function is
called from `parseCacheMemoryEntry` before a custom key is accepted; if the
key matches the pattern `github\.run_id(?:[^_\w]|$)`, compilation fails with
an actionable error message explaining the invariant and the correct key
format. All five affected workflows were updated to remove the redundant
suffix.

### Alternatives Considered

#### Alternative 1: Runtime Detection

Detect the bad key at workflow runtime — for example, by checking whether
`history.jsonl` has only one entry after a cache-restore attempt and emitting
a warning. This was rejected because errors surface only after a full workflow
run completes (potentially minutes later), and the signal is indirect — a
near-empty history file could be caused by other factors. Compile-time
enforcement is instantaneous, deterministic, and prevents the workflow from
ever running in a broken state.

#### Alternative 2: Documentation and Convention Only

Document that `github.run_id` should not appear in user-supplied cache keys
without adding compiler enforcement. This was rejected because the bug had
already appeared independently in five workflows, demonstrating that
convention-only guidance fails in practice. The compiler already has a
validation layer (`cache_validation.go`) established by prior ADRs, making a
new validation function the natural and low-cost extension.

### Consequences

#### Positive
- Cache keys are guaranteed to be stable across runs, enabling cross-run
  history accumulation.
- An entire class of silent cache-busting bugs is eliminated at the source —
  the error is shown before any code runs.
- The error message is self-documenting: it explains the invariant and shows
  a corrected example key.

#### Negative
- Any existing workflow that includes `github.run_id` in a user-supplied
  cache key will now fail to compile. Authors must migrate their keys before
  the workflow can run.
- The word-boundary regex (`github\.run_id(?:[^_\w]|$)`) requires careful
  maintenance if the compiler's cache-key expansion logic changes in future.

#### Neutral
- A new test file (`cache_key_validation_test.go`) was added with
  table-driven tests covering both object and array notation, including
  false-positive cases (e.g., `github.run_identifier` must not trigger the
  error).
- The compiler already appends `run_id` to the save key; this ADR makes that
  existing behaviour an enforced invariant rather than an implicit
  expectation.

---

## Part 2 — Normative Specification (RFC 2119)

> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**,
> **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in
> this section are to be interpreted as described in
> RFC 2119.

### Cache Key Authoring

1. User-supplied `cache-memory` key values **MUST NOT** reference the
   `github.run_id` expression (i.e., the string `github.run_id` followed by
   a non-word character or end-of-string).
2. Cache keys **MUST** be stable across workflow runs so that the
   compiler-generated `restore-keys` prefix can locate a prior cache slot.
3. Cache key authors **SHOULD** use workflow-scoped variables (e.g.,
   `${{ env.GH_AW_WORKFLOW_ID_SANITIZED }}`, `${{ github.workflow }}`) to
   ensure uniqueness without per-run entropy.

### Compiler Enforcement

1. The compiler **MUST** invoke `validateNoCacheKeyRunID` on every
   user-supplied cache key string before accepting it.
2. If the key matches the prohibited pattern, the compiler **MUST** return a
   `ValidationError` that includes: (a) the offending field path, (b) the
   offending value, (c) a human-readable explanation of the invariant, and
   (d) a corrected example.
3. The validation regex **MUST** use a word-boundary check
   (`(?:[^_\w]|$)`) to avoid false positives on identifiers such as
   `github.run_identifier` or `github.run_id_backup`.
4. The compiler **MUST NOT** silently strip or rewrite a user-supplied key
   that contains `github.run_id`; it **MUST** fail with an error.

### Testing

1. New validation functions in `cache_validation.go` **MUST** be accompanied
   by table-driven unit tests covering at least: (a) valid keys that are
   accepted, (b) keys containing `github.run_id` that are rejected, and
   (c) keys containing identifiers that share a prefix with `github.run_id`
   to verify no false positives.
2. Tests **MUST** exercise both object notation (`cache-memory: key: ...`)
   and array notation (`cache-memory: [{key: ...}]`).

### Conformance

An implementation is considered conformant with this ADR if it satisfies all
**MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or
**MUST NOT** requirement constitutes non-conformance.

---

*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/24921258566) workflow. Review and finalize before changing status from Draft to Accepted.*

What to do next

  1. Save the draft above to docs/adr/28387-enforce-stable-cache-memory-keys-via-compiler-validation.md and commit it to main
  2. Review the draft — fill in any context the AI couldn't infer, refine the decision rationale, and confirm the listed alternatives are accurate
  3. Once finalized, update Status from Draft to Accepted

Why this PR triggered the gate

| Check | Result |
| --- | --- |
| implementation label | Not present |
| New lines in pkg/ (business logic) | 158 lines (threshold: 100) |
| ADR linked in PR body | ❌ Not found |

Why ADRs Matter

ADRs create a searchable, permanent record of why the codebase looks the way it does. This PR introduces a new compiler-level invariant — a future contributor who sees validateNoCacheKeyRunID and wonders "why does this exist?" should find the answer in docs/adr/.


📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

  • Context — What is the problem? What forces are at play?
  • Decision — What did you decide? Why?
  • Alternatives Considered — What else could have been done?
  • Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g., 28387-kebab-title.md for PR #28387).

References: §24921258566

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · ● 194.4K ·

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes cache-memory key stability across workflow runs by removing user-specified ${{ github.run_id }} from cache-memory keys and enforcing that constraint at compile time, enabling cross-run cache restores.

Changes:

  • Added compiler validation to reject user-supplied tools.cache-memory.key values that reference github.run_id, plus table-driven unit tests.
  • Updated cache-memory key parsing to invoke the new validation before accepting custom keys.
  • Removed -${{ github.run_id }} from affected workflow frontmatter cache-memory keys and updated api-consumption-report.md to log cache restore status.
| File | Description |
| --- | --- |
| pkg/workflow/cache_validation.go | Adds validateNoCacheKeyRunID and regex to detect github.run_id usage in cache-memory keys. |
| pkg/workflow/cache_key_validation_test.go | Adds unit tests covering object/array notation and false-positive cases. |
| pkg/workflow/cache.go | Calls validateNoCacheKeyRunID when parsing a custom cache-memory key. |
| .github/workflows/shared/trending-charts-simple.md | Removes ${{ github.run_id }} from cache-memory key to make it stable across runs. |
| .github/workflows/shared/charts-with-trending.md | Removes ${{ github.run_id }} from cache-memory key to make it stable across runs. |
| .github/workflows/cloclo.md | Removes ${{ github.run_id }} from cache-memory key to make it stable across runs. |
| .github/workflows/poem-bot.md | Removes ${{ github.run_id }} from cache-memory key to make it stable across runs. |
| .github/workflows/stale-repo-identifier.md | Removes ${{ github.run_id }} from cache-memory key to make it stable across runs. |
| .github/workflows/api-consumption-report.md | Adds a bash snippet and template field to explicitly report whether cache was restored. |
| .github/workflows/stale-repo-identifier.lock.yml | Regenerated lock file corresponding to workflow frontmatter changes. |
| .github/workflows/python-data-charts.lock.yml | Regenerated lock file corresponding to shared workflow frontmatter changes. |
| .github/workflows/poem-bot.lock.yml | Regenerated lock file corresponding to workflow frontmatter changes. |
| .github/workflows/cloclo.lock.yml | Regenerated lock file corresponding to workflow frontmatter changes. |

Copilot's findings


  • Files reviewed: 13/13 changed files
  • Comments generated: 1

Comment on lines 9 to 13
// # Validation Functions
//
// - validateNoDuplicateCacheIDs() - Ensures each cache entry has a unique ID
// - validateNoCacheKeyRunID() - Rejects cache keys that reference github.run_id
//

Copilot AI Apr 25, 2026


The top-of-file comment describes this file as validating sandbox cache-memory config, but this file now also contains validation used for tools.cache-memory keys (via parseCacheMemoryEntry). Consider updating the file-level documentation/section headings (or relocating the tool-specific validation) so the docs accurately reflect current usage.

See below for a potential fix:

// This file provides validation for cache-memory configuration.
//
// # Cache Memory Validation
//
// This file contains validation shared across cache-memory configuration,
// including:
//   - sandbox.cache-memory entries, such as duplicate ID checks
//   - tools.cache-memory keys, such as rejecting github.run_id references
//
// These validations prevent configuration mistakes that would otherwise cause
// runtime conflicts or ineffective cache behavior.
//
// # Validation Functions
//
//   - validateNoDuplicateCacheIDs() - Ensures each sandbox cache entry has a unique ID
//   - validateNoCacheKeyRunID() - Rejects tools.cache-memory keys that reference github.run_id
//
// # When to Add Validation Here
//
// Add validation to this file when:
//   - Adding new sandbox.cache-memory or tools.cache-memory constraints


Development

Successfully merging this pull request may close these issues.

[q] fix: remove run_id from trending cache key to enable cross-run cache restoration

3 participants