From 5e07354aab3e904cb345527652716ad790833116 Mon Sep 17 00:00:00 2001 From: Mara Nikola Kiefer Date: Mon, 27 Apr 2026 10:05:38 +0200 Subject: [PATCH 01/11] docs: Add Organization practices & FeedbackOps --- docs/astro.config.mjs | 12 + .../images/shadow-ops-overview.drawio.svg | 241 ++++++++++++++++++ .../docs/organization-practices/index.mdx | 31 +++ .../organization-practices/safe-rollout.md | 78 ++++++ .../sharing-workflows.md | 66 +++++ .../docs/patterns/central-repo-ops.mdx | 3 - .../content/docs/patterns/correction-ops.md | 187 ++++++++++++++ docs/src/content/docs/reference/glossary.md | 16 ++ 8 files changed, 631 insertions(+), 3 deletions(-) create mode 100644 docs/public/images/shadow-ops-overview.drawio.svg create mode 100644 docs/src/content/docs/organization-practices/index.mdx create mode 100644 docs/src/content/docs/organization-practices/safe-rollout.md create mode 100644 docs/src/content/docs/organization-practices/sharing-workflows.md create mode 100644 docs/src/content/docs/patterns/correction-ops.md diff --git a/docs/astro.config.mjs b/docs/astro.config.mjs index e4acbc20730..b09639fa929 100644 --- a/docs/astro.config.mjs +++ b/docs/astro.config.mjs @@ -129,8 +129,11 @@ export default defineConfig({ '/patterns/researchplanassignops/': '/gh-aw/patterns/research-plan-assign-ops/', '/patterns/batchops/': '/gh-aw/patterns/batch-ops/', '/patterns/taskops/': '/gh-aw/patterns/task-ops/', + '/patterns/shadowops/': '/gh-aw/organization-practices/safe-rollout/', '/patterns/trialops/': '/gh-aw/patterns/trial-ops/', '/patterns/workqueueops/': '/gh-aw/patterns/workqueue-ops/', + '/patterns/shadow-ops/': '/gh-aw/organization-practices/safe-rollout/', + '/organization-practices/shadow-evaluation/': '/gh-aw/organization-practices/safe-rollout/', }, integrations: [ sitemap(), @@ -270,12 +273,21 @@ export default defineConfig({ { label: 'Audit Reports with Agents', link: '/guides/audit-with-agents/' }, ], }, + { + label: 'Organization Practices', + items: [ + { 
label: 'Overview', link: '/organization-practices/' }, + { label: 'Safe Rollout', link: '/organization-practices/safe-rollout/' }, + { label: 'Sharing Workflows', link: '/organization-practices/sharing-workflows/' }, + ], + }, { label: 'Design Patterns', items: [ { label: 'BatchOps', link: '/patterns/batch-ops/' }, { label: 'CentralRepoOps', link: '/patterns/central-repo-ops/' }, { label: 'ChatOps', link: '/patterns/chat-ops/' }, + { label: 'CorrectionOps', link: '/patterns/correction-ops/' }, { label: 'DailyOps', link: '/patterns/daily-ops/' }, { label: 'DataOps', link: '/patterns/data-ops/' }, { label: 'DispatchOps', link: '/patterns/dispatch-ops/' }, diff --git a/docs/public/images/shadow-ops-overview.drawio.svg b/docs/public/images/shadow-ops-overview.drawio.svg new file mode 100644 index 00000000000..179a3be9410 --- /dev/null +++ b/docs/public/images/shadow-ops-overview.drawio.svg @@ -0,0 +1,241 @@ + + + + + + + + + + + +
+ [Figure: docs/public/images/shadow-ops-overview.drawio.svg. Recoverable diagram labels: live events; Production; Ops; Shadow; writes; workflow tuning, reports; inject bad and rare valid cases; writes here when going live.]
\ No newline at end of file diff --git a/docs/src/content/docs/organization-practices/index.mdx b/docs/src/content/docs/organization-practices/index.mdx new file mode 100644 index 00000000000..0eb91f42e2d --- /dev/null +++ b/docs/src/content/docs/organization-practices/index.mdx @@ -0,0 +1,31 @@ +--- +title: Organization Practices +description: Guidance for adopting, sharing, and governing agentic workflows across teams and repositories. +--- + +Organization Practices collects guidance that matters at team and enterprise scale but does not need to be presented as a standalone design pattern. + +Patterns describe durable workflow shapes such as [CorrectionOps](/gh-aw/patterns/correction-ops/) or [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/). Organization practices cover how those patterns are rolled out, shared, and governed across repositories and teams. + +This section is the right place for topics such as: + +- safe rollout strategies before production writes are enabled +- workflow sharing across repositories and organizations +- centralized ownership models for workflow infrastructure +- platform conventions for versioning, review, and promotion + +## Included Topics + +### Safe Rollout + +[Safe Rollout](/gh-aw/organization-practices/safe-rollout/) describes how to move from report-only or staged behavior to production writes with evidence and control. One technique inside that progression is shadow evaluation, where the workflow writes to a safe non-production target before promotion. + +### Sharing Workflows + +[Sharing Workflows](/gh-aw/organization-practices/sharing-workflows/) describes how workflows can be reused across repositories and organizations. It covers imports, reusable components, central workflow repositories, and when to use templates or starter repositories. + +## Relationship To Other Sections + +- Use [Design Patterns](/gh-aw/patterns/) to learn reusable workflow shapes. 
+- Use [Guides](/gh-aw/guides/) for task-oriented instructions such as [Reusing Workflows](/gh-aw/guides/packaging-imports/). +- Use [Reference](/gh-aw/reference/) for exact configuration syntax and field behavior. \ No newline at end of file diff --git a/docs/src/content/docs/organization-practices/safe-rollout.md b/docs/src/content/docs/organization-practices/safe-rollout.md new file mode 100644 index 00000000000..d2c5476ea75 --- /dev/null +++ b/docs/src/content/docs/organization-practices/safe-rollout.md @@ -0,0 +1,78 @@ +--- +title: Safe Rollout +description: Move from report-only or staged behavior to direct production writes with evidence and control. +sidebar: + badge: { text: 'Rollout', variant: 'caution' } +--- + +Safe rollout is the practice of increasing workflow autonomy in steps instead of enabling direct production writes immediately. + +The main question is not whether a workflow is useful, but whether it is trusted enough to act on the live system. In practice, teams usually move through a ladder: report-only first, then staged behavior, then a more realistic safe-write technique if needed, and finally direct production writes. + +This is especially useful for [CorrectionOps](/gh-aw/patterns/correction-ops/), where the goal is to improve the workflow over time using persisted predictions and later human truth. + +## Rollout Ladder + +The usual progression is: + +1. Start in report-only mode. +2. Enable `staged` behavior when proposed writes need to be previewed. +3. Use shadow evaluation when preview mode is not enough and the real write path needs to be exercised safely. +4. Promote the same workflow to direct production writes. + +`staged` and shadow evaluation are not interchangeable. Staged mode is sufficient when the question is what the workflow would do. Shadow evaluation is needed when the question is whether the real write path behaves correctly on a safe non-production target. 
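The rollout ladder above can be sketched as a small resolver that maps a rollout stage to what the workflow is allowed to write. This is a minimal illustration in plain JavaScript, not part of `gh-aw`: the stage names, the `resolveWriteTarget` and `nextStage` helpers, and the repository slugs are all hypothetical.

```javascript
// Hypothetical rollout stages, in the order of the ladder described above.
const STAGES = ['report-only', 'staged', 'shadow', 'production'];

// Map a stage to a write decision. Repository slugs are placeholders
// supplied by the caller, not values defined by gh-aw.
function resolveWriteTarget(stage, { prodRepo, shadowRepo }) {
  switch (stage) {
    case 'report-only':
      // No writes at all; the workflow only publishes a report.
      return { write: false, preview: false, targetRepo: null };
    case 'staged':
      // Proposed writes are previewed but not applied.
      return { write: false, preview: true, targetRepo: prodRepo };
    case 'shadow':
      // The real write path runs against a safe non-production target.
      return { write: true, preview: false, targetRepo: shadowRepo };
    case 'production':
      // Direct writes to the authoritative repository.
      return { write: true, preview: false, targetRepo: prodRepo };
    default:
      // Fail closed on unknown stages rather than guessing.
      throw new Error(`Unknown rollout stage: ${stage}`);
  }
}

// This sketch promotes one rung at a time and stays at the top once
// production is reached. A team that does not need shadow evaluation
// could skip that rung.
function nextStage(stage) {
  const index = STAGES.indexOf(stage);
  if (index === -1) throw new Error(`Unknown rollout stage: ${stage}`);
  return STAGES[Math.min(index + 1, STAGES.length - 1)];
}
```

Failing closed on an unknown stage is a deliberate choice here: a typo in configuration should stop the run rather than silently fall through to a production write.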
+ +## When Staged Is Enough + +Use staged mode when the main risk is decision quality rather than operational behavior. + +It is usually enough when maintainers only need to review proposed actions, compare alternatives, or inspect whether the workflow's judgment is reasonable before any write is allowed. + +## When Shadow Evaluation Is Needed + +Use shadow evaluation when staged mode is too weak because the real write path itself needs validation. + +This is a good fit when: + +- the workflow must update real target objects to prove the behavior is correct +- concurrency, deduplication, or serialization needs to be tested on a live-like surface +- maintainers need to inspect the actual produced state, not only proposed intent +- cross-repository writes, permissions, or dispatch boundaries need to be exercised safely + +Shadow evaluation is one technique inside safe rollout, not a separate top-level pattern. + +## Design Rules + +### Production truth stays authoritative + +Do not let the evaluation surface become the new source of truth. Production events and later trusted human actions should remain authoritative. + +### Prediction snapshots should be explicit + +If later comparison matters, persist what the workflow predicted at decision time. Do not reconstruct predictions from logs. + +### Correction evidence needs provenance + +Not every later edit should count as trustworthy truth. Record provenance such as actor type, manual versus automated source, trust status, and origin repository role. + +### Evaluation surfaces should remain disposable + +Keep the shadow target thin. It should support measurement and rollout, not become a second long-lived control plane. 
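Two of the design rules above, explicit prediction snapshots and provenance on correction evidence, can be sketched in a few lines. This is an illustrative sketch only: the field names, the `buildPredictionSnapshot` and `isTrustedCorrection` helpers, and the trust policy are assumptions, not a `gh-aw` schema.

```javascript
// Build an explicit prediction snapshot at decision time, so later
// comparison never has to reconstruct the prediction from logs.
// All field names are illustrative assumptions.
function buildPredictionSnapshot({
  sourceRepository,
  itemNumber,
  predictedLabels,
  instructionVersion,
  predictedAt,
}) {
  return {
    source_repository: sourceRepository,
    item_number: itemNumber,
    // Sort a copy so the same prediction always serializes identically.
    predicted_labels: [...predictedLabels].sort(),
    instruction_version: instructionVersion,
    predicted_at: predictedAt,
  };
}

// Decide whether a later edit counts as trusted truth, using recorded
// provenance: actor type, manual versus automated source, origin
// repository role, and a caller-supplied set of trusted actors.
function isTrustedCorrection(correction, trustedActors) {
  return (
    correction.actor_type === 'human' &&
    correction.source === 'manual' &&
    correction.repository_role === 'production' &&
    trustedActors.has(correction.actor)
  );
}
```

Under this hypothetical policy, a label change made by a bot in the shadow repository is recorded but never counted as truth.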
+ +## Example Shape + +The common repository split is: + +- production repository: emits live events and contains authoritative later human truth +- ops repository: persists predictions, collects corrections, publishes reports, and updates instructions +- shadow repository: temporary non-production write target during rollout + +That shape is often useful, but it is still rollout guidance rather than a primary pattern. The stronger reusable pattern remains [CorrectionOps](/gh-aw/patterns/correction-ops/). + +## Related Documentation + +- [CorrectionOps](/gh-aw/patterns/correction-ops/) +- [SideRepoOps](/gh-aw/patterns/side-repo-ops/) +- [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/) +- [Staged Mode](/gh-aw/reference/staged-mode/) +- [Safe Outputs Reference](/gh-aw/reference/safe-outputs/) diff --git a/docs/src/content/docs/organization-practices/sharing-workflows.md b/docs/src/content/docs/organization-practices/sharing-workflows.md new file mode 100644 index 00000000000..4e79e9964d2 --- /dev/null +++ b/docs/src/content/docs/organization-practices/sharing-workflows.md @@ -0,0 +1,66 @@ +--- +title: Sharing Workflows +description: Share, reuse, and govern workflows across repositories and organizations. +sidebar: + badge: { text: 'Platform', variant: 'tip' } +--- + +Sharing workflows across repositories is an organization practice, not a single design pattern. + +Some teams want a central repository that publishes common workflows. Others want reusable imports, shared components, or starter repositories that teams can adopt and customize. The right choice depends on how tightly the organization wants to synchronize behavior across repositories. + +## Common Sharing Models + +### Shared source repository + +One repository publishes workflow sources and other repositories add them with `gh aw add` or track them via `source:` metadata. This is a good fit when a platform team owns the workflow and downstream repositories should receive updates over time. 
+ +### Reusable imports and shared components + +Shared imports are useful when the organization wants common building blocks rather than one complete workflow. This works well for common MCP configuration, shared prompts, safety policy, or reusable workflow fragments. + +### Starter repositories and templates + +Starter repositories and workflow templates are useful when teams need a strong starting point but are expected to diverge. This model favors local ownership over synchronized updates. + +### Central ops repositories + +A central operations repository is useful when workflows need durable memory, reporting, review, or organization-wide coordination. In that model, individual product repositories often emit signals while the ops repository owns the long-lived automation loop. + +## Choosing Between Them + +Use a shared source repository when consistency matters more than local autonomy. + +Use imports when the common unit is a capability or policy fragment. + +Use templates when the organization wants fast adoption but expects teams to customize independently. + +Use a central ops repository when workflows need shared history, review queues, reporting, or cross-repository orchestration. + +## Governance Questions + +When workflows are shared across an organization, the important questions are usually operational rather than technical: + +- who owns the source workflow +- how updates are reviewed and promoted +- which repositories may consume or dispatch to shared workflows +- how secrets, permissions, and safe outputs are standardized +- when teams may fork a workflow rather than stay on the shared source + +Those decisions affect reliability more than the file format does. + +## Practical Guidance + +For synchronized reuse, start with [Reusing Workflows](/gh-aw/guides/packaging-imports/), `gh aw add`, and imports. 
+ +For cross-repository control-plane designs, combine this guidance with [SideRepoOps](/gh-aw/patterns/side-repo-ops/) and [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/). + +For organizations introducing workflow sharing gradually, it is common to start with templates or starter repositories, then move stable concerns into imports or a shared source repository once conventions have settled. + +## Related Documentation + +- [Reusing Workflows](/gh-aw/guides/packaging-imports/) +- [Imports Reference](/gh-aw/reference/imports/) +- [SideRepoOps](/gh-aw/patterns/side-repo-ops/) +- [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/) +- [Workflow Structure](/gh-aw/reference/workflow-structure/) diff --git a/docs/src/content/docs/patterns/central-repo-ops.mdx b/docs/src/content/docs/patterns/central-repo-ops.mdx index 690b78cbe29..47282032df4 100644 --- a/docs/src/content/docs/patterns/central-repo-ops.mdx +++ b/docs/src/content/docs/patterns/central-repo-ops.mdx @@ -3,9 +3,6 @@ title: CentralRepoOps description: Operate and roll out changes across many repositories from a single private control repository. --- -> [!WARNING] -> **Experimental:** CentralRepoOps is still experimental! Things may break, change, or be removed without deprecation at any time. - CentralRepoOps is a [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/) deployment variant where a single private repository acts as a control plane for large-scale operations across many repositories. Use this pattern for organization-wide rollouts, phased adoption (pilot waves first), central governance, and security-aware prioritization across tens or hundreds of repositories. Each orchestrator run delivers consistent policy gates, controlled fan-out (`max`), and a complete decision trail — without pushing `main` changes to individual target repositories. 
diff --git a/docs/src/content/docs/patterns/correction-ops.md b/docs/src/content/docs/patterns/correction-ops.md new file mode 100644 index 00000000000..caa1b22641d --- /dev/null +++ b/docs/src/content/docs/patterns/correction-ops.md @@ -0,0 +1,187 @@ +--- +title: CorrectionOps +description: Improve agentic workflows from trusted human corrections without retraining the underlying model +--- + +CorrectionOps is a workflow pattern that compares predictions with later human corrections. + +Instead of retraining the model, CorrectionOps improves the workflow around the model. It stores predictions at decision time, compares them with later trusted human truth, and uses that evidence to update instructions, routing, thresholds, and rollout decisions. + +The basic loop is simple: + +1. Save what the workflow predicted +2. Collect what humans later decided +3. Use the difference to improve the workflow + +Discussion labelling is a good example: a workflow applies labels, humans later correct those labels, and the system uses that correction evidence to improve future runs. + +## When to Use CorrectionOps + +Use CorrectionOps when you want to turn a human decision process into an agentic workflow iteratively rather than all at once. + +It is a good fit when humans still make or correct the real decision, but you want the workflow to improve over time by updating instructions, routing, thresholds, or rollout state. + +Typical fits include labelling and classification, routing and prioritization, moderation and approvals, and summaries or recommendations that humans later correct. + +It is especially useful when the rollout path is gradual: + +- start in report-only mode +- move to a shadow or other safe write target +- use later corrections to improve the workflow +- promote to direct writes only when the evidence is strong enough + +## How It Works + +A clean CorrectionOps setup has two long-lived surfaces and one optional temporary one. Production stays authoritative. 
Ops is the long-lived home for prediction, correction intake, reporting, and instruction updates. Shadow, when used, is just a safe write target during evaluation. + +That means the workflows usually stay in ops. During evaluation they write to shadow. After promotion they can write directly to production without moving to a different repository. CorrectionOps is therefore broader than shadow evaluation. Shadow evaluation is one rollout shape inside CorrectionOps, not the whole pattern. + +Most implementations reduce to three workflow classes: a thin relay that forwards stable facts into ops, a prediction workflow that persists snapshots and writes safely, and a compare/report/decide workflow that checks later human truth and updates the system when the evidence is strong enough. + +The important rule is to keep relays, snapshot resolution, diffing, and grouping deterministic. Use the agent for semantic judgment, not for reconstructing event history or inferring provenance after the fact. + +```aw wrap +--- +on: + schedule: daily + workflow_dispatch: + repository_dispatch: + types: [truth-feedback] +permissions: + contents: read + issues: read +safe-outputs: + create-issue: + create-pull-request: +--- + +# CorrectionOps Worker + +Read persisted predictions and later trusted truth, compare them deterministically, then either publish a health report or open a draft PR updating instructions. +``` + +CorrectionOps solves a different problem than model training. Reinforcement Learning from Human Feedback (RLHF) updates model weights from human feedback. CorrectionOps updates the workflow system around the model. In practice that usually means changing instruction files, routing rules, deterministic checks, thresholds, or rollout decisions rather than trying to retrain the engine. 
+ +In a healthy CorrectionOps loop, production truth stays authoritative, predictions are saved explicitly, corrections include provenance, and diffs are built deterministically before the agent is asked to reason about them. + +CorrectionOps does not require a shadow surface, but many teams start with one. The normal progression is report-only first, then shadow evaluation when a safe write target is needed, then direct production writes once the evidence is strong enough. + +## Implementing It With GitHub Actions + +GitHub Actions is a strong fit because the pattern is mostly orchestration, artifact passing, and controlled writes across repositories. In practice, production events create the initial signal, a thin relay forwards that signal into ops, and the ops repo runs prediction and comparison work on schedules, manual dispatch, or forwarded events. + +For most teams, the clearest starting point is three workflows: one thin relay in the source repo, one prediction workflow in ops, and one compare/report/decide workflow in ops. Split further only when the boundary is real, such as a different trigger, a different permission boundary, or a separate serialized write path. + +In `gh-aw`, keep orchestration in frontmatter and step sections, use a small trusted set of GitHub Actions for plumbing, and keep policy-critical normalization, diffing, and grouping in repo-local scripts. `actions/github-script`, checkout, and artifact upload/download are usually enough. + +```yaml title="prod/.github/workflows/relay-correction-signals.yml" +name: Relay Correction Signals + +on: + discussion: + types: [created, labeled, unlabeled] + +jobs: + relay: + runs-on: ubuntu-latest + steps: + - name: Forward stable facts to ops + uses: actions/github-script@v8 + with: + github-token: ${{ secrets.OPS_DISPATCH_TOKEN }} + script: | + await github.rest.repos.createDispatchEvent({ + owner: 'org', + repo: 'ops-repo', + event_type: context.payload.action === 'created' ? 
'item-created' : 'truth-feedback', + client_payload: { + data: { + source_repository: `${context.repo.owner}/${context.repo.repo}`, + item_number: context.payload.discussion.number, + label: context.payload.label?.name || null, + actor: context.actor, + actor_type: context.actor.endsWith('[bot]') ? 'bot' : 'human', + }, + }, + }); +``` + +Most CorrectionOps systems still need both scheduled and manual entry points. A scheduled run catches drift and stale backlog. `workflow_dispatch` makes it possible to backfill one item, rerun one parent correction issue, or test a new instruction revision safely. Artifact handoff is often simpler than re-fetching everything in every step, and checkout should usually stay in ops rather than in production relays. + +## Portable Starter Architecture + +CorrectionOps is implementable for almost any repository that has three ingredients: + +1. a production object to observe, such as issues, pull requests, discussions, labels, approvals, or comments +2. a later human action that counts as trustworthy truth +3. an operational surface, usually an ops repo, where instructions and reports can live + +The minimal reusable architecture is: + +- one production relay workflow +- one ops prediction workflow +- one ops compare, report, and decide workflow +- one stable snapshot schema + +Many teams add a separate correction-collector workflow because the truth-ingest boundary is naturally deterministic and often triggered by `repository_dispatch`. That is a useful operational split, but it is not the simplest shape to teach first. + +The repository-specific work is usually limited to how to fetch and normalize the production object, which human actions count as trusted truth, what grouped correction patterns are meaningful, and which instruction or policy files are allowed to change. That is what keeps the pattern portable across different business domains. 
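The compare step described above is meant to be deterministic before any agent reasoning happens. A minimal sketch of that helper layer for label-style decisions follows. The function names and the output shape are hypothetical, and a real implementation would read persisted snapshots rather than in-memory arrays.

```javascript
// Compare predicted labels with later trusted labels for one item.
// Output arrays are sorted so the same inputs always produce the same diff.
function diffLabels(predicted, truth) {
  const predictedSet = new Set(predicted);
  const truthSet = new Set(truth);
  return {
    // Labels the workflow predicted that are absent from trusted truth.
    false_positives: [...predictedSet].filter((l) => !truthSet.has(l)).sort(),
    // Labels present in trusted truth that the workflow did not predict.
    false_negatives: [...truthSet].filter((l) => !predictedSet.has(l)).sort(),
  };
}

// Group per-item diffs by label so recurring correction patterns are
// visible before the agent is asked to summarize them.
function groupCorrections(diffs) {
  const counts = {};
  const bump = (label, kind) => {
    counts[label] = counts[label] || { false_positive: 0, false_negative: 0 };
    counts[label][kind] += 1;
  };
  for (const diff of diffs) {
    for (const label of diff.false_positives) bump(label, 'false_positive');
    for (const label of diff.false_negatives) bump(label, 'false_negative');
  }
  return counts;
}
```

The grouped counts are the kind of artifact the compare, report, and decide workflow could hand to the agent, which keeps the agent on semantic judgment rather than event reconstruction.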
+ +## Concrete Example: Discussion Labelling + +Discussion labelling is one concrete CorrectionOps implementation. + +- Production hosts the real discussions and later human truth. +- Ops runs the long-lived workflows. +- Shadow, when used, receives safe evaluation writes before production writes are enabled. + +In that example, the shape is: + +- production or shadow surface: thin relay workflows only +- ops repo: the real control loop + +The concrete workflow layout looks like this. + +### Production or Shadow Surface + +- `community-discussion-mirror.yml`: copies production discussions into the shadow surface when a live-like write target is needed +- `label-feedback-dispatch.yml`: forwards stable discussion facts and later trusted label truth into ops + +### Central Ops Repo + +- `auto-labelling.md`: prediction workflow that reads prepared discussion inputs, applies safe outputs, and persists prediction snapshots +- `labelling-correction-collector.yml`: deterministic correction-intake workflow that resolves current source-of-truth state and stores correction evidence +- `discussion-labelling-ops.md`: combined compare, report, and decide workflow that either publishes health summaries or opens a draft PR updating instructions + +So the simple reusable pattern is still three workflow classes, but a real multi-repo example often has five workflow files once the thin mirror and relay workflows are counted. + +### Example Workflow Roles + +| Workflow | Role | +| --- | --- | +| `community-discussion-mirror.yml` | Optional shadow support | +| `label-feedback-dispatch.yml` | Thin truth relay | +| `auto-labelling.md` | Predict and persist | +| `labelling-correction-collector.yml` | Deterministic truth intake | +| `discussion-labelling-ops.md` | Compare, report, and decide | + +This example is already close to the elegant target: one thin relay workflow in the source or shadow surface, one prediction workflow in ops, and one compare/report/decide workflow in ops. 
The part that still tends to feel mechanical is the deterministic helper layer behind those workflows, not the repo split itself. + +This general shape also applies to routing, moderation, prioritization, approvals, summaries, and other decisions where later human actions provide trustworthy operational truth. + +## Relationship To Other Patterns + +CorrectionOps overlaps with several adjacent ideas, but it solves a narrower problem. + +- Shadow deployment evaluates a candidate safely on live traffic. CorrectionOps adds the correction-driven adaptation loop. +- Human-in-the-loop review adds oversight at decision time. CorrectionOps adds a durable memory of corrections and uses it to change the workflow later. +- LLMOps and AgentOps provide broader tracing, evaluation, and governance capabilities. CorrectionOps is a specific design pattern for using trusted corrections to improve production-adjacent workflows. +- RLHF updates model weights from human preference data. CorrectionOps updates the operational system around the model instead. 
+ +## Related Documentation + +- [Safe Rollout](/gh-aw/organization-practices/safe-rollout/) for the optional safe-write rollout guidance inside CorrectionOps +- [SideRepoOps](/gh-aw/patterns/side-repo-ops/) for separating workflow infrastructure from the production repository +- [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/) for coordinating workflows across repository boundaries +- [Safe Outputs Reference](/gh-aw/reference/safe-outputs/) for controlling write targets and protections +- [GitHub Tools](/gh-aw/reference/github-tools/) for cross-repository reads and operations diff --git a/docs/src/content/docs/reference/glossary.md b/docs/src/content/docs/reference/glossary.md index 8ec031f2c26..4e6aab3b56b 100644 --- a/docs/src/content/docs/reference/glossary.md +++ b/docs/src/content/docs/reference/glossary.md @@ -624,6 +624,22 @@ AI-powered GitHub Projects board management automating issue triage, routing, an Development pattern where workflows run from a separate "side" repository targeting your main codebase. Keeps AI-generated issues, comments, and workflow runs isolated from the main repository for cleaner separation between automation infrastructure and production code. See [SideRepoOps](/gh-aw/patterns/side-repo-ops/). +### CorrectionOps + +Pattern for improving production-adjacent workflows from trusted human corrections without retraining the underlying model. CorrectionOps persists predictions, compares them with later human truth, and uses deterministic correction evidence to update instructions, routing, thresholds, and rollout policy. See [CorrectionOps](/gh-aw/patterns/correction-ops/). + +### Organization Practices + +Documentation category covering rollout, sharing, ownership, and governance concerns for workflows across teams and repositories. These topics matter at organization scale, but they are not always best described as standalone design patterns. See [Organization Practices](/gh-aw/organization-practices/). 
+ +### Safe Rollout + +Rollout practice for moving from report-only or staged behavior to direct production writes with evidence and control. Safe rollout can include techniques such as shadow evaluation, where a workflow runs on live production signals but writes to a non-production target while trust is being built. See [Safe Rollout](/gh-aw/organization-practices/safe-rollout/). + +### Shadow Evaluation + +One safe-rollout technique where a workflow runs on live production signals without writing directly to the production system. Shadow evaluation usually keeps production as the source of truth, uses an ops repository to measure prediction-versus-truth deltas, and may introduce a temporary mirror surface during trust-building. See [Safe Rollout](/gh-aw/organization-practices/safe-rollout/). + ### SpecOps Maintaining and propagating W3C-style specifications using the `w3c-specification-writer` agent. Creates formal specifications with RFC 2119 keywords and automatically synchronizes changes to consuming implementations. See [SpecOps](/gh-aw/patterns/spec-ops/). 
From 8c7e0c1d7432c251ab2c36daf2f22c5aeea45f01 Mon Sep 17 00:00:00 2001 From: Mara Nikola Kiefer Date: Mon, 27 Apr 2026 10:06:40 +0200 Subject: [PATCH 02/11] Update CorrectionOps --- .../content/docs/patterns/correction-ops.md | 163 +++++++++++++++--- 1 file changed, 136 insertions(+), 27 deletions(-) diff --git a/docs/src/content/docs/patterns/correction-ops.md b/docs/src/content/docs/patterns/correction-ops.md index caa1b22641d..9658b4f7e7d 100644 --- a/docs/src/content/docs/patterns/correction-ops.md +++ b/docs/src/content/docs/patterns/correction-ops.md @@ -127,47 +127,156 @@ Many teams add a separate correction-collector workflow because the truth-ingest The repository-specific work is usually limited to how to fetch and normalize the production object, which human actions count as trusted truth, what grouped correction patterns are meaningful, and which instruction or policy files are allowed to change. That is what keeps the pattern portable across different business domains. -## Concrete Example: Discussion Labelling +## Reproducible Starter Setup -Discussion labelling is one concrete CorrectionOps implementation. +This page intentionally uses generic repository and workflow names so the pattern can be reproduced without depending on any partner repository. -- Production hosts the real discussions and later human truth. -- Ops runs the long-lived workflows. -- Shadow, when used, receives safe evaluation writes before production writes are enabled. 
+The simplest teachable setup uses two repositories and an optional third: -In that example, the shape is: +- `prod-repo`: the authoritative system where the original object and later human truth live +- `ops-repo`: the long-lived control plane for prediction, correction review, reporting, and instruction updates +- `shadow-repo`: an optional safe write target used only during rollout -- production or shadow surface: thin relay workflows only -- ops repo: the real control loop +The workflow layout is: -The concrete workflow layout looks like this. +| Repository | Workflow | Role | +| --- | --- | --- | +| `prod-repo` | `relay-correction-signals.yml` | Thin deterministic relay | +| `ops-repo` | `predict-items.md` | Predict and persist snapshots | +| `ops-repo` | `review-corrections.md` | Compare, report, and decide | +| `ops-repo` | `collect-corrections.yml` | Optional deterministic truth intake | +| `shadow-repo` | `mirror-items.yml` | Optional safe-write support | -### Production or Shadow Surface +If the source event stream already contains everything needed for later comparison, skip `collect-corrections.yml`. If direct writes are too risky during rollout, add `mirror-items.yml` and point safe outputs at `shadow-repo` until the evidence is strong enough. -- `community-discussion-mirror.yml`: copies production discussions into the shadow surface when a live-like write target is needed -- `label-feedback-dispatch.yml`: forwards stable discussion facts and later trusted label truth into ops +### 1. Thin Relay In The Source Repo -### Central Ops Repo +The relay only forwards stable facts and provenance into ops. It should not compute diffs, infer human intent, or decide whether the workflow was correct. 
-- `auto-labelling.md`: prediction workflow that reads prepared discussion inputs, applies safe outputs, and persists prediction snapshots -- `labelling-correction-collector.yml`: deterministic correction-intake workflow that resolves current source-of-truth state and stores correction evidence -- `discussion-labelling-ops.md`: combined compare, report, and decide workflow that either publishes health summaries or opens a draft PR updating instructions +```yaml title="prod-repo/.github/workflows/relay-correction-signals.yml" +name: Relay Correction Signals + +on: + issues: + types: [opened, labeled, unlabeled] + +jobs: + relay: + runs-on: ubuntu-latest + steps: + - name: Forward stable facts to ops + uses: actions/github-script@v8 + with: + github-token: ${{ secrets.OPS_DISPATCH_TOKEN }} + script: | + await github.rest.repos.createDispatchEvent({ + owner: 'org', + repo: 'ops-repo', + event_type: context.payload.action === 'opened' ? 'item-created' : 'truth-feedback', + client_payload: { + data: { + source_repository: `${context.repo.owner}/${context.repo.repo}`, + source_type: 'issue', + item_number: context.payload.issue.number, + item_title: context.payload.issue.title, + item_url: context.payload.issue.html_url, + event_type: context.payload.action, + label: context.payload.label?.name || null, + actor: context.actor, + actor_type: context.actor.endsWith('[bot]') ? 'bot' : 'human', + occurred_at: new Date().toISOString(), + }, + }, + }); +``` + +### 2. Prediction Workflow In Ops + +The prediction workflow consumes normalized inputs, applies the current instructions, writes through safe outputs, and persists a durable snapshot that can be compared later. 
+
+```aw wrap title="ops-repo/.github/workflows/predict-items.md"
+---
+name: Predict Items
+
+on:
+  schedule: daily
+  workflow_dispatch:
+  repository_dispatch:
+    types: [item-created]

-So the simple reusable pattern is still three workflow classes, but a real multi-repo example often has five workflow files once the thin mirror and relay workflows are counted.
+tools:
+  github:
+    toolsets: [issues, repos]
+
+safe-outputs:
+  create-issue:
+  update-issue:
+    target-repo: ${{ inputs.target-repo || 'shadow-repo' }}
+---
+
+# Predict Items
+
+Read prepared items from `/tmp/gh-aw/agent/item-scan`, apply the current instructions, write the proposed changes through safe outputs, and append a prediction snapshot containing the source identifier, predicted action, instruction version, and timestamp.
+```
+
+### 3. Compare, Report, And Decide In Ops
+
+The review workflow reads persisted predictions and later human truth, builds deterministic diffs first, and only then asks the agent to summarize patterns or propose instruction updates.
+
+```aw wrap title="ops-repo/.github/workflows/review-corrections.md"
+---
+name: Review Corrections
+
+on:
+  schedule: weekly
+  workflow_dispatch:
+    inputs:
+      mode:
+        description: report or adaptation
+        required: false
+        default: report
+        type: choice
+        options: [report, adaptation]
+
+safe-outputs:
+  create-issue:
+  create-pull-request:
+---
+
+# Review Corrections
+
+Read `correction-diffs.json` from `/tmp/gh-aw/agent/correction-review`. In `report` mode, publish a health summary. In `adaptation` mode, open a draft PR updating the instruction file only when the grouped evidence is strong enough.
+```
+
+### 4. Optional Deterministic Collector
+
+Add a separate collector only when the later-truth boundary deserves its own trigger, permissions, or serialized write path.
+
+```yaml title="ops-repo/.github/workflows/collect-corrections.yml"
+name: Collect Corrections
+
+on:
+  repository_dispatch:
+    types: [truth-feedback]
+
+jobs:
+  collect:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Resolve authoritative truth and store correction evidence
+        run: ./scripts/store-correction-evidence.sh
+```

-### Example Workflow Roles
+### 5. Stable Contracts To Define First

-| Workflow | Role |
-| --- | --- |
-| `community-discussion-mirror.yml` | Optional shadow support |
-| `label-feedback-dispatch.yml` | Thin truth relay |
-| `auto-labelling.md` | Predict and persist |
-| `labelling-correction-collector.yml` | Deterministic truth intake |
-| `discussion-labelling-ops.md` | Compare, report, and decide |
+Before adding rollout logic or adaptation prompts, define four small deterministic contracts:

-This example is already close to the elegant target: one thin relay workflow in the source or shadow surface, one prediction workflow in ops, and one compare/report/decide workflow in ops. The part that still tends to feel mechanical is the deterministic helper layer behind those workflows, not the repo split itself.
+1. relay payload: the minimal source identity, object identity, event type, actor facts, and timestamps forwarded into ops
+2. prediction snapshot: the durable record of what the workflow predicted and under which instruction version
+3. correction review input: the deterministic diff artifact used by reporting and adaptation
+4. write target contract: which repository receives evaluation writes before direct production writes are enabled

-This general shape also applies to routing, moderation, prioritization, approvals, summaries, and other decisions where later human actions provide trustworthy operational truth.
+Discussion labelling, routing, moderation, prioritization, approvals, and summaries can all reuse this shape. The production object changes, but the CorrectionOps setup does not.
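Reviewer-style illustration, not part of the patch: the "correction review input" contract listed in this hunk could be produced by a small deterministic function that runs before any agent step. The sketch below assumes relay events shaped like the `client_payload.data` object in the relay workflow. The snapshot fields `predicted_labels` and `instruction_version`, the `humanTouched` flag, the diff shape, and both function names are hypothetical and chosen only for this example.

```javascript
// Sketch only: snapshot fields, diff shape, and function names are assumptions.

// Stable key for one production object.
const itemKey = (r) => `${r.source_repository}#${r.item_number}`;

// Replay relayed label events in timestamp order to get the final label set per item.
// ISO 8601 UTC strings (as produced by toISOString) compare correctly as plain strings.
function resolveTruth(events) {
  const ordered = [...events].sort((a, b) =>
    a.occurred_at < b.occurred_at ? -1 : a.occurred_at > b.occurred_at ? 1 : 0
  );
  const truth = new Map();
  for (const e of ordered) {
    // Only label changes carry truth here; 'opened' events are ignored.
    if (e.event_type !== 'labeled' && e.event_type !== 'unlabeled') continue;
    const key = itemKey(e);
    if (!truth.has(key)) truth.set(key, { labels: new Set(), humanTouched: false });
    const state = truth.get(key);
    if (e.event_type === 'labeled') state.labels.add(e.label);
    else state.labels.delete(e.label);
    if (e.actor_type === 'human') state.humanTouched = true;
  }
  return truth;
}

// Compare each prediction snapshot with resolved truth. No model call is involved.
function buildCorrectionDiffs(snapshots, events) {
  const truth = resolveTruth(events);
  const diffs = [];
  for (const snap of snapshots) {
    const state = truth.get(itemKey(snap));
    if (!state || !state.humanTouched) continue; // no trusted human truth yet
    const predicted = new Set(snap.predicted_labels);
    const missing = [...state.labels].filter((l) => !predicted.has(l)).sort();
    const extra = [...predicted].filter((l) => !state.labels.has(l)).sort();
    if (missing.length === 0 && extra.length === 0) continue; // prediction held
    diffs.push({
      item: itemKey(snap),
      instruction_version: snap.instruction_version,
      missing,
      extra,
    });
  }
  // Sort so that the same inputs always serialize to the same artifact.
  return diffs.sort((a, b) => (a.item < b.item ? -1 : a.item > b.item ? 1 : 0));
}
```

Under these assumptions, serializing the returned array would give the `correction-diffs.json` artifact that the review workflow reads. Keeping the comparison in plain code means the agent only summarizes grouped evidence and never reconstructs event history.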
## Relationship To Other Patterns From 12a29e97ba2b3e4dcca5649e54540077530efbc1 Mon Sep 17 00:00:00 2001 From: Mara Nikola Kiefer <8320933+mnkiefer@users.noreply.github.com> Date: Mon, 27 Apr 2026 10:10:12 +0200 Subject: [PATCH 03/11] Delete diagram --- .../images/shadow-ops-overview.drawio.svg | 241 ------------------ 1 file changed, 241 deletions(-) delete mode 100644 docs/public/images/shadow-ops-overview.drawio.svg diff --git a/docs/public/images/shadow-ops-overview.drawio.svg b/docs/public/images/shadow-ops-overview.drawio.svg deleted file mode 100644 index 179a3be9410..00000000000 --- a/docs/public/images/shadow-ops-overview.drawio.svg +++ /dev/null @@ -1,241 +0,0 @@ - - - - - - - - - - - -
-[remaining SVG markup of the deleted diagram omitted; text labels: "live events", "Production", "Ops", "Shadow", "writes", "workflow tuning, reports", "inject bad and rare valid cases", "writes here when going live"]
\ No newline at end of file From 13eee4468075bc4f54893b49efb759636d471bc2 Mon Sep 17 00:00:00 2001 From: Mara Nikola Kiefer <8320933+mnkiefer@users.noreply.github.com> Date: Mon, 27 Apr 2026 11:14:46 +0200 Subject: [PATCH 04/11] Update docs/src/content/docs/reference/glossary.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- docs/src/content/docs/reference/glossary.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/content/docs/reference/glossary.md b/docs/src/content/docs/reference/glossary.md index 082adac12b9..10f83410aaf 100644 --- a/docs/src/content/docs/reference/glossary.md +++ b/docs/src/content/docs/reference/glossary.md @@ -670,7 +670,7 @@ Rollout practice for moving from report-only or staged behavior to direct produc ### Shadow Evaluation -One safe-rollout technique where a workflow runs on live production signals without writing directly to the production system. Shadow evaluation usually keeps production as the source of truth, uses an ops repository to measure prediction-versus-truth deltas, and may introduce a temporary mirror surface during trust-building. See [Safe Rollout](/gh-aw/organization-practices/safe-rollout/). +One safe-rollout technique where a workflow runs on live production signals without writing directly to the production system. Shadow evaluation usually keeps production as the source of truth, uses an ops repository to measure prediction-versus-truth deltas, and may introduce a temporary mirror surface during trust-building. See [Safe Rollout](/gh-aw/organization-practices/safe-rollout/#when-shadow-evaluation-is-needed). 
### SpecOps From 4b54acc71130073c60f576e06d2256f60cba6ce8 Mon Sep 17 00:00:00 2001 From: Mara Nikola Kiefer <8320933+mnkiefer@users.noreply.github.com> Date: Mon, 27 Apr 2026 11:15:34 +0200 Subject: [PATCH 05/11] Update docs/src/content/docs/patterns/correction-ops.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- docs/src/content/docs/patterns/correction-ops.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/src/content/docs/patterns/correction-ops.md b/docs/src/content/docs/patterns/correction-ops.md index 9658b4f7e7d..baf3279960d 100644 --- a/docs/src/content/docs/patterns/correction-ops.md +++ b/docs/src/content/docs/patterns/correction-ops.md @@ -1,6 +1,8 @@ --- title: CorrectionOps description: Improve agentic workflows from trusted human corrections without retraining the underlying model +sidebar: + badge: Pattern --- CorrectionOps is a workflow pattern that compares predictions with later human corrections. From 69a9b0c1bbe941cf3ada97281fc8fc301f9ea952 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 27 Apr 2026 10:51:36 +0000 Subject: [PATCH 06/11] Initial plan From 6eb6c97e906a1fd94adc5119cf6b37b40e262e02 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 27 Apr 2026 10:58:57 +0000 Subject: [PATCH 07/11] docs: Remove CorrectionOps from PR (will be in separate PR) Agent-Logs-Url: https://github.com/github/gh-aw/sessions/cba9341e-916d-4a8b-afb8-3f995d5d7509 Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com> --- docs/astro.config.mjs | 1 - .../docs/organization-practices/index.mdx | 2 +- .../organization-practices/safe-rollout.md | 5 +- .../content/docs/patterns/correction-ops.md | 298 ------------------ docs/src/content/docs/reference/glossary.md | 4 - 5 files changed, 2 insertions(+), 308 deletions(-) delete mode 100644 docs/src/content/docs/patterns/correction-ops.md diff --git 
a/docs/astro.config.mjs b/docs/astro.config.mjs index b2c635ba39c..9c3fdff42d9 100644 --- a/docs/astro.config.mjs +++ b/docs/astro.config.mjs @@ -296,7 +296,6 @@ export default defineConfig({ { label: 'BatchOps', link: '/patterns/batch-ops/' }, { label: 'CentralRepoOps', link: '/patterns/central-repo-ops/' }, { label: 'ChatOps', link: '/patterns/chat-ops/' }, - { label: 'CorrectionOps', link: '/patterns/correction-ops/' }, { label: 'DailyOps', link: '/patterns/daily-ops/' }, { label: 'DataOps', link: '/patterns/data-ops/' }, { label: 'DispatchOps', link: '/patterns/dispatch-ops/' }, diff --git a/docs/src/content/docs/organization-practices/index.mdx b/docs/src/content/docs/organization-practices/index.mdx index 0eb91f42e2d..2f63f6eda85 100644 --- a/docs/src/content/docs/organization-practices/index.mdx +++ b/docs/src/content/docs/organization-practices/index.mdx @@ -5,7 +5,7 @@ description: Guidance for adopting, sharing, and governing agentic workflows acr Organization Practices collects guidance that matters at team and enterprise scale but does not need to be presented as a standalone design pattern. -Patterns describe durable workflow shapes such as [CorrectionOps](/gh-aw/patterns/correction-ops/) or [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/). Organization practices cover how those patterns are rolled out, shared, and governed across repositories and teams. +Patterns describe durable workflow shapes such as [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/). Organization practices cover how those patterns are rolled out, shared, and governed across repositories and teams. 
This section is the right place for topics such as: diff --git a/docs/src/content/docs/organization-practices/safe-rollout.md b/docs/src/content/docs/organization-practices/safe-rollout.md index d2c5476ea75..2d5d80642a7 100644 --- a/docs/src/content/docs/organization-practices/safe-rollout.md +++ b/docs/src/content/docs/organization-practices/safe-rollout.md @@ -9,8 +9,6 @@ Safe rollout is the practice of increasing workflow autonomy in steps instead of The main question is not whether a workflow is useful, but whether it is trusted enough to act on the live system. In practice, teams usually move through a ladder: report-only first, then staged behavior, then a more realistic safe-write technique if needed, and finally direct production writes. -This is especially useful for [CorrectionOps](/gh-aw/patterns/correction-ops/), where the goal is to improve the workflow over time using persisted predictions and later human truth. - ## Rollout Ladder The usual progression is: @@ -67,11 +65,10 @@ The common repository split is: - ops repository: persists predictions, collects corrections, publishes reports, and updates instructions - shadow repository: temporary non-production write target during rollout -That shape is often useful, but it is still rollout guidance rather than a primary pattern. The stronger reusable pattern remains [CorrectionOps](/gh-aw/patterns/correction-ops/). +That shape is often useful, but it is still rollout guidance rather than a primary pattern. 
## Related Documentation -- [CorrectionOps](/gh-aw/patterns/correction-ops/) - [SideRepoOps](/gh-aw/patterns/side-repo-ops/) - [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/) - [Staged Mode](/gh-aw/reference/staged-mode/) diff --git a/docs/src/content/docs/patterns/correction-ops.md b/docs/src/content/docs/patterns/correction-ops.md deleted file mode 100644 index baf3279960d..00000000000 --- a/docs/src/content/docs/patterns/correction-ops.md +++ /dev/null @@ -1,298 +0,0 @@ ---- -title: CorrectionOps -description: Improve agentic workflows from trusted human corrections without retraining the underlying model -sidebar: - badge: Pattern ---- - -CorrectionOps is a workflow pattern that compares predictions with later human corrections. - -Instead of retraining the model, CorrectionOps improves the workflow around the model. It stores predictions at decision time, compares them with later trusted human truth, and uses that evidence to update instructions, routing, thresholds, and rollout decisions. - -The basic loop is simple: - -1. Save what the workflow predicted -2. Collect what humans later decided -3. Use the difference to improve the workflow - -Discussion labelling is a good example: a workflow applies labels, humans later correct those labels, and the system uses that correction evidence to improve future runs. - -## When to Use CorrectionOps - -Use CorrectionOps when you want to turn a human decision process into an agentic workflow iteratively rather than all at once. - -It is a good fit when humans still make or correct the real decision, but you want the workflow to improve over time by updating instructions, routing, thresholds, or rollout state. - -Typical fits include labelling and classification, routing and prioritization, moderation and approvals, and summaries or recommendations that humans later correct. 
- -It is especially useful when the rollout path is gradual: - -- start in report-only mode -- move to a shadow or other safe write target -- use later corrections to improve the workflow -- promote to direct writes only when the evidence is strong enough - -## How It Works - -A clean CorrectionOps setup has two long-lived surfaces and one optional temporary one. Production stays authoritative. Ops is the long-lived home for prediction, correction intake, reporting, and instruction updates. Shadow, when used, is just a safe write target during evaluation. - -That means the workflows usually stay in ops. During evaluation they write to shadow. After promotion they can write directly to production without moving to a different repository. CorrectionOps is therefore broader than shadow evaluation. Shadow evaluation is one rollout shape inside CorrectionOps, not the whole pattern. - -Most implementations reduce to three workflow classes: a thin relay that forwards stable facts into ops, a prediction workflow that persists snapshots and writes safely, and a compare/report/decide workflow that checks later human truth and updates the system when the evidence is strong enough. - -The important rule is to keep relays, snapshot resolution, diffing, and grouping deterministic. Use the agent for semantic judgment, not for reconstructing event history or inferring provenance after the fact. - -```aw wrap ---- -on: - schedule: daily - workflow_dispatch: - repository_dispatch: - types: [truth-feedback] -permissions: - contents: read - issues: read -safe-outputs: - create-issue: - create-pull-request: ---- - -# CorrectionOps Worker - -Read persisted predictions and later trusted truth, compare them deterministically, then either publish a health report or open a draft PR updating instructions. -``` - -CorrectionOps solves a different problem than model training. Reinforcement Learning from Human Feedback (RLHF) updates model weights from human feedback. 
CorrectionOps updates the workflow system around the model. In practice that usually means changing instruction files, routing rules, deterministic checks, thresholds, or rollout decisions rather than trying to retrain the engine. - -In a healthy CorrectionOps loop, production truth stays authoritative, predictions are saved explicitly, corrections include provenance, and diffs are built deterministically before the agent is asked to reason about them. - -CorrectionOps does not require a shadow surface, but many teams start with one. The normal progression is report-only first, then shadow evaluation when a safe write target is needed, then direct production writes once the evidence is strong enough. - -## Implementing It With GitHub Actions - -GitHub Actions is a strong fit because the pattern is mostly orchestration, artifact passing, and controlled writes across repositories. In practice, production events create the initial signal, a thin relay forwards that signal into ops, and the ops repo runs prediction and comparison work on schedules, manual dispatch, or forwarded events. - -For most teams, the clearest starting point is three workflows: one thin relay in the source repo, one prediction workflow in ops, and one compare/report/decide workflow in ops. Split further only when the boundary is real, such as a different trigger, a different permission boundary, or a separate serialized write path. - -In `gh-aw`, keep orchestration in frontmatter and step sections, use a small trusted set of GitHub Actions for plumbing, and keep policy-critical normalization, diffing, and grouping in repo-local scripts. `actions/github-script`, checkout, and artifact upload/download are usually enough. 
- -```yaml title="prod/.github/workflows/relay-correction-signals.yml" -name: Relay Correction Signals - -on: - discussion: - types: [created, labeled, unlabeled] - -jobs: - relay: - runs-on: ubuntu-latest - steps: - - name: Forward stable facts to ops - uses: actions/github-script@v8 - with: - github-token: ${{ secrets.OPS_DISPATCH_TOKEN }} - script: | - await github.rest.repos.createDispatchEvent({ - owner: 'org', - repo: 'ops-repo', - event_type: context.payload.action === 'created' ? 'item-created' : 'truth-feedback', - client_payload: { - data: { - source_repository: `${context.repo.owner}/${context.repo.repo}`, - item_number: context.payload.discussion.number, - label: context.payload.label?.name || null, - actor: context.actor, - actor_type: context.actor.endsWith('[bot]') ? 'bot' : 'human', - }, - }, - }); -``` - -Most CorrectionOps systems still need both scheduled and manual entry points. A scheduled run catches drift and stale backlog. `workflow_dispatch` makes it possible to backfill one item, rerun one parent correction issue, or test a new instruction revision safely. Artifact handoff is often simpler than re-fetching everything in every step, and checkout should usually stay in ops rather than in production relays. - -## Portable Starter Architecture - -CorrectionOps is implementable for almost any repository that has three ingredients: - -1. a production object to observe, such as issues, pull requests, discussions, labels, approvals, or comments -2. a later human action that counts as trustworthy truth -3. an operational surface, usually an ops repo, where instructions and reports can live - -The minimal reusable architecture is: - -- one production relay workflow -- one ops prediction workflow -- one ops compare, report, and decide workflow -- one stable snapshot schema - -Many teams add a separate correction-collector workflow because the truth-ingest boundary is naturally deterministic and often triggered by `repository_dispatch`. 
That is a useful operational split, but it is not the simplest shape to teach first. - -The repository-specific work is usually limited to how to fetch and normalize the production object, which human actions count as trusted truth, what grouped correction patterns are meaningful, and which instruction or policy files are allowed to change. That is what keeps the pattern portable across different business domains. - -## Reproducible Starter Setup - -This page intentionally uses generic repository and workflow names so the pattern can be reproduced without depending on any partner repository. - -The simplest teachable setup uses two repositories and an optional third: - -- `prod-repo`: the authoritative system where the original object and later human truth live -- `ops-repo`: the long-lived control plane for prediction, correction review, reporting, and instruction updates -- `shadow-repo`: an optional safe write target used only during rollout - -The workflow layout is: - -| Repository | Workflow | Role | -| --- | --- | --- | -| `prod-repo` | `relay-correction-signals.yml` | Thin deterministic relay | -| `ops-repo` | `predict-items.md` | Predict and persist snapshots | -| `ops-repo` | `review-corrections.md` | Compare, report, and decide | -| `ops-repo` | `collect-corrections.yml` | Optional deterministic truth intake | -| `shadow-repo` | `mirror-items.yml` | Optional safe-write support | - -If the source event stream already contains everything needed for later comparison, skip `collect-corrections.yml`. If direct writes are too risky during rollout, add `mirror-items.yml` and point safe outputs at `shadow-repo` until the evidence is strong enough. - -### 1. Thin Relay In The Source Repo - -The relay only forwards stable facts and provenance into ops. It should not compute diffs, infer human intent, or decide whether the workflow was correct. 
- -```yaml title="prod-repo/.github/workflows/relay-correction-signals.yml" -name: Relay Correction Signals - -on: - issues: - types: [opened, labeled, unlabeled] - -jobs: - relay: - runs-on: ubuntu-latest - steps: - - name: Forward stable facts to ops - uses: actions/github-script@v8 - with: - github-token: ${{ secrets.OPS_DISPATCH_TOKEN }} - script: | - await github.rest.repos.createDispatchEvent({ - owner: 'org', - repo: 'ops-repo', - event_type: context.payload.action === 'opened' ? 'item-created' : 'truth-feedback', - client_payload: { - data: { - source_repository: `${context.repo.owner}/${context.repo.repo}`, - source_type: 'issue', - item_number: context.payload.issue.number, - item_title: context.payload.issue.title, - item_url: context.payload.issue.html_url, - event_type: context.payload.action, - label: context.payload.label?.name || null, - actor: context.actor, - actor_type: context.actor.endsWith('[bot]') ? 'bot' : 'human', - occurred_at: new Date().toISOString(), - }, - }, - }); -``` - -### 2. Prediction Workflow In Ops - -The prediction workflow consumes normalized inputs, applies the current instructions, writes through safe outputs, and persists a durable snapshot that can be compared later. - -```aw wrap title="ops-repo/.github/workflows/predict-items.md" ---- -name: Predict Items - -on: - schedule: daily - workflow_dispatch: - repository_dispatch: - types: [item-created] - -tools: - github: - toolsets: [issues, repos] - -safe-outputs: - create-issue: - update-issue: - target-repo: ${{ inputs.target-repo || 'shadow-repo' }} ---- - -# Predict Items - -Read prepared items from `/tmp/gh-aw/agent/item-scan`, apply the current instructions, write the proposed changes through safe outputs, and append a prediction snapshot containing the source identifier, predicted action, instruction version, and timestamp. -``` - -### 3. 
Compare, Report, And Decide In Ops - -The review workflow reads persisted predictions and later human truth, builds deterministic diffs first, and only then asks the agent to summarize patterns or propose instruction updates. - -```aw wrap title="ops-repo/.github/workflows/review-corrections.md" ---- -name: Review Corrections - -on: - schedule: weekly - workflow_dispatch: - inputs: - mode: - description: report or adaptation - required: false - default: report - type: choice - options: [report, adaptation] - -safe-outputs: - create-issue: - create-pull-request: ---- - -# Review Corrections - -Read `correction-diffs.json` from `/tmp/gh-aw/agent/correction-review`. In `report` mode, publish a health summary. In `adaptation` mode, open a draft PR updating the instruction file only when the grouped evidence is strong enough. -``` - -### 4. Optional Deterministic Collector - -Add a separate collector only when the later-truth boundary deserves its own trigger, permissions, or serialized write path. - -```yaml title="ops-repo/.github/workflows/collect-corrections.yml" -name: Collect Corrections - -on: - repository_dispatch: - types: [truth-feedback] - -jobs: - collect: - runs-on: ubuntu-latest - steps: - - name: Resolve authoritative truth and store correction evidence - run: ./scripts/store-correction-evidence.sh -``` - -### 5. Stable Contracts To Define First - -Before adding rollout logic or adaptation prompts, define four small deterministic contracts: - -1. relay payload: the minimal source identity, object identity, event type, actor facts, and timestamps forwarded into ops -2. prediction snapshot: the durable record of what the workflow predicted and under which instruction version -3. correction review input: the deterministic diff artifact used by reporting and adaptation -4. 
write target contract: which repository receives evaluation writes before direct production writes are enabled - -Discussion labelling, routing, moderation, prioritization, approvals, and summaries can all reuse this shape. The production object changes, but the CorrectionOps setup does not. - -## Relationship To Other Patterns - -CorrectionOps overlaps with several adjacent ideas, but it solves a narrower problem. - -- Shadow deployment evaluates a candidate safely on live traffic. CorrectionOps adds the correction-driven adaptation loop. -- Human-in-the-loop review adds oversight at decision time. CorrectionOps adds a durable memory of corrections and uses it to change the workflow later. -- LLMOps and AgentOps provide broader tracing, evaluation, and governance capabilities. CorrectionOps is a specific design pattern for using trusted corrections to improve production-adjacent workflows. -- RLHF updates model weights from human preference data. CorrectionOps updates the operational system around the model instead. 
- -## Related Documentation - -- [Safe Rollout](/gh-aw/organization-practices/safe-rollout/) for the optional safe-write rollout guidance inside CorrectionOps -- [SideRepoOps](/gh-aw/patterns/side-repo-ops/) for separating workflow infrastructure from the production repository -- [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/) for coordinating workflows across repository boundaries -- [Safe Outputs Reference](/gh-aw/reference/safe-outputs/) for controlling write targets and protections -- [GitHub Tools](/gh-aw/reference/github-tools/) for cross-repository reads and operations diff --git a/docs/src/content/docs/reference/glossary.md b/docs/src/content/docs/reference/glossary.md index 10f83410aaf..459ca093acb 100644 --- a/docs/src/content/docs/reference/glossary.md +++ b/docs/src/content/docs/reference/glossary.md @@ -656,10 +656,6 @@ AI-powered GitHub Projects board management automating issue triage, routing, an Development pattern where workflows run from a separate "side" repository targeting your main codebase. Keeps AI-generated issues, comments, and workflow runs isolated from the main repository for cleaner separation between automation infrastructure and production code. See [SideRepoOps](/gh-aw/patterns/side-repo-ops/). -### CorrectionOps - -Pattern for improving production-adjacent workflows from trusted human corrections without retraining the underlying model. CorrectionOps persists predictions, compares them with later human truth, and uses deterministic correction evidence to update instructions, routing, thresholds, and rollout policy. See [CorrectionOps](/gh-aw/patterns/correction-ops/). - ### Organization Practices Documentation category covering rollout, sharing, ownership, and governance concerns for workflows across teams and repositories. These topics matter at organization scale, but they are not always best described as standalone design patterns. See [Organization Practices](/gh-aw/organization-practices/). 
From 210ab9f9ffcf68b83b371571d506b1138a438c5e Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 27 Apr 2026 11:29:28 +0000 Subject: [PATCH 08/11] docs: Remove glossary entries; nest Organization Practices under Guides Agent-Logs-Url: https://github.com/github/gh-aw/sessions/f5f04195-ba3e-4f5d-934a-1b45834e11e6 Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com> --- docs/astro.config.mjs | 16 ++++++++-------- docs/src/content/docs/reference/glossary.md | 12 ------------ 2 files changed, 8 insertions(+), 20 deletions(-) diff --git a/docs/astro.config.mjs b/docs/astro.config.mjs index 9c3fdff42d9..696ea1ba0c4 100644 --- a/docs/astro.config.mjs +++ b/docs/astro.config.mjs @@ -280,14 +280,14 @@ export default defineConfig({ { label: 'Ephemerals', link: '/guides/ephemerals/' }, { label: 'Web Search', link: '/guides/web-search/' }, { label: 'Audit Reports', link: '/guides/audit-with-agents/' }, - ], - }, - { - label: 'Organization Practices', - items: [ - { label: 'Overview', link: '/organization-practices/' }, - { label: 'Safe Rollout', link: '/organization-practices/safe-rollout/' }, - { label: 'Sharing Workflows', link: '/organization-practices/sharing-workflows/' }, + { + label: 'Organization Practices', + items: [ + { label: 'Overview', link: '/organization-practices/' }, + { label: 'Safe Rollout', link: '/organization-practices/safe-rollout/' }, + { label: 'Sharing Workflows', link: '/organization-practices/sharing-workflows/' }, + ], + }, ], }, { diff --git a/docs/src/content/docs/reference/glossary.md b/docs/src/content/docs/reference/glossary.md index 459ca093acb..dd410c7d269 100644 --- a/docs/src/content/docs/reference/glossary.md +++ b/docs/src/content/docs/reference/glossary.md @@ -656,18 +656,6 @@ AI-powered GitHub Projects board management automating issue triage, routing, an Development pattern where workflows run from a separate "side" repository targeting your main codebase. 
Keeps AI-generated issues, comments, and workflow runs isolated from the main repository for cleaner separation between automation infrastructure and production code. See [SideRepoOps](/gh-aw/patterns/side-repo-ops/). -### Organization Practices - -Documentation category covering rollout, sharing, ownership, and governance concerns for workflows across teams and repositories. These topics matter at organization scale, but they are not always best described as standalone design patterns. See [Organization Practices](/gh-aw/organization-practices/). - -### Safe Rollout - -Rollout practice for moving from report-only or staged behavior to direct production writes with evidence and control. Safe rollout can include techniques such as shadow evaluation, where a workflow runs on live production signals but writes to a non-production target while trust is being built. See [Safe Rollout](/gh-aw/organization-practices/safe-rollout/). - -### Shadow Evaluation - -One safe-rollout technique where a workflow runs on live production signals without writing directly to the production system. Shadow evaluation usually keeps production as the source of truth, uses an ops repository to measure prediction-versus-truth deltas, and may introduce a temporary mirror surface during trust-building. See [Safe Rollout](/gh-aw/organization-practices/safe-rollout/#when-shadow-evaluation-is-needed). - ### SpecOps Maintaining and propagating W3C-style specifications using the `w3c-specification-writer` agent. Creates formal specifications with RFC 2119 keywords and automatically synchronizes changes to consuming implementations. See [SpecOps](/gh-aw/patterns/spec-ops/). 
From 1d8bf7e2c88f1d3ad339d4a91672bbc826b36c3b Mon Sep 17 00:00:00 2001 From: Mara Nikola Kiefer <8320933+mnkiefer@users.noreply.github.com> Date: Mon, 27 Apr 2026 14:01:59 +0200 Subject: [PATCH 09/11] Update index.mdx --- .../docs/organization-practices/index.mdx | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/docs/src/content/docs/organization-practices/index.mdx b/docs/src/content/docs/organization-practices/index.mdx index 2f63f6eda85..acf91990abc 100644 --- a/docs/src/content/docs/organization-practices/index.mdx +++ b/docs/src/content/docs/organization-practices/index.mdx @@ -3,16 +3,12 @@ title: Organization Practices description: Guidance for adopting, sharing, and governing agentic workflows across teams and repositories. --- -Organization Practices collects guidance that matters at team and enterprise scale but does not need to be presented as a standalone design pattern. +Organization Practices collects guidance that matters at team and enterprise scale such as: -Patterns describe durable workflow shapes such as [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/). Organization practices cover how those patterns are rolled out, shared, and governed across repositories and teams. - -This section is the right place for topics such as: - -- safe rollout strategies before production writes are enabled -- workflow sharing across repositories and organizations -- centralized ownership models for workflow infrastructure -- platform conventions for versioning, review, and promotion +- Safe rollout strategies before production writes are enabled +- Workflow sharing across repositories and organizations +- Centralized ownership models for workflow infrastructure +- Platform conventions for versioning, review, and promotion ## Included Topics @@ -28,4 +24,4 @@ This section is the right place for topics such as: - Use [Design Patterns](/gh-aw/patterns/) to learn reusable workflow shapes. 
- Use [Guides](/gh-aw/guides/) for task-oriented instructions such as [Reusing Workflows](/gh-aw/guides/packaging-imports/). -- Use [Reference](/gh-aw/reference/) for exact configuration syntax and field behavior. \ No newline at end of file +- Use [Reference](/gh-aw/reference/) for exact configuration syntax and field behavior. From a56eef967af54649b138fe357ba84d389bf70182 Mon Sep 17 00:00:00 2001 From: Mara Nikola Kiefer <8320933+mnkiefer@users.noreply.github.com> Date: Mon, 27 Apr 2026 14:38:34 +0200 Subject: [PATCH 10/11] Update astro.config.mjs --- docs/astro.config.mjs | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/astro.config.mjs b/docs/astro.config.mjs index 696ea1ba0c4..465c9711d35 100644 --- a/docs/astro.config.mjs +++ b/docs/astro.config.mjs @@ -134,8 +134,6 @@ export default defineConfig({ '/patterns/shadowops/': '/gh-aw/organization-practices/safe-rollout/', '/patterns/trialops/': '/gh-aw/patterns/trial-ops/', '/patterns/workqueueops/': '/gh-aw/patterns/workqueue-ops/', - '/patterns/shadow-ops/': '/gh-aw/organization-practices/safe-rollout/', - '/organization-practices/shadow-evaluation/': '/gh-aw/organization-practices/safe-rollout/', }, integrations: [ sitemap(), From 0e3f27425380dce5f5ff1483a5197dc701db7f99 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 27 Apr 2026 20:24:02 +0000 Subject: [PATCH 11/11] docs: Rewrite sharing-workflows.md with 7-layer model and enterprise pattern Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1a9f03ac-6dcf-491e-9f47-985bc95489a9 Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com> --- .../content/docs/guides/packaging-imports.mdx | 3 + .../sharing-workflows.md | 132 ++++++++++++++---- 2 files changed, 106 insertions(+), 29 deletions(-) diff --git a/docs/src/content/docs/guides/packaging-imports.mdx b/docs/src/content/docs/guides/packaging-imports.mdx index d63f1a66186..4430d18b337 100644 --- 
a/docs/src/content/docs/guides/packaging-imports.mdx +++ b/docs/src/content/docs/guides/packaging-imports.mdx @@ -7,6 +7,9 @@ sidebar: import { Tabs, TabItem } from '@astrojs/starlight/components'; +> [!NOTE] +> The workflow sharing and import model described here reflects the current state of GitHub Agentic Workflows. The recommended patterns, commands, and configuration options may change in future releases as the platform evolves. + ## Adding Workflows You can add any existing workflow you have access to from external repositories. diff --git a/docs/src/content/docs/organization-practices/sharing-workflows.md b/docs/src/content/docs/organization-practices/sharing-workflows.md index 4e79e9964d2..d5c72ac1b82 100644 --- a/docs/src/content/docs/organization-practices/sharing-workflows.md +++ b/docs/src/content/docs/organization-practices/sharing-workflows.md @@ -5,62 +5,136 @@ sidebar: badge: { text: 'Platform', variant: 'tip' } --- -Sharing workflows across repositories is an organization practice, not a single design pattern. +> [!NOTE] +> The enterprise sharing model described here reflects the current state of GitHub Agentic Workflows. The recommended patterns, commands, and configuration options may change in future releases as the platform evolves. -Some teams want a central repository that publishes common workflows. Others want reusable imports, shared components, or starter repositories that teams can adopt and customize. The right choice depends on how tightly the organization wants to synchronize behavior across repositories. +Sharing workflows across an organization involves several independent layers. Each layer can be adopted independently; teams do not need all of them at once. -## Common Sharing Models +The recommended enterprise pattern is to maintain one central `agentic-workflows` repository with versioned workflow templates and shared components. 
Consuming repositories then use `gh aw add` to install full workflows and `imports:` to pull in common modules. -### Shared source repository +## Sharing Layers -One repository publishes workflow sources and other repositories add them with `gh aw add` or track them via `source:` metadata. This is a good fit when a platform team owns the workflow and downstream repositories should receive updates over time. +### 1. Copy and install whole workflows -### Reusable imports and shared components +A repository can pull in a complete workflow from another repository: -Shared imports are useful when the organization wants common building blocks rather than one complete workflow. This works well for common MCP configuration, shared prompts, safety policy, or reusable workflow fragments. +```bash +gh aw add acme-org/agentic-workflows/ci-doctor@v1.2.0 +``` -### Starter repositories and templates +The `source:` field is automatically added to the installed workflow's frontmatter so the origin and version are tracked. Use `gh aw add-wizard` for interactive installation with guided prompts. Use `gh aw add` for scripted or CI-driven installation. -Starter repositories and workflow templates are useful when teams need a strong starting point but are expected to diverge. This model favors local ownership over synchronized updates. +See [Reusing Workflows](/gh-aw/guides/packaging-imports/) for the full command reference and options. -### Central ops repositories +### 2. Reusable workflow components -A central operations repository is useful when workflows need durable memory, reporting, review, or organization-wide coordination. In that model, individual product repositories often emit signals while the ops repository owns the long-lived automation loop. 
+Shared building blocks — tool configurations, MCP server definitions, safety policies, and prompt snippets — can be imported into any workflow: -## Choosing Between Them +```yaml +imports: + - acme-org/shared-workflows/shared/security-setup.md@v2.1.0 + - acme-org/shared-workflows/shared/mcp/tavily.md@v1.0.0 +``` -Use a shared source repository when consistency matters more than local autonomy. +Remote imports are cached under `.github/aw/imports/` by commit SHA after the first fetch. This enables reproducible offline compilation and avoids redundant downloads when multiple refs point to the same commit. -Use imports when the common unit is a capability or policy fragment. +See [Imports Reference](/gh-aw/reference/imports/) for path formats, merge semantics, and field-specific behavior. -Use templates when the organization wants fast adoption but expects teams to customize independently. +### 3. Parameterized templates -Use a central ops repository when workflows need shared history, review queues, reporting, or cross-repository orchestration. +Shared workflows that declare an `import-schema` accept runtime parameters via `uses`/`with`: -## Governance Questions +```yaml +imports: + - uses: acme-org/shared-workflows/shared/reviewer.md@v1 + with: + languages: ["go", "typescript"] + severity: "high" +``` -When workflows are shared across an organization, the important questions are usually operational rather than technical: +This lets a single shared component serve multiple consuming workflows with different configurations without requiring separate copies. -- who owns the source workflow -- how updates are reviewed and promoted -- which repositories may consume or dispatch to shared workflows -- how secrets, permissions, and safe outputs are standardized -- when teams may fork a workflow rather than stay on the shared source +See [Imports Reference](/gh-aw/reference/imports/#parameterized-imports-useswith) for schema declaration and validation details. 
-Those decisions affect reliability more than the file format does. +### 4. Versioning and update flow + +Enterprise workflow sharing needs a clear versioning model: + +- **Semantic versions** (`@v1.2.0`) pin to a stable release; `gh aw update` moves them to the latest compatible patch or minor release within the same major version. +- **Branch refs** (`@develop`) track the latest commit on a branch, which is useful for development integration. +- **SHA pins** (`@abc123def`) provide strict reproducibility and never move without an explicit change. + +To pull upstream changes into an already-installed workflow: + +```bash +gh aw update ci-doctor # update one workflow +gh aw update # update all tracked workflows +``` + +Updates use a 3-way merge by default to preserve local edits. Use `--no-merge` to replace the local copy with the upstream version without merging. Semantic version updates stay within the same major version unless `--major` is passed. + +### 5. Private and internal sharing controls + +Not all workflows are safe to share across organizations. GitHub Agentic Workflows provides controls at multiple levels: + +- **`private: true`** in frontmatter blocks a workflow from being installed into other repositories via `gh aw add`. Attempting to add a private workflow from another repository fails with an error. +- **Repository visibility** controls which workflows are discoverable. Private repositories require access before any workflow can be fetched. +- **Org-internal catalogs** can be implemented by placing workflows in a private or internal organization repository, ensuring only organization members can install them. + +See [Private Workflows](/gh-aw/reference/frontmatter/#private-workflows-private) for configuration details. + +### 6. Import caching and lock behavior + +When a workflow is compiled, remote imports are resolved and locked. 
The compiled `.lock.yml` file records the exact commit SHA for every remote import, making runs reproducible regardless of upstream branch movement. -## Practical Guidance +Imports are cached locally under `.github/aw/imports/` by commit SHA. Cached imports are used for all subsequent compilations until you explicitly update them. This means the lock file and the import cache together form the reproducibility guarantee for shared workflows. -For synchronized reuse, start with [Reusing Workflows](/gh-aw/guides/packaging-imports/), `gh aw add`, and imports. +### 7. Cross-repository execution model -For cross-repository control-plane designs, combine this guidance with [SideRepoOps](/gh-aw/patterns/side-repo-ops/) and [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/). +Separate from sharing workflow definitions, workflows can operate across repositories at runtime: -For organizations introducing workflow sharing gradually, it is common to start with templates or starter repositories, then move stable concerns into imports or a shared source repository once conventions have settled. +- Read files and metadata from other repositories during execution. +- Check out code from target repositories for analysis or modification. +- Write safe outputs to target repositories with explicit authentication and allowlists. + +```yaml +safe-outputs: + create-issue: + target-repo: "acme-org/target-repo" + allowed-repos: ["acme-org/repo1", "acme-org/repo2"] +``` + +Cross-repository operations require appropriate GitHub token permissions and explicit `allowed-repos` declarations. See [Cross-Repository Operations](/gh-aw/reference/cross-repository/) for authentication, permissions, and safe output configuration. + +## Recommended Enterprise Pattern + +The recommended pattern for organizations sharing workflows at scale: + +1. **One central `agentic-workflows` repository** holds versioned workflow templates and shared components under `workflows/` and `shared/`. +2. 
**Consuming repositories** use `gh aw add acme-org/agentic-workflows/@` to install complete workflows. +3. **Common modules** (MCP configurations, safety policies, shared prompts) live in `shared/` and are imported via `imports:` in consuming workflows. +4. **Version tags** on the central repository provide stable anchors for production consumers while branches support development integration. +5. **`private: true`** marks internal-only workflows that should not be exported outside the organization. + +This model gives platform teams centralized ownership and update control while giving consuming teams reproducibility through version pins and the ability to preserve local customizations through 3-way merge. + +## Governance Questions + +When workflows are shared across an organization, the important decisions are usually operational rather than technical: + +- Who owns the source workflow and reviews proposed changes. +- How updates are tested, tagged, and promoted to consuming repositories. +- Which repositories may consume or dispatch to shared workflows. +- How secrets, permissions, and safe outputs are standardized across consumers. +- When a consuming team may fork a workflow rather than stay on the shared version. + +Those decisions affect reliability more than the file format does. ## Related Documentation - [Reusing Workflows](/gh-aw/guides/packaging-imports/) - [Imports Reference](/gh-aw/reference/imports/) +- [Cross-Repository Operations](/gh-aw/reference/cross-repository/) +- [Private Workflows](/gh-aw/reference/frontmatter/#private-workflows-private) - [SideRepoOps](/gh-aw/patterns/side-repo-ops/) - [MultiRepoOps](/gh-aw/patterns/multi-repo-ops/) -- [Workflow Structure](/gh-aw/reference/workflow-structure/)