incident-management: tighten IR template structure and pipeline runbook #424

Open
frameworks-volunteer wants to merge 1 commit into security-alliance:develop from frameworks-volunteer:matta/ir-template-tightening

@frameworks-volunteer
Collaborator

Summary

This PR is a first pass on the recently added Incident Response Template section.

The goal is not to expand the section broadly, but to make it clearer, tighter, and more operationally credible without adding filler or speculative content.

This pass focuses on three things:

  1. clarifying the distinction between framework guidance, templates, runbooks, and playbooks
  2. reducing a few over-absolute statements
  3. upgrading the weakest runbook in the set (build-pipeline-compromise) into something more responder-oriented

What changed

1) Clarified content taxonomy

Added concise framing so readers can understand what each layer is for:

  • incident-management/overview.mdx
      • clarifies that the section now contains both:
          • framework guidance
          • operational templates
  • incident-management/playbooks/overview.mdx
      • reframes playbooks as reference material, not drop-in internal operating procedures
      • points readers to the template/runbook sections for copy-and-adapt operational docs
  • incident-response-template/overview.mdx
      • clarifies that the broader incident-management pages explain concepts/practices
      • clarifies that the template section is intended to be copied/customized for internal use
      • distinguishes:
          • policy / roles / communications / contacts
          • templates
          • runbooks
  • incident-response-template/templates/overview.mdx
      • clarifies when to use templates vs runbooks vs policy pages
  • incident-response-template/runbooks/overview.mdx
      • clarifies that runbooks are operational procedures, distinct from framework playbooks and blank templates

2) Tightened a few absolute statements

  • incident-response-template/incident-response-policy.mdx
      • changed "Monitor for at least a week" to "Monitor based on residual risk, blast radius, and incident type"
  • incident-response-template/roles-and-staffing.mdx
      • changed "These people should be reachable 24/7" to "There should be a 24/7 escalation path to these people"

These changes are meant to make the guidance more realistic and less doctrinal.

3) Upgraded the build pipeline compromise runbook

incident-response-template/runbooks/build-pipeline-compromise.mdx was previously a thin stub. This PR upgrades it into a more credible example runbook by adding:

  • better identification criteria
  • scope questions
  • differentiation from adjacent incident classes
  • immediate actions that reflect actual responder priorities:
      • freeze pipeline
      • preserve evidence
      • rotate credentials by blast radius
      • stop trusting recent outputs
  • investigation questions focused on access path, permissions, credential exposure, and affected outputs
  • containment / recovery options:
      • rebuild from known-good commit using clean pipeline
      • rollback to last known-good release
      • keep service paused until trust is re-established
  • a verification gate before normal delivery resumes
  • a concise hardening checklist after the incident
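
To make the verification gate concrete, here is a minimal sketch of one possible check, not the runbook's actual procedure. It assumes the build is reproducible and that digests were recorded at release time. The function names and the `released` mapping are hypothetical, introduced only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def untrusted_artifacts(released: dict[str, str], rebuilt_dir: Path) -> list[str]:
    """List released artifact names whose digest differs from the clean rebuild.

    `released` maps artifact name -> digest recorded at release time (assumed
    to exist; hypothetical format). A missing rebuilt file counts as a mismatch.
    """
    suspect = []
    for name, released_digest in sorted(released.items()):
        rebuilt = rebuilt_dir / name
        if not rebuilt.is_file() or sha256_of(rebuilt) != released_digest:
            suspect.append(name)
    return suspect
```

An empty result only says that the released outputs match a rebuild from a known-good commit on a clean pipeline. If the build is not reproducible, a mismatch is expected and this check tells you nothing; in that case rollback or keeping the service paused remain the safer options.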

What this PR does not do

Intentionally out of scope for this first pass:

  • broad content expansion
  • adding new Web3-specific runbooks just to fill gaps
  • renaming sections or restructuring the sidebar deeply
  • inventing protocol-specific operational steps without high confidence

I would rather leave gaps visible than fill them with weak or speculative guidance.

Why this scope

The Incident Response Template addition is already valuable, but right now it mixes:

  • framework/reference material
  • internal templates
  • runbooks

This first pass tries to make that structure easier to understand, while also strengthening one page that felt materially underdeveloped.

Follow-up ideas (not included here)

Possible future passes, if useful:

  • strengthen frontend-compromise and dependency-attack
  • add battle-tested Web3-native scenarios only where confidence is high
  • revisit naming/IA if the team wants clearer labels than the current playbook/runbook/template split

@github-actions

github-actions Bot commented Mar 21, 2026

built with Refined Cloudflare Pages Action

⚡ Cloudflare Pages Deployment

| Name | Status | Preview | Last Commit |
| --- | --- | --- | --- |
| frameworks | ✅ Ready (View Log) | Visit Preview | 43f0e45 |

@frameworks-volunteer
Collaborator Author

Second pass update

This second pass keeps the same philosophy as the first one:

  • no broad expansion
  • no filler
  • no speculative Web3-specific guidance
  • only tighten pages where the operational value is clear and high-confidence

What changed in this pass

This pass focuses on the two runbooks that still felt materially underpowered:

  • incident-response-template/runbooks/frontend-compromise.mdx
  • incident-response-template/runbooks/dependency-attack.mdx

1) Strengthened frontend-compromise

This page now better reflects how frontend incidents actually behave in practice, especially in Web3 where a frontend compromise often becomes a user-signing or approval-theft incident very quickly.

Changes include:

  • clearer identification and scope questions
  • stronger focus on stopping service quickly
  • explicit emphasis on warning users early and clearly
  • preserving evidence before cleanup
  • tighter framing around identifying the real trust-boundary failure:
      • DNS
      • CDN/hosting
      • dependency
      • build pipeline
  • improved recovery conditions before restoring service
  • more practical affected-user support guidance

The goal here was to make the page more useful during the first minutes of an actual incident, not just more complete on paper.
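
As an illustration of the kind of first-minutes check this supports, the sketch below compares the bytes production is serving against Subresource Integrity (SRI) values from a known-good build. It is a sketch under assumptions, not content from the runbook: the `expected` manifest format and both function names are hypothetical, and fetching the served bytes is left to the responder.

```python
import base64
import hashlib


def sri_sha384(content: bytes) -> str:
    """Return a Subresource Integrity value in the form 'sha384-<base64 digest>'."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def changed_assets(expected: dict[str, str], served: dict[str, bytes]) -> list[str]:
    """Return asset paths whose served bytes do not match the expected SRI value.

    `expected` maps path -> SRI string from a known-good build (hypothetical manifest).
    `served` maps path -> bytes fetched from production by the responder.
    An asset that is expected but not served is reported as changed.
    """
    return sorted(
        path for path, sri in expected.items()
        if path not in served or sri_sha384(served[path]) != sri
    )
```

A mismatch confirms that users are receiving different bytes than the known-good build produced. It does not by itself say whether DNS, CDN/hosting, a dependency, or the build pipeline was the failing boundary; that still needs the investigation steps.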

2) Strengthened dependency-attack

This page was still too close to a stub. It now better distinguishes between a generic vulnerable package and a dependency incident that may have affected real build outputs, releases, or users.

Changes include:

  • better scope questions:
      • production vs build-only exposure
      • build-time vs runtime execution
      • possible credential / artifact impact
  • clearer differentiation from:
      • frontend compromise
      • build pipeline compromise
  • stronger immediate actions:
      • freeze releases
      • identify the exact package/version path
      • stop trusting recent outputs
      • preserve evidence
  • improved investigation questions
  • more credible containment / recovery options
  • a verification gate before resuming normal delivery
  • tighter prevention guidance focused on dependency discipline and build trust
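
For the "identify the exact package/version path" action, a rough sketch of what that lookup could look like follows. It assumes the lockfile has already been parsed into a simple adjacency mapping of `name@version` strings; that mapping, the function name, and the package names in the usage note are all hypothetical.

```python
def paths_to_package(graph: dict[str, list[str]], root: str, target: str) -> list[list[str]]:
    """Return every dependency path from `root` to `target`, depth-first.

    `graph` maps "name@version" -> list of direct dependencies in the same form.
    Nodes already on the current path are skipped so cycles terminate.
    """
    found = []

    def walk(node: str, path: list[str]) -> None:
        path = path + [node]
        if node == target:
            found.append(path)
            return
        for dep in graph.get(node, []):
            if dep not in path:
                walk(dep, path)

    walk(root, [])
    return found
```

With a made-up graph where `app@1.0.0` depends on `ui-kit@2.1.0` and `http-lib@3.0.0`, and both depend on `color-utils@1.4.1`, the function returns two paths. Knowing every path matters for the scope questions above: the same bad version can arrive through a build-only tool and through a runtime dependency at once.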

What I intentionally did not change

Still intentionally out of scope:

  • adding new runbooks just to close every possible gap
  • speculative guidance for scenarios that need deeper expertise or stronger repo context
  • touching pages that did not clearly benefit from high-confidence tightening

For example, I left key-compromise unchanged in this pass rather than make lower-confidence edits.

Why this is the last pass

At this point, the highest-value weak spots in the imported IR template section have been addressed without turning the PR into a broad rewrite.

This keeps the contribution focused on:

  • clearer information architecture
  • more realistic wording
  • stronger responder-oriented runbooks where they were obviously too thin

@mattaereal mattaereal requested a review from scode2277 March 22, 2026 00:54
@mattaereal mattaereal self-assigned this Mar 22, 2026
Comment thread docs/pages/incident-management/incident-response-template/runbooks/overview.mdx Outdated
Comment thread docs/pages/incident-management/incident-response-template/templates/overview.mdx Outdated
Comment thread docs/pages/incident-management/incident-response-template/overview.mdx Outdated
Comment thread docs/pages/incident-management/incident-response-template/overview.mdx Outdated
Comment thread docs/pages/incident-management/overview.mdx Outdated
@scode2277
Collaborator

Left some comments @mattaereal

@frameworks-volunteer frameworks-volunteer force-pushed the matta/ir-template-tightening branch 2 times, most recently from 9d348ab to d8165b7 April 14, 2026 14:36
@mattaereal
Collaborator

@frameworks-volunteer can you address @scode2277's comments, resolving the conversations and any conflicts? Take all her feedback seriously.

@frameworks-volunteer frameworks-volunteer force-pushed the matta/ir-template-tightening branch from d8165b7 to 43f0e45 April 14, 2026 14:51
@frameworks-volunteer
Collaborator Author

Model: z-ai/glm-5.1 Reasoning: high Provider: openrouter

Addressed all of @scode2277's feedback and resolved merge conflicts with develop. Here is what was done:

  1. dependency-attack.mdx -- Fixed pnpm frozen lockfile note: changed `pnpm install --frozen-lockfile` to `pnpm install` (frozen lockfile is default in CI), and added `yarn install --immutable` for completeness.

  2. runbooks/overview.mdx -- Added internal links to Playbooks and Templates pages, per suggestion.

  3. templates/overview.mdx -- Added internal links to template, runbook, and policy pages, per suggestion.

  4. incident-response-template/overview.mdx -- Added internal links to Incident Management and Incident Response Templates in the first distinction block, and added links to Policy/roles/communications/contacts/Templates/Runbooks in the second block.

  5. incident-management/overview.mdx -- Changed "This section" to "This framework", and renamed labels from "Framework guidance/Operational templates" to "Incident Management core knowledge/Incident Response templates", per suggestion.

  6. Merge conflicts resolved -- Rebased onto latest develop. All 6 conflicted files (roles-and-staffing, build-pipeline-compromise, dependency-attack, frontend-compromise, runbooks/overview, templates/overview) resolved by keeping the improved PR content while integrating develop changes.
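
To illustrate the lockfile note in item 1, here is a small sketch that picks a strict install command from whichever lockfile is present. It is not part of the PR. The mapping and function name are hypothetical, `npm ci` is an addition not mentioned above, and `yarn install --immutable` applies to Yarn 2 and later.

```python
from pathlib import Path

# Lockfile name -> install command that fails instead of rewriting the lockfile.
# Assumption: pnpm, Yarn 2+ and npm; other package managers are not covered.
STRICT_INSTALL = {
    "pnpm-lock.yaml": ["pnpm", "install", "--frozen-lockfile"],
    "yarn.lock": ["yarn", "install", "--immutable"],
    "package-lock.json": ["npm", "ci"],
}


def strict_install_command(project_dir: Path) -> list[str]:
    """Return the strict install command for the first lockfile found."""
    for lockfile, command in STRICT_INSTALL.items():
        if (project_dir / lockfile).is_file():
            return command
    raise FileNotFoundError(f"no supported lockfile in {project_dir}")
```

The sketch passes `--frozen-lockfile` to pnpm explicitly, even though it is the default in CI, so that the same command behaves the same way on a responder's workstation where CI detection does not apply.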

Collaborator

@scode2277 scode2277 left a comment

gtg for me @mattaereal!

Collaborator Author

@frameworks-volunteer frameworks-volunteer left a comment

Model: moonshotai/kimi-k2.6 Reasoning: high Provider: openrouter

Security: clean. No secrets, injection vectors, or unsafe patterns. Content is MDX docs only.

QA: all checks pass.

  • CI green (build, lint, sidebar-reminder)
  • All internal link targets exist and resolve correctly
  • Frontmatter valid and consistent across files
  • No leftover debug content, TODOs, or placeholder text
  • "Stub runbook" language properly replaced with "Example runbook"

Substantive notes:

  1. Build pipeline compromise runbook: major upgrade from thin stub to credible operational document. The additions (scope questions, differentiation from adjacent incident classes, evidence preservation, blast-radius-ordered credential rotation, and a verification gate before resuming delivery) are all well chosen. The three recovery options (rebuild / rollback / pause), each with explicit "When" and "Impact" framing, are practical.

  2. Dependency attack runbook: similarly upgraded with proper scope questions, differentiation (correctly cross-linking to build-pipeline-compromise and frontend-compromise), and a "Verification Before Resuming" gate. Good that it distinguishes malicious vs vulnerable packages.

  3. Frontend compromise runbook: tightened throughout. The user warning message template is improved ("If you have not signed new transactions, your funds in the protocol remain unaffected" adds important clarity). Evidence preservation moved before cleanup is the right call. The trust boundary failure step is a useful addition.

  4. Taxonomy clarifications across overview pages are concise and well-placed -- they give readers a mental model for framework guidance vs templates vs runbooks vs playbooks without being repetitive.

  5. The policy/staffing wording changes ("Monitor based on residual risk..." and "24/7 escalation path") are more realistic and less doctrinal. Good changes.

  6. One minor note: in templates/overview.mdx, the "use a template" link points to the current page. Not broken, but slightly odd for a directory page. Very low priority.

Approving -- this is a solid, well-scoped first pass that delivers exactly what it promises.

3 participants