Squad product: Default workflows burn too many Actions minutes for multi-repo customers #98

@diberry

Scope: This is a Squad product issue affecting all customers who install Squad. It covers default workflow minute budgets, upgrade safety, config-driven separation, and workflow consolidation. For the specific bradygaster/squad branch protection problem, see #97.

PRD: Default Workflow Minutes Optimization & Upgrade Safety

Status: Implementation-ready
Priority: P0
Owner: Flight (Lead)
Related: #97 (CI feature flags), #84 (Config version auto-update)


1. Problem Statement

Squad ships 15 workflows that install into every repo. For customers running Squad on multiple repos, the default configuration burns 9,000-17,000+ GitHub Actions minutes/month -- far exceeding GitHub Free (2,000 min) and even Pro (3,000 min) limits.

Key cost drivers:

  • Heartbeat cron (*/30 * * * *) is the dominant offender (~80% of total minutes): ~1,440 runs/month per repo = 1,440-2,880 min/month per repo
  • squad upgrade force-overwrites all workflow customizations (double-write bug), meaning customers who tune their workflows to save minutes lose those changes on every upgrade
  • No mechanism exists for customers to preserve workflow customizations across upgrades
  • Private repos multiply costs: Windows runners bill at 2x and macOS runners at 10x -- a private macOS repo's heartbeat alone burns 14,400 min/month
  • No config-driven separation between workflow logic (squad-owned) and runtime settings (user-owned)
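
The heartbeat figures in the list above follow from simple arithmetic on the cron interval. A minimal sketch, assuming a 30-day month and 1-2 billed minutes per run (the range implied by the 1,440-2,880 estimate); `heartbeatMinutes` and its parameters are illustrative names, not part of Squad:

```typescript
// Estimate monthly billed minutes for a cron-triggered workflow.
// Assumptions (not from Squad's code): a 30-day month, and each run
// bills between minPerRun and maxPerRun minutes.
function heartbeatMinutes(
  intervalMinutes: number,
  minPerRun: number,
  maxPerRun: number,
  multiplier: number = 1,
): { runs: number; low: number; high: number } {
  const minutesPerMonth = 30 * 24 * 60; // 43,200
  const runs = Math.floor(minutesPerMonth / intervalMinutes);
  return {
    runs,
    low: runs * minPerRun * multiplier,
    high: runs * maxPerRun * multiplier,
  };
}

// Every 30 minutes on a 1x runner.
const every30 = heartbeatMinutes(30, 1, 2);
// Every 6 hours on a 1x runner.
const every6h = heartbeatMinutes(360, 1, 2);
// Every 30 minutes at a 10x multiplier, one minute per run.
const tenX = heartbeatMinutes(30, 1, 1, 10);
console.log(every30, every6h, tenX);
```

Stretching the interval from 30 minutes to 6 hours cuts the run count by a factor of 12, which is why the schedule dominates every other cost driver.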

This blocks Squad adoption for budget-conscious startups, SaaS platforms (multi-repo), and agencies (5-10 repos).


2. Goals

| Goal | Target |
| --- | --- |
| Default install monthly minutes | ~100-300 min/month per repo (Tier 0) |
| 10 repos at Tier 0 on GitHub Free | Fits within 2,000 min/month |
| Upgrade preserves customer customizations | Config-driven separation + backup mechanism |
| Config-driven architecture | Workflow logic (squad-owned) vs runtime settings (user-owned) |
| Workflow consolidation | 15 workflows -> 11 |
| Zero surprise bills | squad init defaults to Tier 0 (heartbeat disabled) |
| Enterprise support | Tier 2 (full) recommended for Enterprise Cloud (50,000 included min/month) |

3. Current State Audit

3.1 Full Workflow Inventory (15 workflows)

Cron-triggered (always running, even if no work exists)

| Workflow | Cron | Runs/month | Est. min/month |
| --- | --- | --- | --- |
| squad-heartbeat.yml | */30 * * * * (every 30min) | 1,440 | 1,440-2,880 |
| squad-docs-links.yml | 0 9 * * 1 (weekly Mon 9am) | 4 | ~8 |

PR-triggered (per PR activity)

| Workflow | Trigger | Est. min/month |
| --- | --- | --- |
| squad-ci.yml | pull_request + push | 240-500 |
| squad-heartbeat.yml | Also triggers on pull_request | (included above) |

Push-triggered (per merge/push)

| Workflow | Est. min/month |
| --- | --- |
| squad-ci.yml, squad-docs.yml, squad-preview.yml | ~50-100 |
| squad-release.yml, squad-insider-*.yml, sync-squad-labels.yml | ~20-50 |

Event-triggered only (low cost, fires on issue/PR events)

| Workflow | Trigger |
| --- | --- |
| squad-issue-assign.yml | issues events |
| squad-label-enforce.yml | issues events |
| squad-triage.yml | issues events |

Additional workflows (kept as-is)

| Workflow | Purpose |
| --- | --- |
| squad-promote.yml | Complex orchestration |
| sync-squad-labels.yml | Triggered on team.md changes |
| squad-npm-publish.yml | Multi-job publishing |
| squad-docs.yml | Pages deploy |
| ci-rerun.yml | Manual dispatch |

3.2 Per-Repo Monthly Estimate

| Category | Minutes |
| --- | --- |
| Heartbeat cron | 1,440-2,880 |
| CI (PRs + pushes) | 300-600 |
| Event-based (issues) | ~50 |
| Docs/release/labels | ~50 |
| Total per repo | ~1,800-3,600 |

3.3 Multi-Repo Scaling (Linux public repos, 1x multiplier)

| Repos | Monthly min (est.) | GitHub Free (2,000) | GitHub Pro (3,000) |
| --- | --- | --- | --- |
| 1 | 1,800-3,600 | (!) Borderline | (!) Borderline |
| 3 | 5,400-10,800 | No -- 2.7-5.4x over | No -- 1.8-3.6x over |
| 5 | 9,000-18,000 | No -- 4.5-9x over | No -- 3-6x over |
| 10 | 18,000-36,000 | No -- 9-18x over | No -- 6-12x over |
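
The over-limit factors in this table are the per-repo range multiplied by the repo count, then divided by the plan's included minutes. A small sketch of that calculation; `overQuotaFactor` is a hypothetical name, and the defaults are the 1,800-3,600 per-repo estimate from section 3.2:

```typescript
// Ratio of estimated monthly usage to a plan's included minutes.
// perRepoLow/perRepoHigh default to the section 3.2 per-repo estimate.
function overQuotaFactor(
  repos: number,
  quota: number,
  perRepoLow: number = 1800,
  perRepoHigh: number = 3600,
): { low: number; high: number } {
  // Round to one decimal place, matching the precision used in the table.
  const round1 = (x: number): number => Math.round(x * 10) / 10;
  return {
    low: round1((repos * perRepoLow) / quota),
    high: round1((repos * perRepoHigh) / quota),
  };
}

const freeQuota = 2000;
const proQuota = 3000;
for (const repos of [1, 3, 5, 10]) {
  const free = overQuotaFactor(repos, freeQuota);
  const pro = overQuotaFactor(repos, proQuota);
  console.log(`${repos} repos: Free ${free.low}-${free.high}x, Pro ${pro.low}-${pro.high}x`);
}
```

A factor below 1 at the low end and above 1 at the high end corresponds to the "Borderline" rows.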

3.4 Private Repo Multiplier Impact

GitHub Actions charges different rates by runner OS:

| Runner | Multiplier |
| --- | --- |
| Linux | 1x |
| Windows | 2x |
| macOS | 10x |

Impact on per-repo estimates (Tier 2 -- current defaults):

| Repos | Linux (1x) | Windows Private (2x) | macOS Private (10x) |
| --- | --- | --- | --- |
| 1 | 1,800-3,600 | 3,600-7,200 | 18,000-36,000 |
| 5 | 9,000-18,000 | 18,000-36,000 | 90,000-180,000 |
| 10 | 18,000-36,000 | 36,000-72,000 | 180,000-360,000 |

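Key insight: Most Squad workflows use ubuntu-latest (1x). The multiplier is applied per job according to that job's runner OS, so it raises the cost of a customer's own Windows or macOS CI jobs rather than Squad's Linux workflows. The Windows and macOS columns above are a worst case in which every minute is billed at that rate.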

Enterprise Cloud note: GitHub Enterprise Cloud includes 50,000 minutes/month on GitHub-hosted runners, not an unlimited allowance. Tier 2 (full monitoring) fits within that for about 13 repos at the 3,600 min/month upper estimate. Tier 0/1 are optimized for Free/Pro limits.

3.5 Double-Write Bug in Upgrade Logic

Location: packages/squad-cli/src/cli/core/upgrade.ts:519-553

The upgrade logic writes each workflow file twice:

  1. Lines 521-533: Manifest loop writes all overwriteOnUpgrade: true files from TEMPLATE_MANIFEST (lines 184-248), which includes ALL 11 workflows
  2. Lines 539-553: Separate readdirSync loop writes all .yml files from templates/workflows/ again

Consequences:

  • If the first write preserves a customer's commented-out cron, the second write may restore it
  • writeWorkflowFile() may strip comments or apply different formatting
  • Any transformation applied twice risks data loss or corruption

| Customer action | What upgrade does |
| --- | --- |
| Changed heartbeat cron to every 6h | Reset to */30 * * * * |
| Added concurrency groups to CI | Removed |
| Added path filters to reduce runs | Removed |
| Disabled cron via commenting | Re-enabled |
| Added feature flag checks (vars.SQUAD_*) | Removed |

4. Tiered Defaults

Squad ships with minimal minutes by default and lets users opt into higher tiers.

Tier 0 -- Zero-cost (default for squad init)

  • Heartbeat: cron disabled, event-triggers only (issues, PRs, workflow_dispatch)
  • CI: runs on pull_request only (not push)
  • Docs link check: disabled (manual via workflow_dispatch)
  • Labels/triage/assign: event-triggered only (already cheap)
  • Est: ~100-300 min/month per repo

Tier 1 -- Light monitoring

  • Heartbeat: cron every 6 hours (0 */6 * * *)
  • CI: PR + push to default branch
  • Docs link check: weekly
  • Est: ~400-800 min/month per repo

Tier 2 -- Full monitoring (current default, legacy)

  • Heartbeat: every 30 min
  • CI: PR + push
  • Everything enabled
  • Est: ~1,800-3,600 min/month per repo

Scaling Table by Tier x Repos x Plan

| Repos | Tier 0 (min/mo) | Tier 1 (min/mo) | Tier 2 (min/mo) | Free (2,000) | Pro (3,000) |
| --- | --- | --- | --- | --- | --- |
| 1 | 100-300 | 400-800 | 1,800-3,600 | Yes -- T0/T1 | Yes -- T0/T1 |
| 3 | 300-900 | 1,200-2,400 | 5,400-10,800 | Yes -- T0 | Yes -- T0/T1 |
| 5 | 500-1,500 | 2,000-4,000 | 9,000-18,000 | Yes -- T0 | (!) T0 only |
| 10 | 1,000-3,000 | 4,000-8,000 | 18,000-36,000 | (!) T0 only | (!) T0 only |

Private repo multiplier: Multiply above by 2x (Windows) or 10x (macOS) for jobs that run on those runners. For a macOS private 5-repo shop at Tier 0: ~500-1,500 min/month x 10 = 5,000-15,000 min/month for macOS CI jobs. Squad's own workflows run on Linux (1x), so the multiplier primarily affects customer CI jobs.

Enterprise Cloud: 50,000 included minutes/month on GitHub-hosted runners. Tier 2 fits for about 13 repos at the upper estimate.
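
The fit column can be derived mechanically from the tier ranges. A sketch under one assumption of mine: a tier "fits" when even its high estimate is within the quota, is "over" when even its low estimate exceeds it, and is "borderline" otherwise. Those thresholds are illustrative and may not match every (!) marker in the table; `classifyFit` is a hypothetical name.

```typescript
type Tier = "zero" | "light" | "full";
type Fit = "fits" | "borderline" | "over";

// Per-repo monthly estimates from the tier definitions in this section.
const TIER_RANGES: Record<Tier, [number, number]> = {
  zero: [100, 300],
  light: [400, 800],
  full: [1800, 3600],
};

// Classify a tier for a given repo count and plan quota (minutes/month).
function classifyFit(tier: Tier, repos: number, quota: number): Fit {
  const [low, high] = TIER_RANGES[tier];
  if (repos * high <= quota) return "fits";
  if (repos * low > quota) return "over";
  return "borderline";
}

const tierNames: Tier[] = ["zero", "light", "full"];
for (const repos of [1, 3, 5, 10]) {
  const onFree = tierNames.map((t) => `${t}=${classifyFit(t, repos, 2000)}`).join(", ");
  console.log(`${repos} repos on Free: ${onFree}`);
}
```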

Init UX

During squad init, surface the tier selection:

Choose a monitoring tier:
  [0] Tier 0 -- Zero-cost (default): ~100-300 min/mo, heartbeat disabled
  [1] Tier 1 -- Light: ~400-800 min/mo, heartbeat every 6h
  [2] Tier 2 -- Full: ~1,800-3,600 min/mo, heartbeat every 30m (legacy default)
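
The prompt above could map to presets along these lines. This is a sketch, not Squad's implementation: `TierPreset`, `TIER_PRESETS`, and `tierFromAnswer` are hypothetical names, the cron strings come from the tier definitions in this section, and falling back to Tier 0 on unrecognized input is an assumption.

```typescript
type Tier = "zero" | "light" | "full";

interface TierPreset {
  heartbeatCron: string; // "" means the cron trigger is disabled
  ciOnPush: boolean;
  docsLinkCheck: "manual" | "weekly";
}

const TIER_PRESETS: Record<Tier, TierPreset> = {
  zero: { heartbeatCron: "", ciOnPush: false, docsLinkCheck: "manual" },
  light: { heartbeatCron: "0 */6 * * *", ciOnPush: true, docsLinkCheck: "weekly" },
  full: { heartbeatCron: "*/30 * * * *", ciOnPush: true, docsLinkCheck: "weekly" },
};

// Map the prompt answer ("0", "1", "2") to a tier. Anything else,
// including an empty answer, falls back to Tier 0 (assumed behavior).
function tierFromAnswer(answer: string): Tier {
  switch (answer.trim()) {
    case "1":
      return "light";
    case "2":
      return "full";
    default:
      return "zero";
  }
}

const chosen = tierFromAnswer("1");
console.log(chosen, TIER_PRESETS[chosen]);
```

Defaulting to Tier 0 on any unexpected answer keeps the "zero surprise bills" goal intact even when the prompt is skipped, for example in a non-interactive install.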

5. Architecture: Config-Driven Separation

Principle

Separate workflow logic (squad-owned, safe to overwrite on upgrade) from runtime settings (user-owned, preserved on upgrade).

  • Workflow .yml files contain only logic -> overwriteOnUpgrade: true
  • .squad/config.json contains preferences -> overwriteOnUpgrade: false
  • Workflows read their schedules and feature flags from config at runtime

Config Schema

.squad/config.json gains a ci section:

{
  "version": 1,
  "ci": {
    "tier": "zero",
    "heartbeatCron": "",
    "features": {
      "eslint": true,
      "audit": false,
      "coverage": 0,
      "deletionGuard": true,
      "spellcheck": false
    }
  }
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| ci.tier | one of "zero", "light", "full" | "zero" | Monitoring tier preset |
| ci.heartbeatCron | string | "" (disabled) | Custom cron expression, overrides tier default |
| ci.features.eslint | boolean | true | Run ESLint in CI |
| ci.features.audit | boolean | false | Run npm audit in CI |
| ci.features.coverage | number | 0 | Coverage threshold (0 = disabled) |
| ci.features.deletionGuard | boolean | true | Prevent accidental large deletions |
| ci.features.spellcheck | boolean | false | Run spellcheck in CI |
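
The schema can be expressed as TypeScript types with a tolerant parser that falls back to the documented defaults. This is a sketch only: `CiConfig`, `DEFAULT_CI`, `pick`, and `parseCiConfig` are hypothetical names, and replacing each wrong-typed field individually (rather than rejecting the whole file) is an assumption.

```typescript
interface CiFeatures {
  eslint: boolean;
  audit: boolean;
  coverage: number;
  deletionGuard: boolean;
  spellcheck: boolean;
}

interface CiConfig {
  tier: "zero" | "light" | "full";
  heartbeatCron: string;
  features: CiFeatures;
}

const DEFAULT_CI: CiConfig = {
  tier: "zero",
  heartbeatCron: "",
  features: { eslint: true, audit: false, coverage: 0, deletionGuard: true, spellcheck: false },
};

// Use value only when it has the same primitive type as the default.
function pick<T>(value: unknown, fallback: T): T {
  return typeof value === typeof fallback ? (value as T) : fallback;
}

// Parse the text of .squad/config.json. A missing file (undefined),
// malformed JSON, or wrong-typed fields all fall back to defaults.
function parseCiConfig(text: string | undefined): CiConfig {
  let ci: any = {};
  try {
    ci = (text ? JSON.parse(text) : {})?.ci ?? {};
  } catch {
    ci = {}; // malformed JSON: keep defaults
  }
  const f = ci.features ?? {};
  const d = DEFAULT_CI;
  const validTier = ["zero", "light", "full"].indexOf(ci.tier) >= 0;
  return {
    tier: validTier ? ci.tier : d.tier,
    heartbeatCron: pick(ci.heartbeatCron, d.heartbeatCron),
    features: {
      eslint: pick(f.eslint, d.features.eslint),
      audit: pick(f.audit, d.features.audit),
      coverage: pick(f.coverage, d.features.coverage),
      deletionGuard: pick(f.deletionGuard, d.features.deletionGuard),
      spellcheck: pick(f.spellcheck, d.features.spellcheck),
    },
  };
}

console.log(parseCiConfig('{"ci":{"tier":"light","features":{"coverage":80}}}'));
```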

Precedence Order

repo variable (SQUAD_HEARTBEAT_CRON, etc.)
  > .squad/config.json (ci.tier, ci.heartbeatCron, ci.features.*)
  > workflow default (disabled for Tier 0)

Repo variables are overrides (not checked into source). Config.json is the primary source of truth (versioned, portable). Workflow defaults are the fallback (safe for missing config).
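
A sketch of how the precedence chain might resolve the heartbeat schedule. Two points are assumptions rather than stated behavior: an empty string at any level is treated as "not set" and defers to the next level, and ci.tier supplies a preset cron when ci.heartbeatCron is empty. `resolveHeartbeatCron` and `TIER_CRON` are hypothetical names.

```typescript
type Tier = "zero" | "light" | "full";

// Tier preset schedules from the tier definitions earlier in the document.
const TIER_CRON: Record<Tier, string> = {
  zero: "",
  light: "0 */6 * * *",
  full: "*/30 * * * *",
};

// Resolve the effective heartbeat cron:
// repo variable > config.json (heartbeatCron, then tier) > workflow default.
function resolveHeartbeatCron(
  repoVariable: string | undefined,
  configCron: string | undefined,
  configTier: Tier | undefined,
): string {
  const isSet = (v: string | undefined): v is string => !!v && v.trim() !== "";
  if (isSet(repoVariable)) return repoVariable.trim();
  if (isSet(configCron)) return configCron.trim();
  if (configTier) return TIER_CRON[configTier];
  return ""; // workflow default: cron disabled
}

console.log(resolveHeartbeatCron("0 0 * * *", "0 */6 * * *", "full"));
console.log(resolveHeartbeatCron(undefined, "", "light"));
```

Under the empty-means-unset assumption there is no way to force "disabled" from a repo variable when config.json sets a schedule; a real implementation would probably need an explicit sentinel for that case.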

Workflow Runtime Integration

Workflows read config at runtime:

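# squad-heartbeat.yml
jobs:
  heartbeat:
    runs-on: ubuntu-latest
    steps:
      # Check out the repo so .squad/config.json is available to later steps
      - uses: actions/checkout@v4
      - name: Read CI config
        id: config
        run: |
          # Fall back to an empty value (disabled) if the file or key is missing
          CRON=$(jq -r '.ci.heartbeatCron // ""' .squad/config.json 2>/dev/null || echo "")
          echo "cron=$CRON" >> "$GITHUB_OUTPUT"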

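This way, upgrade overwrites the workflow logic (safe), while the workflow reads its settings from config.json (user-owned, preserved). Note that GitHub evaluates the on.schedule trigger from the workflow file itself, before any step runs, so a value read in a step can decide whether a run does any work but cannot change when scheduled runs are triggered.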


6. Upgrade Safety

6.1 Fix Double-Write Bug

Remove workflows from TEMPLATE_MANIFEST entirely. The readdirSync loop at line 539 already handles all workflow files. The manifest entries are redundant and cause double-write.

// templates.ts -- remove workflow entries from TEMPLATE_MANIFEST
- { source: 'workflows/squad-ci.yml', destination: '../.github/workflows/squad-ci.yml', overwriteOnUpgrade: true },
- { source: 'workflows/squad-heartbeat.yml', destination: '../.github/workflows/squad-heartbeat.yml', overwriteOnUpgrade: true },
- // ... (remove all workflow entries)

The readdir loop stays as the single source of truth for workflow upgrades.
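
One way to keep the double-write from regressing is to build the list of workflow writes in one place and fail if any destination appears twice. A sketch under assumptions: `ManifestEntry` mirrors the fields in the snippet above, while `planWrites` and its `workflowFiles` parameter are hypothetical and do not correspond to functions in upgrade.ts.

```typescript
interface ManifestEntry {
  source: string;
  destination: string;
  overwriteOnUpgrade: boolean;
}

// Build the write plan: manifest entries marked overwriteOnUpgrade,
// plus every .yml file listed from templates/workflows/. Throws if any
// destination would be written more than once.
function planWrites(manifest: ManifestEntry[], workflowFiles: string[]): string[] {
  const destinations: string[] = [];
  for (const entry of manifest) {
    if (entry.overwriteOnUpgrade) destinations.push(entry.destination);
  }
  for (const file of workflowFiles) {
    if (file.slice(-4) === ".yml") {
      destinations.push(`../.github/workflows/${file}`);
    }
  }
  const seen: { [dest: string]: boolean } = {};
  for (const dest of destinations) {
    if (seen[dest]) throw new Error(`double write: ${dest}`);
    seen[dest] = true;
  }
  return destinations;
}

// With workflow entries removed from the manifest, each file is planned once.
const cleanedManifest: ManifestEntry[] = [
  { source: "config.json", destination: "config.json", overwriteOnUpgrade: false },
];
console.log(planWrites(cleanedManifest, ["squad-ci.yml", "squad-heartbeat.yml"]));
```

A check like this could run in a unit test against the real manifest, so that re-adding a workflow entry fails fast instead of silently writing twice.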

6.2 # squad-custom: true Header

If a workflow file contains the header # squad-custom: true, upgrade skips overwriting that file entirely. This is an opt-in escape hatch for customers who heavily customize workflows.

6.3 Backup Before Overwrite

Modified files are backed up before overwriting:

.squad/upgrade-backups/{version}/squad-heartbeat.yml
.squad/upgrade-backups/{version}/squad-ci.yml

Logic:

  1. For each workflow file to be written, check if destination exists
  2. If file has # squad-custom: true header -> skip entirely, log warning
  3. If file differs from template -> backup to .squad/upgrade-backups/{version}/, then overwrite
  4. If file matches template -> overwrite silently (no noise)
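
The four steps above can be sketched as a pure decision function that the upgrade code would call before touching disk. All names here are hypothetical. Two details are assumptions: the header is matched on any line of its own (the document does not say where it must appear), and a missing destination is simply created.

```typescript
type UpgradeAction =
  | { kind: "create" }
  | { kind: "skip"; reason: string }
  | { kind: "backup-then-overwrite"; backupPath: string }
  | { kind: "overwrite" };

// Assumption: the marker sits on a line of its own, ignoring
// surrounding whitespace, anywhere in the file.
function hasCustomHeader(content: string): boolean {
  return /^\s*#\s*squad-custom:\s*true\s*$/m.test(content);
}

// existing is null when the destination file does not exist yet.
function decideUpgradeAction(
  fileName: string,
  existing: string | null,
  template: string,
  version: string,
): UpgradeAction {
  if (existing === null) return { kind: "create" };
  if (hasCustomHeader(existing)) {
    return { kind: "skip", reason: `${fileName} is marked squad-custom` };
  }
  if (existing !== template) {
    return {
      kind: "backup-then-overwrite",
      backupPath: `.squad/upgrade-backups/${version}/${fileName}`,
    };
  }
  return { kind: "overwrite" };
}

const action = decideUpgradeAction(
  "squad-heartbeat.yml",
  "# squad-custom: true\non: workflow_dispatch\n",
  "on: workflow_dispatch\n",
  "1.2.3", // placeholder version string
);
console.log(action);
```

Keeping the decision separate from the file I/O makes each branch easy to cover in the Phase 4 tests without a temporary directory.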

6.4 Config.json in Manifest

Add .squad/config.json to TEMPLATE_MANIFEST as user-owned:

{
  source: 'config.json',
  destination: 'config.json',
  overwriteOnUpgrade: false,  // USER-OWNED -- never overwrite
  description: 'Squad configuration (CI tier, feature flags, model prefs)',
}

  • Initialized by squad init with Tier 0 defaults
  • Never touched on upgrade (user-owned)
  • Workflows gracefully handle missing/malformed config (fall back to defaults)

7. Workflow Consolidation (15 -> 11)

Overlaps Found

1. Issue event workflows (3 -> 1)

squad-triage.yml + squad-label-enforce.yml + squad-issue-assign.yml all trigger on issues:labeled. They form a sequential chain: squad label -> triage assigns squad:{member} label -> issue-assign posts comment. Merge into single squad-issue-workflow.yml with conditional jobs.

2. Insider release (2 -> 1)

squad-insider-release.yml + squad-insider-publish.yml both trigger on push:insider. Different purposes (git tag vs npm publish) but can be sequential jobs via needs:.

3. Branch release pipelines (3 -> 1)

squad-release.yml (main), squad-preview.yml (preview), squad-insider-release.yml (insider) all validate/release on push to a branch. Use matrix or conditionals in a single squad-branch-release.yml.

4. @copilot assignment is implemented three times

Three workflows independently implement @copilot assignment: squad-triage.yml, squad-issue-assign.yml, squad-heartbeat.yml. Extract to a single reusable workflow (_assign-copilot.yml) called via workflow_call.

5. Heartbeat duplicates triage

squad-heartbeat.yml re-triages, every 30 minutes, issues that squad-triage.yml already handled on the label event, which creates duplicate comments. Heartbeat should only repair orphaned labels, not redo triage.

Consolidation Map

| New workflow | Replaces | Risk |
| --- | --- | --- |
| squad-issue-workflow.yml | triage + label-enforce + issue-assign | Low -- conditional if: per job |
| squad-branch-release.yml | release + preview + insider-release | Medium -- needs per-branch conditionals |
| squad-insider-full.yml | insider-release + insider-publish | Low -- sequential needs: chain |
| squad-heartbeat.yml (modified) | Remove duplicate triage logic | Low -- just delete a job |
| _assign-copilot.yml (reusable) | Extract @copilot logic from 3 files | Low -- called via workflow_call |

Keep as-is (well-isolated)

  • squad-ci.yml -- core CI
  • squad-promote.yml -- complex orchestration
  • sync-squad-labels.yml -- distinct trigger (team.md changes)
  • squad-npm-publish.yml -- complex multi-job publishing
  • squad-docs.yml -- Pages deploy
  • squad-docs-links.yml -- weekly link check
  • ci-rerun.yml -- manual dispatch

Minutes Impact

Consolidation doesn't directly reduce minutes (same work runs either way), but it:

  • Eliminates duplicate runs -- heartbeat won't re-triage already-triaged issues
  • Reduces workflow startup overhead -- 1 workflow with 3 jobs starts faster than 3 separate workflows
  • Simplifies feature flags -- fewer files to add vars.SQUAD_* checks to
  • Reduces upgrade surface -- fewer workflow files in template manifest = fewer things to overwrite

8. Implementation Phases

Phase 1 -- P0: Upgrade Safety & Config Foundation

Dependencies: None (start here)

  1. Fix double-write bug -- Remove workflow entries from TEMPLATE_MANIFEST in templates.ts. Keep readdir loop as single source of truth.
  2. Add config.json template -- Create template with CI tier defaults (tier: "zero", heartbeatCron: ""). Add to TEMPLATE_MANIFEST with overwriteOnUpgrade: false.
  3. Add upgrade safety mechanism -- Implement # squad-custom: true header check + backup to .squad/upgrade-backups/{version}/ in upgrade.ts.

Phase 2 -- P0: Ship Tier 0 Defaults

Dependencies: Phase 1 (config.json must exist)

  1. Update heartbeat template -- Ship with cron disabled by default. Workflow reads cron from .squad/config.json at runtime.
  2. Update CI template -- Default to pull_request only (no push trigger). Configurable via config.json.
  3. Init UX -- squad init surfaces tier selection, writes choice to .squad/config.json. Prints: "CI tier: zero (heartbeat disabled, ~100-300 min/month). Change in .squad/config.json"

Phase 3 -- P1: Workflow Consolidation (15 -> 11)

Dependencies: Phase 2 (Tier 0 defaults stabilized)

  1. Merge issue event workflows (3 -> 1): squad-issue-workflow.yml
  2. Merge release workflows (3 -> 1): squad-branch-release.yml
  3. Merge insider workflows (2 -> 1): squad-insider-full.yml
  4. Extract @copilot assignment to reusable _assign-copilot.yml
  5. Remove duplicate triage from heartbeat

Phase 4 -- P1: Tests for Upgrade Preservation

Dependencies: Phases 1-2 (mechanism must be implemented first)

  1. Test: config.json preserved on upgrade -- Init -> write custom config -> upgrade -> assert config unchanged
  2. Test: squad-custom header skips overwrite -- Init -> add header -> modify workflow -> upgrade -> assert preserved
  3. Test: modified workflows backed up -- Init -> modify workflow -> upgrade -> assert backup exists in .squad/upgrade-backups/
  4. Test: user-owned files not overwritten -- Init -> modify ceremonies.md -> upgrade -> assert unchanged
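
The shape of the first test can be sketched against an in-memory file map. This does not call the real upgrade command: `applyUpgrade`, `Files`, and `TemplateEntry` are hypothetical stand-ins, and the only behavior modeled is the overwriteOnUpgrade flag.

```typescript
type Files = { [path: string]: string };

interface TemplateEntry {
  destination: string;
  content: string;
  overwriteOnUpgrade: boolean;
}

// Hypothetical in-memory stand-in for the upgrade step: a template is
// written when its destination is missing or the entry is squad-owned.
function applyUpgrade(files: Files, templates: TemplateEntry[]): Files {
  const result: Files = { ...files };
  for (const t of templates) {
    const exists = Object.prototype.hasOwnProperty.call(result, t.destination);
    if (!exists || t.overwriteOnUpgrade) result[t.destination] = t.content;
  }
  return result;
}

const squadTemplates: TemplateEntry[] = [
  { destination: ".squad/config.json", content: '{"ci":{"tier":"zero"}}', overwriteOnUpgrade: false },
  { destination: ".github/workflows/squad-ci.yml", content: "# v2 logic", overwriteOnUpgrade: true },
];

// Init -> write custom config -> upgrade -> assert config unchanged.
const afterInit = applyUpgrade({}, squadTemplates);
const customized: Files = { ...afterInit, ".squad/config.json": '{"ci":{"tier":"light"}}' };
const afterUpgrade = applyUpgrade(customized, squadTemplates);
console.log(afterUpgrade[".squad/config.json"]);
```

The real tests would run the same init, modify, upgrade sequence against a temporary directory and the actual CLI entry points.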

Phase 5 -- P2: Docs & Polish

Dependencies: Phases 1-4 complete

  1. Update cost analysis docs with private repo multiplier tables
  2. Add enterprise guidance -- "Enterprise Cloud includes 50,000 min/month. Tier 2 fits for about 13 repos at the upper estimate."
  3. Upgrade transition playbook -- Document how v0.8.x customers transition: check .squad/upgrade-backups/ for preserved files
  4. Config schema documentation -- Add config.schema.json for validation

9. Approach Comparison

| Approach | Description | Best for | Brady can toggle? |
| --- | --- | --- | --- |
| Config tiers | .squad/config.json with ci.tier | Teams wanting preset packages | Yes -- Edit one field |
| Feature flags (repo vars) | SQUAD_HEARTBEAT_CRON, SQUAD_CI_ON_PUSH | Per-repo overrides, not in source | Yes -- Set in repo settings |
| Heartbeat ships disabled | Template ships with cron commented out | Immediate relief, no config needed | (!) Must edit YAML |
| Config + flags (recommended) | Config.json as primary, repo vars as override | Full flexibility, versioned + override | Yes -- Either path works |

Recommendation: Config.json as primary (versioned, portable) + repo variables as override (per-environment). This gives teams a checked-in source of truth with per-repo escape hatches.


10. Acceptance Criteria

  • squad init creates repos with Tier 0 defaults (~100-300 min/month per repo)
  • squad init surfaces tier selection to user during setup
  • squad upgrade preserves customer workflow customizations via config-driven separation
  • squad upgrade backs up modified workflows to .squad/upgrade-backups/{version}/ before overwriting
  • squad upgrade skips files with # squad-custom: true header
  • Double-write bug is fixed (workflows written once, not twice)
  • .squad/config.json initialized by squad init with CI tier defaults
  • .squad/config.json marked overwriteOnUpgrade: false in manifest
  • Workflows read runtime settings from config.json (not hardcoded)
  • Precedence order works: repo variable > config.json > default
  • 10 repos at Tier 0 fits within GitHub Free (2,000 min/month)
  • Workflow count reduced from 15 to 11
  • No duplicate @copilot assignment logic across workflows
  • Tests verify upgrade preservation behavior (config preserved, header skip, backup created)
  • Private repo multiplier documented in cost analysis
  • Enterprise Cloud guidance documented (use Tier 2 freely)

11. Related Issues

  • #97 -- CI feature flags
  • #84 -- Config version auto-update


Consolidated from original issue analysis + Flight review + FIDO quality review + consolidation analysis + blocker fix proposals. All findings, tables, and numbers sourced from issue #98 comments.

Labels: enhancement, go:needs-research, squad, squad:archive, squad:flight
