@@ -0,0 +1,78 @@
# ADR-29037: Time-Bounded Cache Eviction via `--after` Flag in the Logs Command

**Date**: 2026-04-29
**Status**: Draft
**Deciders**: pelikhan

---

## Part 1 — Narrative (Human-Friendly)

### Context

The `gh aw logs` command downloads GitHub Actions workflow run logs and caches them locally in `run-{ID}` subdirectories under a configurable output directory. In shared storage environments — where multiple engineers or CI jobs share a single output directory — these cached folders accumulate without bound, consuming disk space indefinitely. Before this change, no built-in mechanism existed to prune stale cache entries; users were forced to write external scripts or perform manual cleanup. The date-delta format already used by `--start-date` and `--end-date` (e.g., `-1w`, `-30d`, `YYYY-MM-DD`) provides a natural, consistent way to express a cutoff.

### Decision

We will add a `--after` flag to `gh aw logs` that, when provided, deletes all cached `run-{ID}` directories whose creation date (taken from `run_summary.json` inside each folder, falling back to the directory's modification time) predates the specified cutoff before proceeding with the normal download. We will reuse the existing `workflow.ResolveRelativeDate` helper to parse the flag value so that relative deltas and absolute dates are handled consistently with the rest of the CLI. Cleanup failures are non-fatal: a warning is printed and the download proceeds regardless.
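
The control flow described above — parse the cutoff, evict non-fatally, then download — can be sketched as follows. This is an illustrative model only: `parseCutoff`, `cleanup`, and `downloadLogs` are hypothetical stand-ins for the real helpers, not the actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// parseCutoff stands in for the real --after parsing; "bad" simulates an
// unparseable value, anything else resolves to a cutoff one week ago.
func parseCutoff(after string) (time.Time, error) {
	if after == "bad" {
		return time.Time{}, errors.New("unparseable")
	}
	return time.Now().AddDate(0, 0, -7), nil
}

// cleanup stands in for cache eviction; here it always fails, to show that
// a file-system error does not stop the download.
func cleanup(cutoff time.Time) (int, error) { return 0, errors.New("disk error") }

// downloadLogs models the ordering: eviction runs first, parse errors are
// fatal, eviction errors are downgraded to a warning.
func downloadLogs(after string) error {
	if after != "" {
		cutoff, err := parseCutoff(after)
		if err != nil {
			return fmt.Errorf("invalid --after: %w", err) // parse errors abort
		}
		if _, err := cleanup(cutoff); err != nil {
			fmt.Println("warning: cache cleanup failed:", err) // non-fatal
		}
	}
	fmt.Println("downloading...")
	return nil
}

func main() {
	fmt.Println(downloadLogs("-1w") == nil) // warning printed, download proceeds
	fmt.Println(downloadLogs("bad") != nil) // bad value aborts before download
}
```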

### Alternatives Considered

#### Alternative 1: Dedicated `gh aw logs clean` Subcommand

A separate subcommand (e.g., `gh aw logs clean --before -1w`) would make the destructive cache-eviction operation a first-class, explicit user action rather than a side-effect of the download command. This approach provides clearer UX separation — users who want to download logs would never accidentally trigger cleanup. It was not chosen because it adds a new top-level entry point for a narrow utility concern, and the intended primary use case is "clean up, then download in one step" (e.g., in a cron job maintaining a rolling window), which a combined flag handles more ergonomically.

#### Alternative 2: Time-Range Filter on Downloads (not cache eviction)

An `--after` flag could alternatively mean "only download runs created after this date", making it a symmetrical companion to `--start-date`. This was not chosen because `--start-date` already serves that role. Re-using `--after` as a download filter would create ambiguity with the existing flag semantics and would not solve the disk-space accumulation problem at all.

### Consequences

#### Positive
- Disk space management is available natively without requiring external scripts or cron jobs that call `rm -rf`.
- The flag reuses the same date-delta format (`-1w`, `-30d`, `YYYY-MM-DD`) already familiar to users of `--start-date` / `--end-date`, reducing the learning surface.
- Cleanup is non-fatal, so a transient file-system error does not block the download that follows.
- Only directories matching the `run-{ID}` naming pattern are touched, making the operation safe against accidentally deleting user-created files in the output directory.

#### Negative
- The `--after` name is ambiguous: a user reading the flag description for the first time may expect it to be a download date filter rather than a cache eviction trigger. The help text must be precise to avoid confusion.
- Cache eviction is a side-effect of the download command rather than a standalone operation; users who want to evict without downloading must still invoke `gh aw logs` even when they have no interest in downloading new runs.
- Adding yet another positional parameter to the already-wide `DownloadWorkflowLogs` function signature increases the cognitive cost of calling that function from tests.

#### Neutral
- All existing call sites of `DownloadWorkflowLogs` must be updated to pass an `after` string; passing an empty string disables cleanup, preserving backward compatibility.
- The fallback from `run_summary.json` creation timestamp to directory modification time means that eviction behaviour may differ slightly for incomplete downloads, but this is bounded and documented.

---

## Part 2 — Normative Specification (RFC 2119)

> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).

### Cache Eviction Scope

1. Implementations **MUST** restrict cache eviction to directories whose names match the `run-{ID}` prefix pattern; directories with any other name **MUST NOT** be deleted.
2. Implementations **MUST** determine a directory's effective run date by reading the `CreatedAt` field from `run_summary.json` inside that directory when the file is present and parseable.
3. Implementations **MUST** fall back to the directory's file-system modification time as the effective run date when `run_summary.json` is absent or unparseable.
4. Implementations **MUST** delete a `run-{ID}` directory if and only if its effective run date is strictly before the resolved cutoff timestamp.

### Cutoff Parsing

1. Implementations **MUST** accept the `--after` flag value in either absolute (`YYYY-MM-DD`) or relative delta (`-Nd`, `-Nw`, `-Nmo`) formats.
2. Implementations **MUST** resolve relative delta values using the same `workflow.ResolveRelativeDate` helper used by `--start-date` and `--end-date` to ensure consistent date arithmetic across the CLI.
3. Implementations **MUST** return a user-facing error and abort if the provided `--after` value cannot be parsed into a valid cutoff timestamp.

### Error Handling and Ordering

1. Implementations **MUST** execute cache eviction before initiating any new log downloads when `--after` is specified.
2. Implementations **MUST NOT** abort the download step if cache eviction encounters a file-system error; the error **MUST** be surfaced as a non-fatal warning on stderr.
3. Implementations **SHOULD** print a human-readable summary of removed folder count on stderr when one or more directories are deleted.
4. Implementations **MAY** suppress the "nothing to clean" message unless `--verbose` is also set.

### Conformance

An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance.

---

*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/25090240769) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*
4 changes: 2 additions & 2 deletions pkg/cli/context_cancellation_test.go
@@ -71,7 +71,7 @@ func TestDownloadWorkflowLogsWithCancellation(t *testing.T) {
cancel()

// Try to download logs with a cancelled context
-	err := DownloadWorkflowLogs(ctx, "", 10, "", "", "/tmp/test-logs", "", "", 0, 0, "", false, false, false, false, false, false, false, 0, "", "", false, false, "", nil)
+	err := DownloadWorkflowLogs(ctx, "", 10, "", "", "/tmp/test-logs", "", "", 0, 0, "", false, false, false, false, false, false, false, 0, "", "", false, false, "", nil, "")

// Should return context.Canceled error
assert.ErrorIs(t, err, context.Canceled, "Should return context.Canceled error when context is cancelled")
@@ -111,7 +111,7 @@ func TestDownloadWorkflowLogsTimeoutRespected(t *testing.T) {

start := time.Now()
// Use a workflow name that doesn't exist to avoid actual network calls
-	_ = DownloadWorkflowLogs(ctx, "nonexistent-workflow-12345", 100, "", "", "/tmp/test-logs", "", "", 0, 0, "", false, false, false, false, false, false, false, 1, "", "", false, false, "", nil)
+	_ = DownloadWorkflowLogs(ctx, "nonexistent-workflow-12345", 100, "", "", "/tmp/test-logs", "", "", 0, 0, "", false, false, false, false, false, false, false, 1, "", "", false, false, "", nil, "")
elapsed := time.Since(start)

// Should complete within reasonable time (give 5 seconds buffer for test overhead)
96 changes: 96 additions & 0 deletions pkg/cli/logs_cache.go
@@ -5,10 +5,13 @@ import (
"fmt"
"os"
"path/filepath"
"strconv"
"strings"
"time"

"github.com/github/gh-aw/pkg/console"
"github.com/github/gh-aw/pkg/logger"
"github.com/github/gh-aw/pkg/workflow"
)

var logsCacheLog = logger.New("cli:logs_cache")
@@ -63,6 +66,99 @@ func loadRunSummary(outputDir string, verbose bool) (*RunSummary, bool) {
return &summary, true
}

// parseCleanupCutoff resolves a date string (absolute or relative delta) to a
// time.Time that can be used as a cutoff for the cache cleanup. Accepts the
// same formats as --start-date / --end-date (e.g. "-1w", "-30d", "2024-01-01").
func parseCleanupCutoff(after string) (time.Time, error) {
cutoffStr, err := workflow.ResolveRelativeDate(after, time.Now())
if err != nil {
return time.Time{}, fmt.Errorf("invalid --after value '%s': %w", after, err)
}

// ResolveRelativeDate returns an RFC3339 timestamp for relative inputs and
// the original string for absolute dates.
if t, parseErr := time.Parse(time.RFC3339, cutoffStr); parseErr == nil {
return t, nil
}
if t, parseErr := time.Parse("2006-01-02", cutoffStr); parseErr == nil {
return t, nil
}
return time.Time{}, fmt.Errorf("invalid --after value '%s': could not parse resolved date '%s'", after, cutoffStr)
}

// cleanupOldRunFolders removes cached run folders from outputDir whose run creation
// date is before the given cutoff time. Only directories matching the "run-{ID}"
// naming pattern are considered. The creation date is read from run_summary.json
// inside each folder; if that file is absent the directory modification time is
// used as a fallback. Returns the number of folders removed.
func cleanupOldRunFolders(outputDir string, cutoff time.Time, verbose bool) (int, error) {
logsCacheLog.Printf("Cleaning up run folders older than %s in %s", cutoff.Format(time.RFC3339), outputDir)

entries, err := os.ReadDir(outputDir)
if err != nil {
if os.IsNotExist(err) {
return 0, nil
}
return 0, fmt.Errorf("failed to read output directory: %w", err)
}

removed := 0
for _, entry := range entries {
if !entry.IsDir() {
continue
}
if !strings.HasPrefix(entry.Name(), "run-") {
continue
}
// Only consider directories whose name is exactly "run-{integer}" to avoid
// accidentally deleting unrelated directories like "run-backup" or "run-temp".
if _, parseErr := strconv.ParseInt(strings.TrimPrefix(entry.Name(), "run-"), 10, 64); parseErr != nil {
logsCacheLog.Printf("Skipping non-run directory: %s", entry.Name())
continue
}

Comment on lines +110 to +119 — Copilot AI, Apr 29, 2026:

The PR description says cleanup should only target directories matching the run-{ID} pattern, but the current check only enforces the run- prefix. This can accidentally delete unrelated directories like run-backup or run-temp. Consider validating that the suffix is a valid integer (e.g., strconv.ParseInt(strings.TrimPrefix(name, "run-"), 10, 64)), and skip entries that don't parse cleanly.

runDir := filepath.Join(outputDir, entry.Name())

// Determine the run date: prefer the GitHub run creation timestamp from
// run_summary.json so the cutoff is relative to when the workflow actually
// ran, not when we downloaded or processed it.
var runDate time.Time
summaryPath := filepath.Join(runDir, runSummaryFileName)
if data, readErr := os.ReadFile(summaryPath); readErr == nil {
var summary RunSummary
if jsonErr := json.Unmarshal(data, &summary); jsonErr == nil && !summary.Run.CreatedAt.IsZero() {
runDate = summary.Run.CreatedAt
}
}

// Fall back to directory modification time when the summary is unavailable.
if runDate.IsZero() {
info, statErr := entry.Info()
if statErr != nil {
logsCacheLog.Printf("Failed to stat run directory %s: %v", entry.Name(), statErr)
continue
}
runDate = info.ModTime()
}

if runDate.Before(cutoff) {
logsCacheLog.Printf("Removing old run folder: %s (run date: %s, cutoff: %s)", entry.Name(), runDate.Format(time.RFC3339), cutoff.Format(time.RFC3339))
if verbose {
fmt.Fprintln(os.Stderr, console.FormatInfoMessage(fmt.Sprintf("Removing old run folder: %s (run date: %s)", entry.Name(), runDate.Format("2006-01-02"))))
}
if removeErr := os.RemoveAll(runDir); removeErr != nil {
logsCacheLog.Printf("Failed to remove run folder %s: %v", entry.Name(), removeErr)
fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("Failed to remove old run folder %s: %v", entry.Name(), removeErr)))
continue
}
removed++
}
}

logsCacheLog.Printf("Removed %d old run folders (cutoff: %s)", removed, cutoff.Format(time.RFC3339))
return removed, nil
}

// saveRunSummary saves a run summary to disk
func saveRunSummary(outputDir string, summary *RunSummary, verbose bool) error {
logsCacheLog.Printf("Saving run summary to cache: dir=%s, run_id=%d", outputDir, summary.RunID)
165 changes: 165 additions & 0 deletions pkg/cli/logs_cache_test.go
@@ -0,0 +1,165 @@
//go:build !integration

package cli

import (
"encoding/json"
"os"
"path/filepath"
"strconv"
"testing"
"time"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)

// makeRunDir creates a run-{id} directory with an optional run_summary.json.
func makeRunDir(t *testing.T, parent string, id int64, createdAt time.Time, writeSummary bool) string {
t.Helper()
dir := filepath.Join(parent, "run-"+strconv.FormatInt(id, 10))
require.NoError(t, os.MkdirAll(dir, 0755), "create run dir")

if writeSummary {
summary := RunSummary{
CLIVersion: "test",
RunID: id,
ProcessedAt: time.Now(),
Run: WorkflowRun{
DatabaseID: id,
CreatedAt: createdAt,
},
}
data, err := json.Marshal(summary)
require.NoError(t, err, "marshal run summary")
require.NoError(t, os.WriteFile(filepath.Join(dir, runSummaryFileName), data, 0644), "write run summary")
}

return dir
}

func TestCleanupOldRunFolders(t *testing.T) {
now := time.Now()
cutoff := now.Add(-7 * 24 * time.Hour) // 1 week ago

tests := []struct {
name string
setup func(t *testing.T, dir string)
wantRemoved int
wantDirsLeft []string
wantDirsRemoved []string
}{
{
name: "removes folders older than cutoff",
setup: func(t *testing.T, dir string) {
makeRunDir(t, dir, 1, now.Add(-14*24*time.Hour), true) // 2 weeks old - should be removed
makeRunDir(t, dir, 2, now.Add(-3*24*time.Hour), true) // 3 days old - should be kept
},
wantRemoved: 1,
wantDirsLeft: []string{"run-2"},
wantDirsRemoved: []string{"run-1"},
},
{
name: "keeps folders newer than cutoff",
setup: func(t *testing.T, dir string) {
makeRunDir(t, dir, 10, now.Add(-1*24*time.Hour), true) // 1 day old - kept
makeRunDir(t, dir, 11, now.Add(-2*24*time.Hour), true) // 2 days old - kept
},
wantRemoved: 0,
wantDirsLeft: []string{"run-10", "run-11"},
},
{
name: "removes all old folders",
setup: func(t *testing.T, dir string) {
makeRunDir(t, dir, 20, now.Add(-30*24*time.Hour), true) // 30 days old - removed
makeRunDir(t, dir, 21, now.Add(-14*24*time.Hour), true) // 14 days old - removed
},
wantRemoved: 2,
wantDirsRemoved: []string{"run-20", "run-21"},
},
{
name: "ignores non run- directories",
setup: func(t *testing.T, dir string) {
// A directory not following the run-{ID} pattern should not be touched
nonRunDir := filepath.Join(dir, "summary")
require.NoError(t, os.MkdirAll(nonRunDir, 0755))

makeRunDir(t, dir, 30, now.Add(-30*24*time.Hour), true) // old - removed
},
wantRemoved: 1,
wantDirsLeft: []string{"summary"},
wantDirsRemoved: []string{"run-30"},
},
{
name: "ignores run- directories with non-integer suffix",
setup: func(t *testing.T, dir string) {
// Directories like "run-backup" or "run-temp" must not be removed
for _, name := range []string{"run-backup", "run-temp", "run-old"} {
require.NoError(t, os.MkdirAll(filepath.Join(dir, name), 0755))
}
makeRunDir(t, dir, 50, now.Add(-30*24*time.Hour), true) // old with numeric ID - removed
},
wantRemoved: 1,
wantDirsLeft: []string{"run-backup", "run-temp", "run-old"},
wantDirsRemoved: []string{"run-50"},
},
{
name: "falls back to dir mtime when no run_summary.json",
setup: func(t *testing.T, dir string) {
// Create a run dir without a summary file; its mtime will be recent
makeRunDir(t, dir, 40, time.Time{}, false) // no summary; mtime is now
makeRunDir(t, dir, 41, now.Add(-30*24*time.Hour), true) // old with summary - removed
},
wantRemoved: 1,
wantDirsLeft: []string{"run-40"},
wantDirsRemoved: []string{"run-41"},
},
{
name: "empty output directory returns zero removed",
setup: func(t *testing.T, dir string) {
// nothing to do
},
wantRemoved: 0,
},
{
name: "non-existent output directory returns zero removed",
setup: func(t *testing.T, dir string) {
// Remove the directory so it doesn't exist
require.NoError(t, os.RemoveAll(dir))
},
wantRemoved: 0,
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
tmpDir := t.TempDir()
tt.setup(t, tmpDir)

removed, err := cleanupOldRunFolders(tmpDir, cutoff, false)

require.NoError(t, err, "cleanupOldRunFolders should not return an error")
assert.Equal(t, tt.wantRemoved, removed, "number of removed folders should match")

for _, name := range tt.wantDirsLeft {
assert.DirExists(t, filepath.Join(tmpDir, name), "directory should still exist: %s", name)
}
for _, name := range tt.wantDirsRemoved {
assert.NoDirExists(t, filepath.Join(tmpDir, name), "directory should have been removed: %s", name)
}
})
}
}

func TestCleanupOldRunFoldersVerbose(t *testing.T) {
now := time.Now()
cutoff := now.Add(-7 * 24 * time.Hour)
tmpDir := t.TempDir()

makeRunDir(t, tmpDir, 99, now.Add(-30*24*time.Hour), true)

// Should work identically in verbose mode without panicking
removed, err := cleanupOldRunFolders(tmpDir, cutoff, true)
require.NoError(t, err, "verbose cleanup should not error")
assert.Equal(t, 1, removed, "one folder should be removed in verbose mode")
}
1 change: 1 addition & 0 deletions pkg/cli/logs_ci_scenario_test.go
@@ -55,6 +55,7 @@ func TestLogsJSONOutputWithNoRuns(t *testing.T) {
false, // train
"", // format
nil, // artifactSets
"", // after
)

// Restore stdout and read output