Skip to content

Network hardening + CI: size limits, retries, client reuse, tests#2

Merged
itsdevcoffee merged 1 commit intomainfrom
network/reliability-hardening
Dec 16, 2025
Merged

Network hardening + CI: size limits, retries, client reuse, tests#2
itsdevcoffee merged 1 commit intomainfrom
network/reliability-hardening

Conversation

@itsdevcoffee
Copy link
Owner

Summary

Harden network layer against DoS/failures and establish CI quality gates.

Problems Fixed

1. Unbounded HTTP Reads (CRITICAL)

  • Risk: Malicious server returns 10GB JSON → client OOM crash
  • Fix: io.LimitReader caps response at 10MB
  • Evidence: github.go:88, registry.go:127

2. No Retry Logic (HIGH)

  • Risk: Single network hiccup aborts entire marketplace fetch
  • Fix: 3 retries with exponential backoff (1s, 2s, 4s)
  • Evidence: github.go:40-59

3. HTTP Client Churn (MEDIUM)

  • Risk: New client per request → connection overhead, slow fetches
  • Fix: Singleton pattern with sync.Once
  • Evidence: github.go:120-131

4. Unbounded Concurrency (MEDIUM)

  • Risk: 7+ parallel fetches → GitHub rate limiting (60 req/hr)
  • Fix: Semaphore limits to 5 concurrent fetches
  • Evidence: discovery.go:85, discover_with_registry.go:25

5. Silent Clipboard Errors (LOW)

  • Risk: User presses c to copy, nothing happens → confusion
  • Fix: Red error flash "✗ Clipboard error" for 3 seconds
  • Evidence: update.go:315, view.go:515

6. No CI Checks (HIGH)

  • Risk: Bugs, race conditions, vulnerabilities slip into main
  • Fix: Automated tests, linting, vuln scanning on every PR
  • Evidence: .github/workflows/ci.yml

Changes Made

HTTP Layer (internal/marketplace/github.go, registry.go)

  • Body size limit: io.LimitReader(resp.Body, MaxResponseBodySize)

    • Checks if limit hit by attempting to read one more byte
    • Returns clear error: "response body exceeded 10485760 bytes"
  • Retry with backoff: FetchManifestFromGitHub now calls fetchManifestAttempt in loop

    • 3 attempts max
    • Backoff: time.Duration(1<<uint(attempt)) * time.Second
    • Preserves last error for debugging
  • Singleton client: httpClient() uses sync.Once

    • Single http.Client instance for all requests
    • Connection pooling via Transport (10 idle, 5 per host)

Concurrency Control (discovery.go, discover_with_registry.go)

  • Semaphore pattern:
    sem := make(chan struct{}, MaxConcurrentFetches)
    sem <- struct{}{}        // Acquire
    defer func() { <-sem }() // Release
  • Limits parallel marketplace fetches to 5

UI Feedback (internal/ui/)

  • New state: clipboardErrorFlash bool in Model
  • New message: clearClipboardErrorMsg with 3s delay
  • Visual: Red Error color (#EF4444) added to palette
  • Display: "✗ Clipboard error" shown in detail view footer

CI/CD Infrastructure

  • .github/workflows/ci.yml: 4 jobs run on every PR/push

    1. test: go test -v -race -coverprofile=coverage.out ./...
    2. lint: golangci-lint + gofmt validation
    3. vuln-check: govulncheck ./...
    4. build: go build -v ./cmd/plum/
  • .golangci.yml: Linter config

    • Enabled: errcheck, gosimple, govet, staticcheck, gosec, gofmt, etc.
    • Security-focused (gosec enabled)

Test Results

Unit Tests

```bash
$ go test -v -race ./...
ok github.com/itsdevcoffee/plum/internal/marketplace 1.040s
```

Build Verification

```bash
$ go build ./cmd/plum/

Success - binary created

```

Race Detector

```bash
$ go test -race ./...

No data races detected

```


How to Validate

Automated

```bash

Run full test suite

go test -v -race ./...

Build binary

go build ./cmd/plum/

Lint (requires golangci-lint)

golangci-lint run

Vulnerability scan (requires govulncheck)

govulncheck ./...
```

Manual Testing

1. Retry Behavior

  • Disconnect network temporarily
  • Run plum, press Shift+U
  • Reconnect network mid-refresh
  • → Should retry and succeed (watch for 1s/2s/4s delays)

2. Clipboard Error Feedback

  • Block clipboard access (e.g., via permissions or xclip removal)
  • Press Enter on a plugin, then press c
  • → Should show red "✗ Clipboard error" message

3. HTTP Client Reuse

  • Add debug logging to httpClient()
  • Trigger multiple fetches (Shift+U)
  • sync.Once ensures only 1 client created

4. Concurrency Limit

  • Add logging in semaphore acquire/release
  • Trigger Shift+U refresh
  • → Max 5 concurrent goroutines active at any time

5. Body Size Limit

  • Mock server returning >10MB response (requires test refactoring)
  • → Should error with "response body exceeded..." message

Checklist

  • Tests pass (go test -race ./...)
  • Build succeeds
  • No race conditions
  • CI workflow added
  • Linter config added
  • Backward compatible (no breaking changes)
  • User-visible errors improved

Follow-up Work (Future PRs)

  1. Refactor for testability: Make buildRawURL accept base URL to enable full HTTP mocking
  2. Retry 429/503 with backoff: Currently retries network errors; could add smarter retry for rate limiting
  3. Telemetry: Track retry rates, cache hit rates for observability

Notes

  • Test file github_test.go contains skeleton tests marked as skipped
  • Full HTTP mocking requires refactoring buildRawURL to be injectable (out of scope for this PR)
  • CI will run automatically on this PR to validate the workflow

## What
Harden network layer and add comprehensive CI pipeline:
1. HTTP response size limits (DoS protection)
2. Exponential backoff retries for transient failures
3. Singleton HTTP client for connection reuse
4. Concurrency limiting (max 5 parallel fetches)
5. Visible clipboard error feedback
6. CI workflow with tests, linting, vulnerability scanning

## Changes

### HTTP Safety
- Limit response body to 10MB via io.LimitReader (github.go, registry.go)
- Retry failed requests 3x with exponential backoff (1s, 2s, 4s)
- Singleton HTTP client for connection pooling and reuse

### Concurrency Control
- Semaphore pattern limits concurrent marketplace fetches to 5
- Prevents GitHub rate limiting and resource exhaustion
- Applied to both discovery.go and discover_with_registry.go

### UX Improvements
- Clipboard errors now visible to user (3s red flash)
- Added Error color (#EF4444) to UI palette
- Clear feedback instead of silent failures

### CI/CD
- .github/workflows/ci.yml: 4 jobs (test, lint, vuln-check, build)
- go test -race for race condition detection
- golangci-lint with gosec for security analysis
- govulncheck for known vulnerability scanning
- gofmt validation

## Why
- **Security**: Unbounded reads could DoS client via large responses
- **Reliability**: Transient network failures should auto-retry
- **Performance**: Connection reuse reduces latency/overhead
- **UX**: Users need to know when clipboard fails
- **Quality**: CI catches bugs before merge

## Testing
\`\`\`bash
go test -v -race ./...
go build ./cmd/plum/
# All tests pass, no race conditions
\`\`\`

## Manual Validation
1. Run plum, press Shift+U → observe retry behavior on network hiccup
2. Test clipboard copy (c key) → error should show if clipboard unavailable
3. Verify HTTP client reuse (singleton pattern)
4. Check concurrency limit (max 5 parallel requests)
@itsdevcoffee itsdevcoffee merged commit da97b61 into main Dec 16, 2025
3 of 4 checks passed
itsdevcoffee added a commit that referenced this pull request Dec 16, 2025
Focused follow-up pass to close remaining edge cases after PRs #1 and #2:

1. Windows-safe atomic file replacement:
   - Add atomicRename helper with remove+retry fallback
   - Fixes os.Rename failures when destination exists on Windows
   - Applied to both cache.go and registry.go

2. Tightened retry policy:
   - Only retry transient failures (5xx, 429, network/timeout errors)
   - Skip retries for permanent failures (4xx except 429)
   - Add httpStatusError type for structured error handling
   - Add isRetryableError() helper with proper error type checking

3. HTTP tests made real:
   - Convert GitHubRawBase from const to package variable
   - Remove all t.Skip() scaffolding
   - Add asserting tests for:
     * Body size limit enforcement
     * Retry behavior on transient failures
     * Non-retry behavior on permanent failures (404)
     * Invalid JSON handling
     * Timeout handling (skipped in short mode)

4. CI polish:
   - Switch from hardcoded go-version to go-version-file: go.mod
   - Improves maintainability and version consistency

All tests passing with race detector enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant