pi.ruv.io: cognitive cycle blocks HTTP handler causing 504 upstream timeouts

## Summary

`pi.ruv.io` (Cloud Run service `ruvbrain`) was returning **504 upstream request timeouts** on all endpoints. Every HTTP request timed out at exactly 300s (Cloud Run max timeout) while background cognitive cycles continued running normally.

## Root Cause

The `mcp_brain_server` cognitive cycle (runs every ~3 min) processes heavy workloads on the same async runtime as HTTP handlers:
- 200 auto-votes per cycle
- LoRA auto-submissions from SONA patterns
- Inference processing (11-21 inferences per cycle)
- Strange loop calculations

This appears to starve the HTTP handler, preventing it from serving any requests. The process stays alive (cognitive logs continue) but all HTTP responses hang until the 300s Cloud Run timeout.

## Evidence

- All requests returned `504` with `latency: 299.97s` (Cloud Run timeout ceiling)
- App logs showed healthy cognitive cycles (`Cognitive cycle #22-24`) running every ~3 min
- `SIGTERM` graceful shutdown observed at `2026-03-27T13:58:50Z`, followed by restart
- After restart, cognitive cycles resumed but HTTP remained unresponsive
- Two instance IDs visible in logs — both instances affected
- `curl` confirmed TCP connect succeeds (0.003s) but app never sends response

## Resolution (Temporary)

Redeployed fresh revision (`ruvbrain-00159-cjl`) — service immediately responsive (77ms).

## Permanent Fix Needed

1. **Move cognitive cycle to a separate tokio task with budget/yield** — ensure HTTP handlers are never starved by background processing
2. **Add a `/healthz` liveness probe** that Cloud Run can use to detect and restart unresponsive instances
3. **Consider `tokio::task::spawn_blocking`** for heavy compute (auto-votes, LoRA) to avoid blocking the async executor
4. **Add request timeout middleware** (e.g., 30s) so individual requests fail fast instead of hanging to 300s

## Environment

- Service: `ruvbrain` / `us-central1`
- Image: `gcr.io/ruv-dev/ruvbrain:latest`
- Revision: `ruvbrain-00158-2lb` (affected), `ruvbrain-00159-cjl` (redeployed)
- Config: 2 CPU, 2Gi memory, minScale=1, maxScale=10, session affinity enabled

## Labels

- **priority**: high
- **component**: mcp-brain-server
- **type**: bug

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pi.ruv.io: cognitive cycle blocks HTTP handler causing 504 upstream timeouts #305

Summary

Root Cause

Evidence

Resolution (Temporary)

Permanent Fix Needed

Environment

Labels

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

pi.ruv.io: cognitive cycle blocks HTTP handler causing 504 upstream timeouts #305

Description

Summary

Root Cause

Evidence

Resolution (Temporary)

Permanent Fix Needed

Environment

Labels

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions