feat(brain-repo): brain repo versioning + onboarding wizard with restore flow #32
NeritonDias wants to merge 32 commits into
Conversation
feat(brain-repo): foundation — BrainRepoConfig model, user onboarding columns, BRAIN_REPO_MASTER_KEY, watchdog dep
- Add User.onboarding_state + User.onboarding_completed_agents_visit columns
- Add BrainRepoConfig model (user_id FK, encrypted token, repo metadata, sync state)
- Add needs_onboarding() helper in models.py
- SQLite migration block in app.py (ALTER TABLE pattern; sketched below)
- BRAIN_REPO_MASTER_KEY auto-generated in .env on first run
- Register onboarding + brain_repo blueprints (try/except — files created in later commits)
- Add /api/auth/needs-onboarding to PUBLIC_PATHS
- Add watchdog>=4.0 to pyproject.toml
- Create dashboard/backend/brain_repo/__init__.py
- Create .handoffs/ with phase-0-handoff, domain-auth, domain-watcher
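An illustrative sketch of the ALTER TABLE pattern named above; only the column names come from this commit, the table name and column types are assumptions:

```python
# Hypothetical sketch of the idempotent SQLite ALTER TABLE pattern.
# Table name "user" and the column types are assumptions from the message.
import sqlite3

def migrate_user_columns(db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    try:
        existing = {row[1] for row in conn.execute("PRAGMA table_info(user)")}
        # Only add columns that are missing, so re-running on boot is safe.
        if "onboarding_state" not in existing:
            conn.execute("ALTER TABLE user ADD COLUMN onboarding_state TEXT")
        if "onboarding_completed_agents_visit" not in existing:
            conn.execute(
                "ALTER TABLE user ADD COLUMN "
                "onboarding_completed_agents_visit INTEGER DEFAULT 0"
            )
        conn.commit()
    finally:
        conn.close()
```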
feat(onboarding): backend routes — needs-onboarding, onboarding state endpoints
- Updated auth_routes.py import to include BrainRepoConfig and needs_onboarding
- Added GET /api/auth/needs-onboarding (public, handles unauthenticated + pre-setup)
- Added POST /api/auth/mark-agents-visited (login_required)
- Created routes/onboarding.py with state, start, complete, skip, provider endpoints
feat(brain-repo): brain_repo route — status, connect, detect, snapshots, restore SSE, sync
- GET /api/brain-repo/status — returns config dict or {connected: false}
- POST /api/brain-repo/connect — validate PAT, create/validate repo, encrypt+store token
- POST /api/brain-repo/disconnect — clears token and disables sync
- GET /api/brain-repo/detect — find candidate brain repos for a PAT
- GET /api/brain-repo/snapshots — list daily/weekly/milestone/head refs
- POST /api/brain-repo/restore/start — SSE stream via stream_with_context
- POST /api/brain-repo/sync/force — commit+push+milestone tag
- POST /api/brain-repo/tag/milestone — create named milestone tag
All brain_repo sub-module calls use try/except ImportError for graceful fallback
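The graceful-fallback pattern from the last line, sketched with an assumed blueprint name and response shape:

```python
from flask import Blueprint, jsonify

brain_repo_bp = Blueprint("brain_repo", __name__, url_prefix="/api/brain-repo")

@brain_repo_bp.route("/status")
def status():
    try:
        from brain_repo import github_api  # sub-module lands in a later commit
    except ImportError:
        # Degrade to "not connected" instead of a 500 while the
        # sub-modules don't exist yet.
        return jsonify({"connected": False})
    # ... real status logic using github_api ...
    return jsonify({"connected": True})
```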
feat(brain-repo): github oauth, api, git_ops, watcher, sync_worker, kb_mirror, transcripts, restore, secrets_scanner
feat(onboarding): frontend wizard — first-time flow, restore flow, Oracle banner, brain-repo settings
chore(setup.py): brain-repo prompts in CLI + backup github target + /salve skill + .gitignore
- setup.py: add ask_password, ask_choice, _ensure_brain_master_key, _save_brain_repo_pat helpers; add Brain Repo wizard block in main() after choose_provider() (local mode); add brain_repo_* keys to all 3 language bundles (en-US, pt-BR, es) in MESSAGES dict
- backup.py: add backup_to_github() function and 'github' --target option; dispatches to brain_repo.git_ops for commit+push+tag
- .claude/skills/salve/SKILL.md: new skill that flushes session to memory files and relies on Brain Repo file watcher for auto-sync
- .gitignore: add memory/raw-transcripts/
- .handoffs/phase-4-cli-handoff.md: phase summary
i18n: 14 new brain-repo keys — en-US / pt-BR / es
Add brainRepo namespace with 14 keys to all three locale files: brain_repo_enable_prompt, brain_repo_auth_method, brain_repo_defer_to_web, brain_repo_pat_instructions, brain_repo_pat_prompt, brain_repo_pat_saved, brain_repo_pat_skipped, brain_repo_configure_later, brain_repo_connect, brain_repo_disconnect, brain_repo_sync_now, brain_repo_create_milestone, brain_repo_status_connected, brain_repo_status_disconnected. Key parity verified: all diffs between bundles are empty sets (a parity check is sketched below).
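One way the empty-set parity claim can be checked; the file paths are hypothetical and the bundles are assumed to be nested JSON:

```python
import json

def flat_keys(node, prefix=""):
    # Collect dotted key paths from a nested locale bundle.
    if not isinstance(node, dict):
        return {prefix}
    out = set()
    for k, v in node.items():
        out |= flat_keys(v, f"{prefix}.{k}" if prefix else k)
    return out

bundles = {}
for lang in ("en-US", "pt-BR", "es"):
    with open(f"locales/{lang}.json", encoding="utf-8") as f:  # path assumed
        bundles[lang] = flat_keys(json.load(f))

base = bundles["en-US"]
for lang, keys in bundles.items():
    assert keys == base, f"{lang} drifted: {sorted(keys ^ base)}"
```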
Sorry @NeritonDias, your pull request is larger than the review limit of 150000 diff characters
Force-pushed from 6017761 to e2b0dd4
Code review — PR #32
Summary of the change: three interlinked features — (1) "Brain Repo" versioning
🔴 Blockers
🟡 Suggestions
💭 Nits
Next steps
Thanks for the work — the overall architecture (abstract PAT→OAuth, secrets-scanner gate before the swap during restore, versioned manifest) is well thought out. The two 🔴 items are fixable and the rest is polish.
fix(brain-repo): correct PATAuthProvider API + add /validate-token + bootstrap remote repo
- routes/brain_repo.py: replace broken PATAuthProvider(key).encrypt(token) calls with PATAuthProvider(token, key).encrypt_token(); decrypt now uses module-level decrypt_token() helper. Handle missing BRAIN_REPO_MASTER_KEY gracefully.
- Add new POST /api/brain-repo/validate-token endpoint that validates a PAT via GitHub /user without persisting anything (used by onboarding to check token before showing the repo-choice step).
- Convert blueprint abort() calls to JSON via errorhandler so the SPA can show the actual backend message instead of a generic '400 BAD REQUEST'.
- Add friendlier error in connect() when the GitHub repo already exists (422 → bilingual message guiding the user to 'Use existing' or pick another name).
- New helper _initialize_remote_brain_repo: after create_private_repo creates an empty repo on GitHub, clone-init it locally, drop in the brain-repo skeleton via manifest.initialize_brain_repo (.evo-brain marker, manifest.yaml, dirs, README, .gitignore), commit and push to origin/main. Without this the repo stays empty and detect_brain_repos (code search by .evo-brain) cannot find it later — UI rejected such repos as 'incompatible'.
- brain_repo/github_api.py: add get_repo_info(token, repo_url) that parses a GitHub URL, fetches the repo, and returns (is_private, info_dict). Replaces the broken validate_repo_is_private(token, repo_url) call site that expected a tuple but the underlying function returned bool and took 3 args.
- brain_repo/pat_auth.py: re-export shim so existing routes/brain_repo.py imports of brain_repo.pat_auth.PATAuthProvider keep working (the actual implementation lives in github_oauth.py).
fix(onboarding): write canonical providers.json schema + boot-time migration
- routes/onboarding.py: rewrite set_provider so it never writes the legacy
{<id>: {api_key, enabled}} shape that broke routes/providers.py. Now reuses
_read_config / _write_config / ALLOWED_ENV_VARS from routes.providers and
produces the canonical {active_provider, providers: {<id>: {cli_command,
env_vars, ...}}} schema. Anthropic just flips active_provider (no key);
OpenAI/OpenRouter merge filtered env_vars; codex / codex_auth is a no-op
that defers to the existing /api/providers/openai OAuth endpoints.
- Env vars outside ALLOWED_ENV_VARS or containing shell metacharacters are
silently dropped — same policy as routes/providers.py._sanitize_env_vars.
- app.py: add boot-time providers.json schema-normalization migration.
If the file exists but is missing active_provider or providers keys, the
app overwrites it from providers.example.json. Recovers from the broken
state left by older versions of the onboarding wizard without requiring
the user to manually reset volumes.
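A sketch of that boot-time normalization under assumed file names; the real logic lives in app.py and may differ in detail:

```python
import json
import os
import shutil

def normalize_providers_config(path: str = "providers.json",
                               example: str = "providers.example.json") -> None:
    if not os.path.exists(path):
        return
    try:
        with open(path, encoding="utf-8") as f:
            cfg = json.load(f)
    except (json.JSONDecodeError, OSError):
        cfg = {}
    # The legacy onboarding wizard wrote {<id>: {api_key, enabled}}; the
    # canonical schema always carries these two top-level keys.
    if "active_provider" not in cfg or "providers" not in cfg:
        shutil.copyfile(example, path)
```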
fix(api): expose backend JSON error.description in lib/api.ts
The api helper used to throw 'new Error("400 BAD REQUEST")' for any non-OK
response, dropping the actual server message — so the UI showed a useless
generic status instead of the specific reason ('repo already exists',
'token required', etc).
New buildError() reads the response body, prefers JSON {error|description|
message} fields, falls back to short text, and prefixes with the status so
existing callers that pattern-match on '401' / '403' keep working.
fix(onboarding): refreshUser() before navigate to prevent /onboarding redirect loop
Two bugs were producing the same symptom — wizard kept reloading instead of landing on /agents after completion:
1. App.tsx guard ignored onboarding_state === null. New users have NULL by default in the column, so they bypassed the guard and reached /agents without ever seeing the wizard. Guard now treats null as 'needs onboarding' (only completed/skipped count as done).
2. OnboardingRouter handleComplete/handleSkip called POST /onboarding/complete then navigated to /agents — but the React user state still had onboarding_state='pending' cached from /auth/me. The guard would re-read that cached value and bounce back to /onboarding, where the useEffect saw the fresh 'completed' state from the backend and tried to navigate again. Infinite loop.
Fix: await refreshUser() (from AuthContext) before navigate so the guard sees the current state.
feat(onboarding): per-provider sub-flows + use /validate-token instead of /connect
StepProvider was a single generic API-key form for all 4 providers — but Anthropic, OpenAI, OpenRouter and Codex each need a different flow:
- Anthropic: no API key, user logs in by running 'claude' in the terminal at the Agents page. Sub-step shows that instruction + flips active_provider.
- OpenAI: API key input → POST /providers/openai/models validates and lists available models → dropdown to pick model → save (env_vars merge).
- OpenRouter: same shape as OpenAI but with read-only Base URL chip (https://openrouter.ai/api/v1) and free-text model field.
- Codex: two tabs (Browser OAuth, Device Auth) replicating the Providers.tsx modal — calls /providers/openai/auth-start|auth-complete or device-start|device-poll. Tab/button colors aligned to the onboarding palette (#00FFA7).
StepBrainConnect.tsx and RestoreSelectRepo.tsx now call POST /api/brain-repo/validate-token (validates without persisting) instead of /connect (which requires repo_url|create_repo and was 400-ing on token-only calls). RestoreSelectRepo also passes the token to /detect as a query param so the backend doesn't need a stored config to enumerate brain repos.
i18n keys for these screens added in the same pass since the components were rewritten.
i18n(onboarding): translate 12 wizard screens + Oracle banner — ~150 keys
The whole onboarding wizard (7 first-time screens + 5 restore screens) and the OracleWelcomeBanner were hardcoded English. Now wired through useTranslation with new namespaces:
- onboarding.welcome / .stepIndicator / .provider / .brainRepo / .connect / .choose / .confirm / .providerLabels / .providerAnthropic / .providerOpenAI / .providerOpenRouter / .providerCodex / .backToProviders
- restore.selectRepo / .selectSnapshot / .confirm / .execute
- agents.welcomeBanner
All 3 bundles (en-US, pt-BR, es) have the identical key tree with natural translations — no English strings reused as fallbacks in pt-BR / es.
fix(dev): Dockerfile.dev installs claude-code/openclaude + fix CRLF in scripts
- Install @anthropic-ai/claude-code and @gitlawb/openclaude@latest as global npm packages (matches Dockerfile.swarm.dashboard). Without these, the embedded terminal at /agents and the /providers page report 'CLI missing'.
- Run sed -i 's/\r//' on entrypoint.sh and start-dashboard.sh after COPY so Windows CRLF line endings don't crash the container with 'env: bash\r: No such file or directory' on first boot.
- Update header comment to reflect that the dev image now installs both CLIs.
fix(brain-repo): correct execute_restore kwargs + import-time API contract check
## Bug (flagged in PR evolution-foundation#32 review by @davidsoncelestino)
routes/brain_repo.py:restore_start called execute_restore with keyword args that don't match the real signature in brain_repo/restore.py. The call passed `local_path` (not a parameter of that function) and omitted two required args — `install_dir` and `kb_key_matches`. A signature mismatch on Python keyword-only calls raises TypeError before the generator emits any events. Inside the SSE `generate()` it was caught by the generic `except Exception` branch and turned into a vague error event, so the feature was 100% broken but the failure mode looked like a transient restore error rather than an API mismatch.
## Fix
- Call execute_restore with the actual signature: (repo_url, ref, token, install_dir, include_kb, kb_key_matches).
- install_dir is resolved to WORKSPACE (project root) — that's where the SWAP_DIRS (memory/workspace/customizations/config-safe) are replaced. It is NOT the brain-repo clone path, which was the old confusion.
- kb_key_matches is a new body parameter (default False = safe). The upcoming frontend restore modal will expose it as a checkbox "I still have the original master key". When False, KB import silently degrades to metadata-only inside execute_restore (existing behaviour).
## Prevention
Added brain_repo/__init__.py import-time check that runs inspect.signature on PATAuthProvider.__init__ and execute_restore, logs a WARNING if they drift (sketched below). Next time routes and brain_repo modules go out of sync the mismatch appears in startup logs instead of silently falling into an `except ImportError` that stored PAT tokens as plaintext (the exact bypass the reviewer flagged separately).
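The Prevention check, sketched with the parameter list quoted in the Fix section; the helper name is hypothetical:

```python
import inspect
import logging

log = logging.getLogger(__name__)

def _check_signature(func, expected: list[str]) -> None:
    # Compare declared parameter names against what the routes call with.
    actual = list(inspect.signature(func).parameters)
    if actual != expected:
        log.warning("API drift: %s takes %s, routes expect %s",
                    getattr(func, "__qualname__", func), actual, expected)

# Wired at import time in brain_repo/__init__.py, e.g.:
# _check_signature(execute_restore, ["repo_url", "ref", "token",
#                                    "install_dir", "include_kb",
#                                    "kb_key_matches"])
```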
fix(brain-repo): start_brain_watcher accepts flask_app, no more circular import
The watcher previously did `from app import app` inside its own startup
function, which fires during app.py module load — SQLAlchemy's init_app
hadn't completed yet, so the half-initialised db instance was the "wrong"
one from the watcher's perspective. Every boot logged:
start_brain_watcher failed: The current Flask app is not registered
with this 'SQLAlchemy' instance
and auto-sync stayed permanently off. Users had to hit Sync manually.
Fix:
- start_brain_watcher now accepts `flask_app` as a parameter (see the sketch after this list). app.py
passes the already-fully-initialised `app` instance when it calls it.
- Legacy fallback preserved: if nobody passes flask_app, it still tries
the old import (logs error, returns None) so out-of-tree callers keep
working.
- Bonus: watcher's auto-sync closure now mirrors workspace → brain_repo
before commit (using _sync_workspace_to_brain_repo from routes). Before,
auto-sync would commit an empty brain_repo_dir because nobody copied
the workspace into it — same root cause as the manual Sync bug fixed
earlier but silently swallowed by the daemon.
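A condensed sketch of the parameter-plus-fallback wiring described above, with the watcher body elided:

```python
import logging

log = logging.getLogger(__name__)

def start_brain_watcher(flask_app=None):
    if flask_app is None:
        # Legacy path for out-of-tree callers: the old import can still race
        # SQLAlchemy init, so failure logs and bails instead of raising.
        try:
            from app import app as flask_app
        except Exception as exc:
            log.error("start_brain_watcher failed: %s", exc)
            return None
    with flask_app.app_context():
        # ... set up the watchdog observer + debounced auto-sync closure ...
        pass
```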
chore(notifications): health-check before WS + visibility guard
Fixes the repeating console noise "WebSocket connection to
'ws://localhost:32352/ws' failed" that some users see on pages that
don't use the terminal at all (e.g. /backups).
Root cause: useGlobalNotifications is mounted once in the app shell and
opens a WS to the terminal-server for agent-finished notifications. When
:32352 is unreachable the ws.onerror → ws.close → ws.onclose chain
triggered reconnect with exponential backoff capped at 30s — but every
failed attempt wrote an error line to the console, so users saw the
message every 1-30s forever.
Two fixes:
1. HTTP health-probe at TS_HTTP/api/health before opening the WS. If it
fails, back off to 30s and log ONCE ("terminal-server unreachable —
notifications disabled"). Skips the WS entirely so the browser never
emits the native "WebSocket connection failed" error.
2. Visibility guard: when the tab is hidden, stop trying. Resume as soon
as the user focuses the tab. Saves CPU + removes noise while multi-
tabbing.
Notifications behaviour is unchanged when the server is up — the first
health check passes, the WS opens as before.
feat(backups): unified tabs (local/s3/brain-repo) + import dropdown + SSE restore modal
Replaces the stacked "Local list + S3 list below" layout with three explicit tabs: Local / S3 / Brain Repo. Each tab has its own count badge, empty state, refresh button and table. The Brain Repo tab groups snapshots into 4 sections (HEAD, manual milestones, weekly, daily) — weekly/daily collapsed by default because they can hit 30+ entries each.
Import button is now a dropdown with 2 sources:
- Upload .zip from disk (existing behaviour)
- Pull snapshot from Brain Repo (new — switches to the Brain tab)
Brain Repo snapshots get their own restore modal, distinct from the Local/S3 merge|replace flow because semantics differ:
- Warning upfront that brain restore does NOT recover the DB or files outside memory/workspace/customizations/config-safe.
- "Include kb-mirror/" checkbox (off by default).
- "I still have the original master key" checkbox, only visible when include_kb is on. Maps to the kb_key_matches body param from the earlier backend fix — defaults OFF so a wrong key can't decrypt-poison the KB.
- SSE progress bar (~10 steps) via POST /api/brain-repo/restore/start.
All text wired through useTranslation — ~40 new keys across en-US, pt-BR and es under backups.tabs, .s3Tab, .brainTab, .brainSnapshots, .importMenu, .brainRestore, plus viewOnGithub action. Lazy-fetches /api/brain-repo/snapshots only when the Brain tab is opened AND the brain repo is configured (avoids hitting the GitHub API needlessly on every /backups visit).
feat(backups): surface brain-repo last_error + 30s visibility-aware polling
When the watcher auto-sync fails (token expired, remote gone, network flap) the backend writes the reason to BrainRepoConfig.last_error — but the UI used to ignore it silently. Users had no signal that sync was broken except noticing commits not appearing on GitHub.
Now the Brain Repo destination card:
- Switches border to danger red when last_error is set.
- Replaces the "connected" pill with a "sync error" pill.
- Shows the error message verbatim in a tight banner below the repo URL.
- Adds a "Reconnect" pill button that routes back through /onboarding (clears the stored config + walks through PAT + repo selection again).
Polling: a 30s setInterval on /api/backups/config while the user is on /backups AND brain is connected. Gated by document.visibilityState so it pauses in background tabs (saves CPU + GitHub rate limit). Lets the user see last_error clear within 30s after a successful manual Sync without requiring a full page refresh.
i18n: +2 keys (syncError, reconnect) across en-US, pt-BR, es.
@davidsoncelestino thanks for the deep review — both blockers have been addressed. Consolidated state:
B1 — plaintext PAT: resolved in earlier commits (before this review landed)
I confirmed the contracts in-container: Relevant commit:
B2 — restore SSE crash: fixed in
Follow-up review — PR #32
@NeritonDias I verified both fixes in the current code (
B2 — restore SSE kwargs: resolved ✅
B1 — plaintext PAT: happy path resolved, insecure fallback remains 🔴
The happy path now encrypts correctly via
Fix: replace the fallbacks with
🟡 Note — multi-worker watcher
Operational
Summing up: B2 ok, B1 partial (the insecure fallback still needs to go). Beyond that, only the watcher note.
fix(brain-repo): fail loud on missing crypto — no plaintext PAT fallback
Removes the three silent plaintext fallbacks flagged in PR review — the
exact vector of the original bug.
## connect() — fail loud on write
Previously: master_key missing OR cryptography ImportError fell through
to ``encrypted = token.encode("utf-8")`` with only a log.warning. The
PAT was then written to BrainRepoConfig.github_token_encrypted as
plaintext, the UI said "connected ✓", and sync continued working. Only
way to notice was auditing the database.
Now: both failure modes return 500 with a typed response
``{"error": "...", "code": "CRYPTO_UNAVAILABLE"}`` and escalate to
log.critical. The ``code`` field lets lib/api.ts differentiate this
error from other 500s, so the UI can render a specific actionable
message instead of generic "something went wrong".
Rationale: nobody reads WARNING-level logs in prod. The app.py
bootstrap that auto-generates BRAIN_REPO_MASTER_KEY can fail silently
in several plausible cases (read-only .env mount, cryptography module
partially installed, env var set to empty string by a shell config,
gunicorn worker race during first boot). In any of those, the user
used to store a plaintext credential without knowing. Fail-closed is
the correct security default.
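A sketch of the fail-closed write path, assuming a Fernet-style cipher and a urlsafe-base64 master key; the helper name and messages are illustrative:

```python
import logging

log = logging.getLogger(__name__)

def encrypt_or_500(token: str, master_key: str):
    """Return (ciphertext, None) on success, or (None, (payload, status))."""
    if not master_key:
        log.critical("BRAIN_REPO_MASTER_KEY missing; refusing to store PAT")
        return None, ({"error": "Server encryption key unavailable",
                       "code": "CRYPTO_UNAVAILABLE"}, 500)
    try:
        from cryptography.fernet import Fernet
    except ImportError:
        log.critical("cryptography unavailable; refusing to store PAT")
        return None, ({"error": "Encryption library unavailable",
                       "code": "CRYPTO_UNAVAILABLE"}, 500)
    # Fernet requires a 32-byte urlsafe-base64 key (an assumption here).
    return Fernet(master_key.encode()).encrypt(token.encode()), None
```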
## _decrypt_token() — fail safe on read
Previously: master_key missing OR ImportError fell through to
``config.github_token_encrypted.decode("utf-8")`` — returning the
stored blob verbatim as if it were the plaintext token. Worked only
because the encrypt side had the mirror bug (so blob == plaintext).
Now: every failure path logs at ERROR level and returns "". Every
current caller already treats empty token as hard failure
(``if not token: abort(500, "Could not decrypt stored token")``), so
the user-visible behaviour stays correct and surfaces the issue
instead of pretending the decrypt succeeded.
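The mirrored read path under the same assumptions: every failure logs at ERROR and returns an empty string so the existing `if not token: abort(...)` callers fire:

```python
import logging

log = logging.getLogger(__name__)

def _decrypt_token(blob: bytes, master_key: str) -> str:
    if not master_key:
        log.error("BRAIN_REPO_MASTER_KEY missing; cannot decrypt stored token")
        return ""
    try:
        from cryptography.fernet import Fernet
    except ImportError:
        log.error("cryptography unavailable; cannot decrypt stored token")
        return ""
    try:
        return Fernet(master_key.encode()).decrypt(blob).decode("utf-8")
    except Exception:
        # Wrong key, corrupted blob, or a legacy plaintext row: never
        # return the stored bytes as if they were the token.
        log.error("Stored token failed to decrypt with the current master key")
        return ""
```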
Response to PR review (davidsoncelestino):
> nobody notices log.warning in production
Exactly — everything on this path now logs CRITICAL or ERROR and
refuses the operation instead of accepting plaintext. The stored blob
is never decoded as a literal token.
fix(brain-repo): guard master_key in watcher startup — fail loud
start_brain_watcher used to capture master_key in the debounce closure without checking whether it was actually available. If BRAIN_REPO_MASTER_KEY was missing (the bootstrap failed silently), decrypt_token would crash on every single file-change event, logging the same error every 30s forever while the UI kept showing "auto-sync enabled ✓".
Now the watcher refuses to start when master_key is empty, with a CRITICAL log message. The brain repo config stays connected, manual sync/tag still work (they abort with "Could not decrypt stored token" which is visible to the user), but the watcher doesn't spam-fail in the background.
Notable: _initialize_remote_brain_repo does NOT need the same guard — it only needs the token argument (never encrypts/decrypts), and the upstream connect() endpoint now fails loud BEFORE calling it when crypto is broken.
feat(brain-repo): expose crypto_ready in /status + /backups/config + import-time CRITICAL check
Adds visibility so the user (and ops) notice when crypto is broken instead
of only finding out after attempting a sync that fails for mysterious
reasons.
- brain_repo/__init__.py:
- New _verify_crypto_ready() import-time check. Logs CRITICAL (not
WARNING) when the cryptography module is missing, BRAIN_REPO_MASTER_KEY
is unset, or the key format is invalid. Runs once at process start and
shows up red in production log aggregators — directly addresses review
feedback: 'nobody notices log.warning in production'.
- New is_crypto_ready() runtime helper exposed for HTTP endpoints.
- routes/brain_repo.py:status: returns crypto_ready field in every shape
(connected + disconnected). UI uses it to render a danger-state card.
- routes/backups.py:backup_config: returns brain_crypto_ready so the
/backups destination card can render the same warning inline without
making a second round-trip to /brain-repo/status.
No UI changes yet — follow-up commit wires Backups.tsx and
settings/BrainRepo.tsx to these fields.
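A plausible shape for the two helpers; is_crypto_ready is named in the commit, the validation details are assumptions:

```python
import logging
import os

log = logging.getLogger(__name__)

def is_crypto_ready() -> bool:
    key = os.environ.get("BRAIN_REPO_MASTER_KEY", "")
    if not key:
        return False
    try:
        from cryptography.fernet import Fernet
        Fernet(key.encode())  # raises if the key format is invalid
        return True
    except Exception:
        return False

def _verify_crypto_ready() -> None:
    if not is_crypto_ready():
        log.critical("Brain Repo crypto NOT ready; sync/tag will fail until "
                     "BRAIN_REPO_MASTER_KEY and cryptography are restored")

_verify_crypto_ready()  # runs once when brain_repo is imported
```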
feat(backups,settings): surface crypto_ready — danger state + Reconnect
When the backend reports brain_crypto_ready=false / crypto_ready=false the UI now treats it as the most urgent state:
- Backups destination card: danger border, "encryption broken" red pill (takes priority over "sync error"), inline explanation, Reconnect link.
- Settings /brain-repo: prominent banner above the status card explaining the exact failure mode and that ALL sync/tag operations will fail until an admin restores BRAIN_REPO_MASTER_KEY.
Why a separate state from last_error: cryptoBroken means stored tokens can't be decrypted at all, so every sync will fail the same way. last_error is transient (network hiccup, force-push conflict, etc). Users need to distinguish "transient — retry" from "server-side, contact admin".
i18n: +6 keys across en-US / pt-BR / es (backups.destinations.cryptoBroken, cryptoBrokenDesc; brainRepoSettings.cryptoBroken.title, .desc).
Closes the last piece of PR review feedback on silent plaintext fallbacks:
- fallbacks removed (c43aa2f, 7049d1e)
- CRITICAL logs instead of WARNING (3a12021)
- User-visible warning so the failure is impossible to miss (this commit)
@DavidsonGomes I agree with the reading — the silent fallback was exactly the original vector and there was no reason for it to persist. Addressed in 4 commits (
1.
fix(backups): /brain-repo/snapshots returns uniform shape; frontend guards against undefined ref
Clicking the Brain Repo tab turned the page black with
Cannot read properties of undefined (reading 'replace')
because list_snapshots returned HEAD as {"sha": "..."} with no `ref`
field, while daily/weekly/milestones items had both ref+sha. The
SnapshotRow renderer called s.ref.replace(/^refs\/tags\//, '') on every
item uniformly — HEAD crashed the whole component tree.
Two fixes, defence-in-depth:
1. Backend (brain_repo/github_api.py:list_snapshots): every snapshot
item now returns the same shape {ref, sha, label}. HEAD uses
ref="HEAD" / label="HEAD" (not an empty dict), and the caller sees
`head: null` when no HEAD exists (renamed from {} for clarity).
`label` is pre-computed by stripping `refs/tags/` so the frontend
has one string ready to render without parsing git refs.
2. Frontend (Backups.tsx): all .replace calls on snapshot.ref use
`(snapshot.ref ?? '')` so a future regression of the backend shape
can't re-black the page. The display string falls back to
"(unnamed)" when label + ref are both missing.
Validated end-to-end: /api/brain-repo/snapshots now returns
HEAD: {"ref": "HEAD", "sha": "...", "label": "HEAD"}
Milestones[0]: {"ref": "refs/tags/milestone/...", "sha": "...", "label": "milestone/..."}
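The backend normalization implied by the validated output above; the helper name is hypothetical:

```python
def _normalize_snapshot(ref, sha):
    # HEAD arrives without a tag ref; give it the same {ref, sha, label} shape.
    if ref is None or ref == "HEAD":
        return {"ref": "HEAD", "sha": sha, "label": "HEAD"}
    # Pre-compute label so the frontend never parses git refs itself.
    return {"ref": ref, "sha": sha, "label": ref.removeprefix("refs/tags/")}
```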
fix(brain-repo): commit the git_ops.push tuple API + sync_worker caller update
These changes were made locally in earlier turns but never committed — they were applied to the working tree after a stash pop during the upstream merge and slipped past subsequent `git add <specific-files>` commits.
The callers in routes/brain_repo.py (already committed in c43aa2f and 7049d1e) DO use the new API:
pushed, push_err = git_ops.push(repo_dir, token, with_tags=True)
But git_ops.push on the branch still had the old signature:
def push(repo_dir, token) -> bool
So every sync/force and tag/milestone call in the pushed code would crash with TypeError: push() got an unexpected keyword argument 'with_tags'. Production on the branch was silently broken until now — smoke tests passed only because I was docker-cp'ing working-tree files into the container.
Changes now in-tree:
- push(repo_dir, token, with_tags=True) -> tuple[bool, str]. Returns (ok, error_message). Uses --follow-tags when with_tags=True so annotated tags created by create_tag actually reach GitHub. Masks token in stderr output before logging.
- create_tag(repo_dir, tag, msg, force=False). When force=True, uses git tag -a -f so re-tagging an existing milestone moves the pointer instead of erroring.
- sync_worker._execute_job uses the new tuple return + threads the real error into the pending-job JSON.
All callers verified: sync_force, tag_milestone, _initialize_remote_brain_repo in routes/brain_repo.py already destructure the tuple correctly.
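A sketch of the tuple-returning push, assuming git pushes over HTTPS with the PAT embedded in the remote URL (hence the stderr masking):

```python
import subprocess

def push(repo_dir: str, token: str, with_tags: bool = True) -> tuple[bool, str]:
    cmd = ["git", "-C", repo_dir, "push", "origin", "main"]
    if with_tags:
        cmd.append("--follow-tags")  # annotated tags travel with the branch
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True, ""
    # Never let the PAT leak into logs or the UI.
    return False, result.stderr.replace(token, "***")
```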
feat(onboarding): shared OnboardingHeader with EvoNexus logo on inner steps
Adds the EvoNexus logo above the step indicator on every inner wizard screen so the brand presence is consistent from Welcome through Step 3 (not just on the first screen).
New component OnboardingHeader renders:
[logo EVO_NEXUS.webp h-7] [Step N of 3 • ● ● ○]
Used on StepProvider (step1of3), StepBrainRepo (step2of3), StepBrainConnect (step2aOf3), StepBrainChoose (step2bOf3), StepConfirm (step3of3). Each step file lost its duplicated step-indicator block — fewer lines overall.
Welcome.tsx keeps its bespoke header with the larger logo and welcome title — it's the landing screen, different layout by design.
These file changes were applied to the working tree during an earlier onboarding polish but never committed — they slipped through the subsequent `git add <specific-files>` batches.
fix(brain-repo): watcher persists push result to last_error / last_sync
The auto-sync closure (`_sync_fn` in start_brain_watcher) was calling git_ops.push() but discarding the (ok, error_message) tuple. Symptoms:
1. Push failures never reached BrainRepoConfig.last_error, so the UI card kept showing "connected ✓" while auto-sync quietly failed every 30 seconds in the background. The user only noticed when commits stopped appearing on GitHub.
2. Push successes didn't update last_sync, so the "Last sync: 2m ago" stayed frozen at the manual-sync timestamp even while the watcher was happily pushing every file change.
Fix: capture the tuple, log ERROR on failure, and persist the result inside `flask_app.app_context()` (same Flask instance the start function received as a parameter, so the existing circular-import fix from f1ed7d8 keeps working). The 30s UI polling from 0e8235a now actually reflects watcher state.
Found during post-merge audit of all `git_ops.push` call sites — every other caller was destructuring the tuple already; this one slipped.
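How the persisted result could look inside the closure; the model and column names come from this PR, the helper and wiring are assumed:

```python
from datetime import datetime, timezone

def _record_push_result(flask_app, config_id, ok, err):
    with flask_app.app_context():  # the instance passed to start_brain_watcher
        from models import db, BrainRepoConfig
        cfg = db.session.get(BrainRepoConfig, config_id)
        if cfg is None:
            return
        if ok:
            cfg.last_sync = datetime.now(timezone.utc)
            cfg.last_error = None
        else:
            cfg.last_error = err  # surfaced by the 30s /backups polling
        db.session.commit()
```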
…onboarding
# Conflicts:
#	pyproject.toml
Sorry @NeritonDias, your pull request is larger than the review limit of 150000 diff characters
feat(brain-repo): brain repo versioning + onboarding wizard with restore flow (#43)
…ore flow (evolution-foundation#43) * feat(brain-repo): foundation — BrainRepoConfig model, user onboarding columns, BRAIN_REPO_MASTER_KEY, watchdog dep - Add User.onboarding_state + User.onboarding_completed_agents_visit columns - Add BrainRepoConfig model (user_id FK, encrypted token, repo metadata, sync state) - Add needs_onboarding() helper in models.py - SQLite migration block in app.py (ALTER TABLE pattern) - BRAIN_REPO_MASTER_KEY auto-generated in .env on first run - Register onboarding + brain_repo blueprints (try/except — files created in later commits) - Add /api/auth/needs-onboarding to PUBLIC_PATHS - Add watchdog>=4.0 to pyproject.toml - Create dashboard/backend/brain_repo/__init__.py - Create .handoffs/ with phase-0-handoff, domain-auth, domain-watcher * feat(onboarding): backend routes — needs-onboarding, onboarding state endpoints - Updated auth_routes.py import to include BrainRepoConfig and needs_onboarding - Added GET /api/auth/needs-onboarding (public, handles unauthenticated + pre-setup) - Added POST /api/auth/mark-agents-visited (login_required) - Created routes/onboarding.py with state, start, complete, skip, provider endpoints * feat(brain-repo): brain_repo route — status, connect, detect, snapshots, restore SSE, sync - GET /api/brain-repo/status — returns config dict or {connected: false} - POST /api/brain-repo/connect — validate PAT, create/validate repo, encrypt+store token - POST /api/brain-repo/disconnect — clears token and disables sync - GET /api/brain-repo/detect — find candidate brain repos for a PAT - GET /api/brain-repo/snapshots — list daily/weekly/milestone/head refs - POST /api/brain-repo/restore/start — SSE stream via stream_with_context - POST /api/brain-repo/sync/force — commit+push+milestone tag - POST /api/brain-repo/tag/milestone — create named milestone tag All brain_repo sub-module calls use try/except ImportError for graceful fallback * feat(brain-repo): github oauth, api, git_ops, watcher, sync_worker, kb_mirror, transcripts, restore, secrets_scanner * feat(onboarding): frontend wizard — first-time flow, restore flow, Oracle banner, brain-repo settings * chore(setup.py): brain-repo prompts in CLI + backup github target + /salve skill + .gitignore - setup.py: add ask_password, ask_choice, _ensure_brain_master_key, _save_brain_repo_pat helpers; add Brain Repo wizard block in main() after choose_provider() (local mode); add brain_repo_* keys to all 3 language bundles (en-US, pt-BR, es) in MESSAGES dict - backup.py: add backup_to_github() function and 'github' --target option; dispatches to brain_repo.git_ops for commit+push+tag - .claude/skills/salve/SKILL.md: new skill that flushes session to memory files and relies on Brain Repo file watcher for auto-sync - .gitignore: add memory/raw-transcripts/ - .handoffs/phase-4-cli-handoff.md: phase summary * i18n: 13 new brain-repo keys — en-US / pt-BR / es Add brainRepo namespace with 14 keys to all three locale files: brain_repo_enable_prompt, brain_repo_auth_method, brain_repo_defer_to_web, brain_repo_pat_instructions, brain_repo_pat_prompt, brain_repo_pat_saved, brain_repo_pat_skipped, brain_repo_configure_later, brain_repo_connect, brain_repo_disconnect, brain_repo_sync_now, brain_repo_create_milestone, brain_repo_status_connected, brain_repo_status_disconnected. Key parity verified: all diffs between bundles are empty sets. 
* chore: final audit handoff — all checks pass, 7 commits, build clean

* chore: add docker-compose.dev.yml + DEV-SETUP.md for local Windows testing

* fix(dev): Dockerfile.dev with --legacy-peer-deps to fix npm install on Windows Docker

* fix(brain-repo): correct PATAuthProvider API + add /validate-token + bootstrap remote repo

  - routes/brain_repo.py: replace broken PATAuthProvider(key).encrypt(token) calls with PATAuthProvider(token, key).encrypt_token(); decrypt now uses the module-level decrypt_token() helper. Handle missing BRAIN_REPO_MASTER_KEY gracefully.
  - Add new POST /api/brain-repo/validate-token endpoint that validates a PAT via GitHub /user without persisting anything (used by onboarding to check the token before showing the repo-choice step).
  - Convert blueprint abort() calls to JSON via errorhandler so the SPA can show the actual backend message instead of a generic '400 BAD REQUEST'.
  - Add a friendlier error in connect() when the GitHub repo already exists (422 → bilingual message guiding the user to 'Use existing' or pick another name).
  - New helper _initialize_remote_brain_repo: after create_private_repo creates an empty repo on GitHub, clone-init it locally, drop in the brain-repo skeleton via manifest.initialize_brain_repo (.evo-brain marker, manifest.yaml, dirs, README, .gitignore), commit and push to origin/main. Without this the repo stays empty and detect_brain_repos (code search by .evo-brain) cannot find it later — the UI rejected such repos as 'incompatible'.
  - brain_repo/github_api.py: add get_repo_info(token, repo_url) that parses a GitHub URL, fetches the repo, and returns (is_private, info_dict). Replaces the broken validate_repo_is_private(token, repo_url) call site that expected a tuple while the underlying function returned bool and took 3 args.
  - brain_repo/pat_auth.py: re-export shim so existing routes/brain_repo.py imports of brain_repo.pat_auth.PATAuthProvider keep working (the actual implementation lives in github_oauth.py).

* fix(onboarding): write canonical providers.json schema + boot-time migration

  - routes/onboarding.py: rewrite set_provider so it never writes the legacy {<id>: {api_key, enabled}} shape that broke routes/providers.py. Now reuses _read_config / _write_config / ALLOWED_ENV_VARS from routes.providers and produces the canonical {active_provider, providers: {<id>: {cli_command, env_vars, ...}}} schema. Anthropic just flips active_provider (no key); OpenAI/OpenRouter merge filtered env_vars; codex / codex_auth is a no-op that defers to the existing /api/providers/openai OAuth endpoints.
  - Env vars outside ALLOWED_ENV_VARS or containing shell metacharacters are silently dropped — same policy as routes/providers.py._sanitize_env_vars.
  - app.py: add a boot-time providers.json schema-normalization migration. If the file exists but is missing the active_provider or providers keys, the app overwrites it from providers.example.json. Recovers from the broken state left by older versions of the onboarding wizard without requiring the user to manually reset volumes.

* fix(api): expose backend JSON error.description in lib/api.ts

  The api helper used to throw new Error("400 BAD REQUEST") for any non-OK response, dropping the actual server message — so the UI showed a useless generic status instead of the specific reason ('repo already exists', 'token required', etc).
  The new buildError() reads the response body, prefers JSON {error|description|message} fields, falls back to short text, and prefixes with the status so existing callers that pattern-match on '401' / '403' keep working.

* fix(onboarding): refreshUser() before navigate to prevent /onboarding redirect loop

  Two bugs were producing the same symptom — the wizard kept reloading instead of landing on /agents after completion:

  1. The App.tsx guard ignored onboarding_state === null. New users have NULL by default in the column, so they bypassed the guard and reached /agents without ever seeing the wizard. The guard now treats null as 'needs onboarding' (only completed/skipped count as done).
  2. OnboardingRouter handleComplete/handleSkip called POST /onboarding/complete then navigated to /agents — but the React user state still had onboarding_state='pending' cached from /auth/me. The guard would re-read that cached value and bounce back to /onboarding, where the useEffect saw the fresh 'completed' state from the backend and tried to navigate again. Infinite loop.

  Fix: await refreshUser() (from AuthContext) before navigate so the guard sees the current state.

* feat(onboarding): per-provider sub-flows + use /validate-token instead of /connect

  StepProvider was a single generic API-key form for all 4 providers — but Anthropic, OpenAI, OpenRouter and Codex each need a different flow:

  - Anthropic: no API key; the user logs in by running 'claude' in the terminal at the Agents page. The sub-step shows that instruction + flips active_provider.
  - OpenAI: API key input → POST /providers/openai/models validates and lists available models → dropdown to pick a model → save (env_vars merge).
  - OpenRouter: same shape as OpenAI but with a read-only Base URL chip (https://openrouter.ai/api/v1) and a free-text model field.
  - Codex: two tabs (Browser OAuth, Device Auth) replicating the Providers.tsx modal — calls /providers/openai/auth-start|auth-complete or device-start|device-poll. Tab/button colors aligned to the onboarding palette (#00FFA7).

  StepBrainConnect.tsx and RestoreSelectRepo.tsx now call POST /api/brain-repo/validate-token (validates without persisting) instead of /connect (which requires repo_url|create_repo and was 400-ing on token-only calls). RestoreSelectRepo also passes the token to /detect as a query param so the backend doesn't need a stored config to enumerate brain repos.

  i18n keys for these screens were added in the same pass since the components were rewritten.

* i18n(onboarding): translate 12 wizard screens + Oracle banner — ~150 keys

  The whole onboarding wizard (7 first-time screens + 5 restore screens) and the OracleWelcomeBanner were hardcoded English. Now wired through useTranslation with new namespaces:

  - onboarding.welcome / .stepIndicator / .provider / .brainRepo / .connect / .choose / .confirm / .providerLabels / .providerAnthropic / .providerOpenAI / .providerOpenRouter / .providerCodex / .backToProviders
  - restore.selectRepo / .selectSnapshot / .confirm / .execute
  - agents.welcomeBanner

  All 3 bundles (en-US, pt-BR, es) have the identical key tree with natural translations — no English strings reused as fallbacks in pt-BR / es.

* fix(dev): Dockerfile.dev installs claude-code/openclaude + fix CRLF in scripts

  - Install @anthropic-ai/claude-code and @gitlawb/openclaude@latest as global npm packages (matches Dockerfile.swarm.dashboard). Without these, the embedded terminal at /agents and the /providers page report 'CLI missing'.
  - Run sed -i 's/\r//' on entrypoint.sh and start-dashboard.sh after COPY so Windows CRLF line endings don't crash the container with 'env: bash\r: No such file or directory' on first boot.
  - Update the header comment to reflect that the dev image now installs both CLIs.

* fix(brain-repo): correct execute_restore kwargs + import-time API contract check

  ## Bug (flagged in PR evolution-foundation#32 review by @davidsoncelestino)

  routes/brain_repo.py:restore_start called execute_restore with keyword args that don't match the real signature in brain_repo/restore.py. The call passed `local_path` (not a parameter of that function) and omitted two required args — `install_dir` and `kb_key_matches`. A signature mismatch on Python keyword-only calls raises TypeError before the generator emits any events. Inside the SSE `generate()` it was caught by the generic `except Exception` branch and turned into a vague error event, so the feature was 100% broken but the failure mode looked like a transient restore error rather than an API mismatch.

  ## Fix

  - Call execute_restore with the actual signature: (repo_url, ref, token, install_dir, include_kb, kb_key_matches).
  - install_dir is resolved to WORKSPACE (project root) — that's where the SWAP_DIRS (memory/workspace/customizations/config-safe) are replaced. It is NOT the brain-repo clone path, which was the old confusion.
  - kb_key_matches is a new body parameter (default False = safe). The upcoming frontend restore modal will expose it as the checkbox "I still have the original master key". When False, KB import silently degrades to metadata-only inside execute_restore (existing behaviour).

  ## Prevention

  Added a brain_repo/__init__.py import-time check that runs inspect.signature on PATAuthProvider.__init__ and execute_restore and logs a WARNING if they drift. Next time the routes and brain_repo modules go out of sync, the mismatch appears in the startup logs instead of silently falling into an `except ImportError` that stored PAT tokens as plaintext (the exact bypass the reviewer flagged separately).

* fix(brain-repo): start_brain_watcher accepts flask_app, no more circular import

  The watcher previously did `from app import app` inside its own startup function, which fires during app.py module load — SQLAlchemy's init_app hadn't completed yet, so the half-initialised db instance was the "wrong" one from the watcher's perspective. Every boot logged:

  start_brain_watcher failed: The current Flask app is not registered with this 'SQLAlchemy' instance

  and auto-sync stayed permanently off. Users had to hit Sync manually.

  Fix:
  - start_brain_watcher now accepts `flask_app` as a parameter. app.py passes the already-fully-initialised `app` instance when it calls it.
  - Legacy fallback preserved: if nobody passes flask_app, it still tries the old import (logs an error, returns None) so out-of-tree callers keep working.
  - Bonus: the watcher's auto-sync closure now mirrors workspace → brain_repo before commit (using _sync_workspace_to_brain_repo from routes). Before, auto-sync would commit an empty brain_repo_dir because nobody copied the workspace into it — same root cause as the manual Sync bug fixed earlier, but silently swallowed by the daemon.

* chore(notifications): health-check before WS + visibility guard

  Fixes the repeating console noise "WebSocket connection to 'ws://localhost:32352/ws' failed" that some users see on pages that don't use the terminal at all (e.g. /backups).
  Root cause: useGlobalNotifications is mounted once in the app shell and opens a WS to the terminal-server for agent-finished notifications. When :32352 is unreachable, the ws.onerror → ws.close → ws.onclose chain triggered reconnect with exponential backoff capped at 30s — but every failed attempt wrote an error line to the console, so users saw the message every 1-30s forever.

  Two fixes:
  1. HTTP health-probe at TS_HTTP/api/health before opening the WS. If it fails, back off to 30s and log ONCE ("terminal-server unreachable — notifications disabled"). Skips the WS entirely so the browser never emits the native "WebSocket connection failed" error.
  2. Visibility guard: when the tab is hidden, stop trying. Resume as soon as the user focuses the tab. Saves CPU + removes noise while multi-tabbing.

  Notifications behaviour is unchanged when the server is up — the first health check passes and the WS opens as before.

* feat(backups): unified tabs (local/s3/brain-repo) + import dropdown + SSE restore modal

  Replaces the stacked "Local list + S3 list below" layout with three explicit tabs: Local / S3 / Brain Repo. Each tab has its own count badge, empty state, refresh button and table. The Brain Repo tab groups snapshots into 4 sections (HEAD, manual milestones, weekly, daily) — weekly/daily collapsed by default because they can hit 30+ entries each.

  The Import button is now a dropdown with 2 sources:
  - Upload .zip from disk (existing behaviour)
  - Pull snapshot from Brain Repo (new — switches to the Brain tab)

  Brain Repo snapshots get their own restore modal, distinct from the Local/S3 merge|replace flow because the semantics differ:
  - Warning upfront that brain restore does NOT recover the DB or files outside memory/workspace/customizations/config-safe.
  - "Include kb-mirror/" checkbox (off by default).
  - "I still have the original master key" checkbox, only visible when include_kb is on. Maps to the kb_key_matches body param from the earlier backend fix — defaults OFF so a wrong key can't decrypt-poison the KB.
  - SSE progress bar (~10 steps) via POST /api/brain-repo/restore/start.

  All text is wired through useTranslation — ~40 new keys across en-US, pt-BR and es under backups.tabs, .s3Tab, .brainTab, .brainSnapshots, .importMenu, .brainRestore, plus a viewOnGithub action. Lazy-fetches /api/brain-repo/snapshots only when the Brain tab is opened AND the brain repo is configured (avoids hitting the GitHub API needlessly on every /backups visit).

* feat(backups): surface brain-repo last_error + 30s visibility-aware polling

  When the watcher auto-sync fails (token expired, remote gone, network flap) the backend writes the reason to BrainRepoConfig.last_error — but the UI used to ignore it silently. Users had no signal that sync was broken except noticing commits not appearing on GitHub.

  Now the Brain Repo destination card:
  - Switches its border to danger red when last_error is set.
  - Replaces the "connected" pill with a "sync error" pill.
  - Shows the error message verbatim in a tight banner below the repo URL.
  - Adds a "Reconnect" pill button that routes back through /onboarding (clears the stored config + walks through PAT + repo selection again).

  Polling: a 30s setInterval on /api/backups/config while the user is on /backups AND brain is connected. Gated by document.visibilityState so it pauses in background tabs (saves CPU + GitHub rate limit). Lets the user see last_error clear within 30s after a successful manual Sync, without requiring a full page refresh.

  i18n: +2 keys (syncError, reconnect) across en-US, pt-BR, es.
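Key-parity claims like the ones in these i18n commits are easy to regress by hand-editing one bundle; they can be checked mechanically. A minimal sketch, assuming the bundles are nested JSON files under a hypothetical `locales/` directory (the paths and file layout are assumptions, not this repo's actual structure):

```python
import json
from pathlib import Path

def flat_keys(node, prefix=""):
    """Flattens a nested locale dict into a set of dotted key paths."""
    if not isinstance(node, dict):
        return {prefix}
    return {k for key, child in node.items()
            for k in flat_keys(child, f"{prefix}.{key}" if prefix else key)}

def check_parity(locale_dir="locales"):
    """Asserts every bundle has exactly the en-US key tree."""
    bundles = {p.stem: flat_keys(json.loads(p.read_text(encoding="utf-8")))
               for p in Path(locale_dir).glob("*.json")}
    reference = bundles.get("en-US", set())
    for name, keys in bundles.items():
        missing, extra = reference - keys, keys - reference
        assert not missing and not extra, f"{name}: missing={missing} extra={extra}"

if __name__ == "__main__":
    check_parity()
```

Run in CI, this keeps "all diffs between bundles are empty sets" true after every key addition.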
* fix(brain-repo): fail loud on missing crypto — no plaintext PAT fallback

  Removes the three silent plaintext fallbacks flagged in PR review — the exact vector of the original bug.

  ## connect() — fail loud on write

  Previously: master_key missing OR a cryptography ImportError fell through to ``encrypted = token.encode("utf-8")`` with only a log.warning. The PAT was then written to BrainRepoConfig.github_token_encrypted as plaintext, the UI said "connected ✓", and sync continued working. The only way to notice was auditing the database.

  Now: both failure modes return 500 with a typed response ``{"error": "...", "code": "CRYPTO_UNAVAILABLE"}`` and escalate to log.critical. The ``code`` field lets lib/api.ts differentiate this error from other 500s, so the UI can render a specific actionable message instead of a generic "something went wrong".

  Rationale: nobody reads WARNING-level logs in prod. The app.py bootstrap that auto-generates BRAIN_REPO_MASTER_KEY can fail silently in several plausible cases (read-only .env mount, cryptography module partially installed, env var set to an empty string by a shell config, gunicorn worker race during first boot). In any of those, the user used to store a plaintext credential without knowing. Fail-closed is the correct security default.

  ## _decrypt_token() — fail safe on read

  Previously: master_key missing OR ImportError fell through to ``config.github_token_encrypted.decode("utf-8")`` — returning the stored blob verbatim as if it were the plaintext token. This worked only because the encrypt side had the mirror bug (so blob == plaintext).

  Now: every failure path logs at ERROR level and returns "". Every current caller already treats an empty token as hard failure (``if not token: abort(500, "Could not decrypt stored token")``), so the user-visible behaviour stays correct and the issue surfaces instead of pretending the decrypt succeeded.

  Response to PR review (davidsoncelestino):

  > log.warning is not noticed by anyone in production

  Exactly — everything on this path now logs CRITICAL or ERROR and refuses the operation instead of accepting plaintext. The stored blob is never decoded as a literal token.

* fix(brain-repo): guard master_key in watcher startup — fail loud

  start_brain_watcher used to capture master_key in the debounce closure without checking whether it was actually available. If BRAIN_REPO_MASTER_KEY was missing (the bootstrap failed silently), decrypt_token would crash on every single file-change event, logging the same error every 30s forever while the UI kept showing "auto-sync enabled ✓".

  Now the watcher refuses to start when master_key is empty, with a CRITICAL log message. The brain repo config stays connected, and manual sync/tag still work (they abort with "Could not decrypt stored token", which is visible to the user), but the watcher doesn't spam-fail in the background.

  Notable: _initialize_remote_brain_repo does NOT need the same guard — it only needs the token argument (it never encrypts/decrypts), and the upstream connect() endpoint now fails loud BEFORE calling it when crypto is broken.

* feat(brain-repo): expose crypto_ready in /status + /backups/config + import-time CRITICAL check

  Adds visibility so the user (and ops) notice when crypto is broken instead of only finding out after attempting a sync that fails for mysterious reasons.

  - brain_repo/__init__.py:
    - New _verify_crypto_ready() import-time check. Logs CRITICAL (not WARNING) when the cryptography module is missing, BRAIN_REPO_MASTER_KEY is unset, or the key format is invalid.
      Runs once at process start and shows up red in production log aggregators — directly addressing the review feedback that log.warning is not noticed by anyone in production.
    - New is_crypto_ready() runtime helper exposed for HTTP endpoints.
  - routes/brain_repo.py:status: returns a crypto_ready field in every shape (connected + disconnected). The UI uses it to render a danger-state card.
  - routes/backups.py:backup_config: returns brain_crypto_ready so the /backups destination card can render the same warning inline without making a second round-trip to /brain-repo/status.

  No UI changes yet — a follow-up commit wires Backups.tsx and settings/BrainRepo.tsx to these fields.

* feat(backups,settings): surface crypto_ready — danger state + Reconnect

  When the backend reports brain_crypto_ready=false / crypto_ready=false, the UI now treats it as the most urgent state:
  - Backups destination card: danger border, "encryption broken" red pill (takes priority over "sync error"), inline explanation, Reconnect link.
  - Settings /brain-repo: prominent banner above the status card explaining the exact failure mode and that ALL sync/tag operations will fail until an admin restores BRAIN_REPO_MASTER_KEY.

  Why a separate state from last_error: cryptoBroken means stored tokens can't be decrypted at all, so every sync will fail the same way. last_error is transient (network hiccup, force-push conflict, etc). Users need to distinguish "transient — retry" from "server-side, contact admin".

  i18n: +6 keys across en-US / pt-BR / es (backups.destinations.cryptoBroken, cryptoBrokenDesc; brainRepoSettings.cryptoBroken.title, .desc).

  Closes the last piece of PR review feedback on silent plaintext fallbacks:
  - fallbacks removed (c43aa2f, 7049d1e)
  - CRITICAL logs instead of WARNING (3a12021)
  - user-visible warning so the failure is impossible to miss (this commit)

* fix(backups): /brain-repo/snapshots returns uniform shape; frontend guards against undefined ref

  Clicking the Brain Repo tab turned the page black with "Cannot read properties of undefined (reading 'replace')" because list_snapshots returned HEAD as {"sha": "..."} with no `ref` field, while daily/weekly/milestone items had both ref+sha. The SnapshotRow renderer called s.ref.replace(/^refs\/tags\//, '') on every item uniformly — HEAD crashed the whole component tree.

  Two fixes, defence-in-depth:
  1. Backend (brain_repo/github_api.py:list_snapshots): every snapshot item now returns the same shape {ref, sha, label}. HEAD uses ref="HEAD" / label="HEAD" (not an empty dict), and the caller sees `head: null` when no HEAD exists (renamed from {} for clarity). `label` is pre-computed by stripping `refs/tags/` so the frontend has one string ready to render without parsing git refs.
  2. Frontend (Backups.tsx): all .replace calls on snapshot.ref use `(snapshot.ref ?? '')` so a future regression of the backend shape can't re-black the page. The display string falls back to "(unnamed)" when label + ref are both missing.

  Validated end-to-end: /api/brain-repo/snapshots now returns HEAD: {"ref": "HEAD", "sha": "...", "label": "HEAD"} and Milestones[0]: {"ref": "refs/tags/milestone/...", "sha": "...", "label": "milestone/..."}

* fix(brain-repo): commit the git_ops.push tuple API + sync_worker caller update

  These changes were made locally in earlier turns but never committed — they were applied to the working tree after a stash pop during the upstream merge and slipped past subsequent `git add <specific-files>` commits.
  The callers in routes/brain_repo.py (already committed in c43aa2f and 7049d1e) DO use the new API:

  pushed, push_err = git_ops.push(repo_dir, token, with_tags=True)

  But git_ops.push on the branch still had the old signature:

  def push(repo_dir, token) -> bool

  So every sync/force and tag/milestone call in the pushed code would crash with TypeError: push() got an unexpected keyword argument 'with_tags'. Production on the branch was silently broken until now — smoke tests passed only because I was docker-cp'ing working-tree files into the container.

  Changes now in-tree:
  - push(repo_dir, token, with_tags=True) -> tuple[bool, str]. Returns (ok, error_message). Uses --follow-tags when with_tags=True so annotated tags created by create_tag actually reach GitHub. Masks the token in stderr output before logging.
  - create_tag(repo_dir, tag, msg, force=False). When force=True, uses git tag -a -f so re-tagging an existing milestone moves the pointer instead of erroring.
  - sync_worker._execute_job uses the new tuple return + threads the real error into the pending-job JSON.

  All callers verified: sync_force, tag_milestone and _initialize_remote_brain_repo in routes/brain_repo.py already destructure the tuple correctly.

* feat(onboarding): shared OnboardingHeader with EvoNexus logo on inner steps

  Adds the EvoNexus logo above the step indicator on every inner wizard screen so the brand presence is consistent from Welcome through Step 3 (not just on the first screen).

  The new OnboardingHeader component renders: [logo EVO_NEXUS.webp h-7] [Step N of 3 • ● ● ○]

  Used on StepProvider (step1of3), StepBrainRepo (step2of3), StepBrainConnect (step2aOf3), StepBrainChoose (step2bOf3), StepConfirm (step3of3). Each step file lost its duplicated step-indicator block — fewer lines overall.

  Welcome.tsx keeps its bespoke header with the larger logo and welcome title — it's the landing screen, with a different layout by design.

  These file changes were applied to the working tree during an earlier onboarding polish but never committed — they slipped through the subsequent `git add <specific-files>` batches.

* fix(brain-repo): watcher persists push result to last_error / last_sync

  The auto-sync closure (`_sync_fn` in start_brain_watcher) was calling git_ops.push() but discarding the (ok, error_message) tuple. Symptoms:

  1. Push failures never reached BrainRepoConfig.last_error, so the UI card kept showing "connected ✓" while auto-sync quietly failed every 30 seconds in the background. The user only noticed when commits stopped appearing on GitHub.
  2. Push successes didn't update last_sync, so "Last sync: 2m ago" stayed frozen at the manual-sync timestamp even while the watcher was happily pushing every file change.

  Fix: capture the tuple, log ERROR on failure, and persist the result inside `flask_app.app_context()` (the same Flask instance the start function received as a parameter, so the existing circular-import fix from f1ed7d8 keeps working). The 30s UI polling from 0e8235a now actually reflects watcher state.

  Found during a post-merge audit of all `git_ops.push` call sites — every other caller was destructuring the tuple already; this one slipped.
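For context, the new push contract is small. A sketch of the shape described above — the subprocess details and the masking strategy are assumptions, and credential wiring (how the token reaches git) is omitted:

```python
import subprocess

def push(repo_dir: str, token: str, with_tags: bool = True) -> tuple[bool, str]:
    """Push to origin; returns (ok, error_message) instead of a bare bool."""
    cmd = ["git", "push"]
    if with_tags:
        # --follow-tags pushes annotated tags that point at pushed commits,
        # so milestones created by create_tag actually reach GitHub.
        cmd.append("--follow-tags")
    cmd += ["origin", "HEAD"]
    proc = subprocess.run(cmd, cwd=repo_dir, capture_output=True, text=True)
    if proc.returncode == 0:
        return True, ""
    # Never let the PAT leak into logs or the pending-job JSON.
    masked = proc.stderr.replace(token, "***") if token else proc.stderr
    return False, masked.strip()

def create_tag(repo_dir: str, tag: str, msg: str, force: bool = False) -> None:
    """Annotated tag; force=True moves an existing milestone pointer."""
    cmd = ["git", "tag", "-a", tag, "-m", msg]
    if force:
        cmd.insert(2, "-f")
    subprocess.run(cmd, cwd=repo_dir, check=True)
```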
## Summary
Three interrelated features that together close the gap between a fresh EvoNexus install and a fully versioned, recoverable workspace:
1. **Brain Repo** — versions `memory/`, `workspace/`, customizations and config via a dedicated private repo (`evo-brain-<username>`). A file watcher with 30s debounce persists every change to GitHub. Daily/weekly/milestone snapshots can be restored via a streaming SSE engine.
2. **Onboarding wizard** — first-time flow covering provider setup and brain-repo connection, plus a restore flow for returning users.
3. **Unified `/backups` page** — single surface for Local ZIPs, S3 and Brain Repo with tabs, import dropdown, contextual restore modal, danger-state when crypto is broken.

31 commits, 64 files changed, +9805 / -239 lines.
## Review response
### B1 — PAT plaintext fallback 🔴
Fixed end-to-end (commits c43aa2f, 7049d1e, 3a12021, f01619c):

- `connect()` returns `500 {code: "CRYPTO_UNAVAILABLE"}` when `BRAIN_REPO_MASTER_KEY` is missing or `cryptography` is unimportable — never writes plaintext
- `_decrypt_token()` returns `""` + `log.error` in every failure branch (never decodes the stored blob as if it were plaintext)
- `start_brain_watcher` refuses to start without a master key
- the `_verify_crypto_ready()` check emits `log.critical` (not WARNING) for the 3 broken states
- `/api/brain-repo/status` and `/api/backups/config` expose `crypto_ready: bool` so the UI renders a danger card with a Reconnect CTA when the server cannot encrypt/decrypt
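A condensed sketch of the fail-loud write path — the helper name and response shape follow the bullets above, while the Fernet usage is an assumption about PATAuthProvider's internals (key-format validation omitted):

```python
import logging
import os

log = logging.getLogger(__name__)

def encrypt_pat_or_fail(token: str):
    """Returns (encrypted_bytes, error_response). Never falls back to plaintext."""
    master_key = os.environ.get("BRAIN_REPO_MASTER_KEY", "")
    if not master_key:
        log.critical("BRAIN_REPO_MASTER_KEY missing — refusing to store PAT")
        return None, ({"error": "Server cannot encrypt tokens",
                       "code": "CRYPTO_UNAVAILABLE"}, 500)
    try:
        from cryptography.fernet import Fernet
    except ImportError:
        log.critical("cryptography unavailable — refusing to store PAT")
        return None, ({"error": "Server cannot encrypt tokens",
                       "code": "CRYPTO_UNAVAILABLE"}, 500)
    return Fernet(master_key.encode()).encrypt(token.encode()), None
```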
### B2 — restore SSE kwargs 🔴

Fixed in 3aa196b:

- `execute_restore(repo_url, ref, token, install_dir, include_kb, kb_key_matches)` called 1:1
- `install_dir` resolved to the WORKSPACE root (where SWAP_DIRS are replaced), not the clone path
- `kb_key_matches` comes from the request body (default `False` — safe)
- `brain_repo/__init__.py` import-time signature check prevents silent drift from recurring
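The import-time drift check is a few lines with `inspect`; a sketch, under the assumption that `execute_restore` lives in `brain_repo.restore`:

```python
import inspect
import logging

log = logging.getLogger(__name__)

EXPECTED_RESTORE_PARAMS = ["repo_url", "ref", "token",
                           "install_dir", "include_kb", "kb_key_matches"]

def _verify_api_compat() -> None:
    """Runs once at import; shouts if routes and brain_repo drift apart."""
    try:
        from brain_repo.restore import execute_restore  # assumed module layout
    except ImportError:
        return
    actual = list(inspect.signature(execute_restore).parameters)
    if actual != EXPECTED_RESTORE_PARAMS:
        log.critical("execute_restore signature drifted: got %s, expected %s",
                     actual, EXPECTED_RESTORE_PARAMS)
```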
## What ships

### Backend — `dashboard/backend/brain_repo/` (11 modules)

- `git_ops.py` — push with `--follow-tags`, annotated tags with `force`. Returns `(ok, error)` so callers surface real errors.
- `github_api.py` — `detect_brain_repos` (Search API), `create_private_repo`, `get_repo_info` (URL parser), `list_snapshots`, `validate_pat_scopes`, `get_github_username`
- `github_oauth.py` — `PATAuthProvider` (Fernet encrypt/decrypt), `OAuthAuthProvider` skeleton behind the `EVO_NEXUS_GITHUB_CLIENT_ID` feature flag
- `pat_auth.py` — re-export shim: `PATAuthProvider` + `decrypt_token` (kept for callers that import `brain_repo.pat_auth`)
- `manifest.py` — read/write/validate_schema/`initialize_brain_repo` — creates `.evo-brain`, `manifest.yaml`, README, directory tree
- `migrations.py`
- `secrets_scanner.py`
- `watcher.py` — `watchdog` observer, 30s debounce, accepts `flask_app` explicitly to avoid a circular import; refuses to start when crypto is broken; persists push results to `last_error`/`last_sync`
- `sync_worker.py` — threads `push_err` into pending-job records
- `kb_mirror.py`
- `transcripts_mirror.py`
- `restore.py`
- `__init__.py` — `_verify_api_compat` + `_verify_crypto_ready` — logs CRITICAL on any signature drift or crypto misconfig
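The 30s-debounce pattern `watcher.py` is described as using can be illustrated with a reset-on-event timer — a sketch, with `sync_fn` standing in for the mirror+commit+push closure:

```python
import threading
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class DebouncedSync(FileSystemEventHandler):
    """Coalesces bursts of file events into one sync, 30s after the last event."""
    def __init__(self, sync_fn, delay: float = 30.0):
        self.sync_fn, self.delay = sync_fn, delay
        self._timer = None
        self._lock = threading.Lock()

    def on_any_event(self, event):
        if event.is_directory:
            return
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # reset the countdown on every new event
            self._timer = threading.Timer(self.delay, self.sync_fn)
            self._timer.daemon = True
            self._timer.start()

def start(path: str, sync_fn):
    observer = Observer()
    observer.schedule(DebouncedSync(sync_fn), path, recursive=True)
    observer.start()
    return observer
```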
### Backend — new/modified routes

- `routes/brain_repo.py` — status / validate-token / connect / disconnect / detect / snapshots / restore-SSE / sync-force / tag-milestone. A blueprint-level errorhandler converts `abort()` to JSON. Every caller of `_decrypt_token` treats an empty return as hard failure.
- `routes/onboarding.py` — state / start / complete / skip / provider (writes the canonical `providers.json` schema, not the legacy shape)
- `routes/auth_routes.py` — `/api/auth/needs-onboarding`, `/api/auth/mark-agents-visited`
- `routes/backups.py` — extended `/api/backups/config` with `brain_repo_configured`, `brain_repo`, `brain_crypto_ready` so the tabs UI is one round-trip
- `app.py` — SQLite ALTER TABLE migration for the new user columns + `brain_repo_configs` table + `providers.json` schema normalization + `BRAIN_REPO_MASTER_KEY` autogeneration
- `models.py` — `User.onboarding_state`, `User.onboarding_completed_agents_visit`, `BrainRepoConfig` (user_id FK)

### Frontend — Onboarding wizard (`pages/onboarding/`)

- `OnboardingRouter` state machine + 7 first-time screens + 5 restore screens: `/providers/openai/models` validation + model dropdown, `/providers/openai/auth-*` endpoints
- `RestoreFlow` → `RestoreSelectRepo` → `RestoreSelectSnapshot` (HEAD / milestones / weekly / daily) → `RestoreConfirm` (type-to-confirm) → `RestoreExecute` (SSE progress; a sketch of such a stream follows below)
- `OnboardingHeader` renders the EvoNexus logo + step indicator on every inner screen
- `OracleWelcomeBanner` on `/agents` (one-time, dismissed via `POST /auth/mark-agents-visited`)

### Frontend — `/backups` redesign

- `last_error` / `crypto_ready` danger states with a Reconnect CTA
- `.zip` import (existing) + pull from Brain Repo (new)
- `include_kb` + `kb_key_matches` checkboxes for Brain Repo
- `/api/backups/config` polling so watcher auto-sync results reach the UI without manual refresh
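A sketch of what a streaming restore endpoint of this shape typically looks like in Flask — the event payloads and the `execute_restore_events` stub are illustrative, not the repo's actual generator:

```python
import json
from flask import Blueprint, Response, request, stream_with_context

bp = Blueprint("brain_repo_restore", __name__)

def execute_restore_events(kb_key_matches: bool):
    """Stand-in for brain_repo.restore.execute_restore (illustrative only)."""
    steps = ["clone snapshot", "scan secrets", "swap dirs",
             "import KB" if kb_key_matches else "import KB metadata"]
    for i, msg in enumerate(steps, 1):
        yield i, len(steps), msg

@bp.post("/api/brain-repo/restore/start")
def restore_start():
    body = request.get_json(force=True) or {}
    kb_key_matches = bool(body.get("kb_key_matches", False))  # default False = safe

    def generate():
        try:
            for step, total, message in execute_restore_events(kb_key_matches):
                yield f"data: {json.dumps({'step': step, 'total': total, 'message': message})}\n\n"
        except Exception as exc:  # surface the real error to the progress modal
            yield f"data: {json.dumps({'error': str(exc)})}\n\n"

    return Response(stream_with_context(generate()), mimetype="text/event-stream")
```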
### Frontend — `/settings/brain-repo`

A fully i18n'd settings page with a status card (connected / sync / pending / auto-sync / last_error / crypto danger banner) + Sync now + Create milestone + Disconnect with confirmation.

### Frontend — infrastructure

- `lib/api.ts` — `buildError()` extracts JSON `error`/`description`/`message` from non-OK responses; the status prefix is preserved so existing `.includes('401')` callers keep working
- `hooks/useGlobalNotifications.ts` — HTTP health-probe before WS + visibility guard; eliminates the `WebSocket connection failed` console spam on pages that do not use the terminal
- `App.tsx` — onboarding guard; treats a `null` state as "needs onboarding"

### i18n (en-US / pt-BR / es)

~270 new keys across `onboarding.*` (wizard), `restore.*` (restore flow), `brainRepoSettings.*`, `backups.*` (tabs, destinations, modal, empty states), `agents.welcomeBanner.*`. All three bundles keep identical key trees with natural translations — no English strings reused as pt-BR/es fallbacks.

### CLI (`setup.py`, `backup.py`)

- `setup.py` — brain-repo prompts after provider selection; PAT collection; auto-generates `BRAIN_REPO_MASTER_KEY`
- `backup.py` — a `--target github` shortcut triggers `commit_all + push + milestone tag` via the dashboard's own sync code path
### Dev environment

- `docker-compose.dev.yml` — Windows-friendly dev compose: mounts source code (live-reload backend), named volumes for data/config/memory/workspace/claude, ports 8080 + 32352
- `Dockerfile.dev` — `npm install --legacy-peer-deps`, installs `@anthropic-ai/claude-code` + `@gitlawb/openclaude@latest` globally, `sed -i 's/\r//'` on entrypoint scripts to survive CRLF from Windows checkouts
- `DEV-SETUP.md` — step-by-step test guide
## Commit timeline (31 commits, grouped by theme)

### Foundation (pre-upstream-merge, base of the PR)
### Upstream sync
Zero conflicts — our modules and upstream's Activity/Topics/Toast work never touched the same lines.
### Post-merge fixes (bugs surfaced during QA)
### Review response (addressing @davidsoncelestino blockers)
## Validation
### In-container smoke tests (after every commit)
- `python -c "import ast; ast.parse(...)"` → 113/113 files OK
- `npm run build` → 0 TypeScript errors
- `inspect.signature(execute_restore)` → `{repo_url, ref, token, install_dir, include_kb, kb_key_matches}`
- `inspect.signature(git_ops.push)` → `(repo_dir, token, with_tags=True) -> tuple[bool, str]`
- `is_crypto_ready()` → True with a valid key; False (CRITICAL log) with a missing/empty/invalid key
- `_decrypt_token` with a missing key → returns `""` (no `.decode("utf-8")` of the blob)
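The 113/113 syntax check is reproducible with a few lines of `ast`; a sketch (the root path is an assumption):

```python
import ast
import sys
from pathlib import Path

def syntax_check(root: str = "dashboard/backend") -> int:
    """Parses every .py file under root; returns the number of failures."""
    failures = 0
    files = sorted(Path(root).rglob("*.py"))
    for path in files:
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            failures += 1
            print(f"FAIL {path}: {exc}", file=sys.stderr)
    print(f"{len(files) - failures}/{len(files)} files OK")
    return failures

if __name__ == "__main__":
    sys.exit(1 if syntax_check() else 0)
```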
### Real GitHub push validated

Commits + tags reach GitHub end-to-end; the watcher persists results to `BrainRepoConfig.last_error`/`last_sync`; the UI reflects state within 30s.
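A sketch of the persistence step the watcher performs — the field names come from the text above, while the factory shape and session handling are assumptions:

```python
import logging
from datetime import datetime, timezone

log = logging.getLogger(__name__)

def make_sync_fn(flask_app, db, config_model, config_id, do_push):
    """Builds the watcher closure; do_push() returns (ok, error_message)."""
    def _sync_fn():
        ok, err = do_push()
        if not ok:
            log.error("brain-repo auto-sync push failed: %s", err)
        with flask_app.app_context():  # same Flask instance passed at startup
            config = db.session.get(config_model, config_id)
            if config is None:
                return
            if ok:
                config.last_sync = datetime.now(timezone.utc)
                config.last_error = None
            else:
                config.last_error = err
            db.session.commit()
    return _sync_fn
```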
## Architecture decisions worth flagging

- **PAT now, OAuth flag later** — `OAuthAuthProvider` is implemented and follows the same `GitHubAuthProvider` abstract base. Flipping `EVO_NEXUS_GITHUB_CLIENT_ID` enables it; no route/DB changes needed.
- **Fail loud on write, fail safe on read** — `connect()` returns `500` when crypto is unavailable (never writes plaintext). `_decrypt_token()` returns `""` + `log.error` (never decodes the blob as if it were plaintext). Every caller already checks `if not token:` and aborts.
- **Workspace mirror step** — `sync_force`, `tag_milestone` and the watcher all copy `memory/workspace/customizations/config-safe` from the workspace → brain_repo clone before `git add`. Without it, `commit_all` saw nothing new and auto-sync silently no-opped (found during QA).
- **`install_dir` vs `local_path`** — documented where the confusion came from (the restore SSE crash) and resolved it: `install_dir` = workspace root (SWAP_DIRS replaced there), `local_path` = brain-repo clone location.
- **Brain Repo != full backup** — the restore modal is explicit: Brain Repo snapshots do not recover the SQLite database or files outside the 4 watched folders. For full DR, use a `.zip` backup (local or S3).
- **kb_key_matches default false** — a Knowledge Base encrypted with a previous master key cannot be decrypted after rotation. The restore modal asks "I still have the original master key" (checkbox), defaults OFF. When false, the KB imports as metadata only.
- **Master-key bootstrap race** — acknowledged limitation: if two gunicorn workers start simultaneously on a fresh install, they may generate different keys before either finishes writing `.env`. The fail-loud path makes the symptom a visible 500 (not silent plaintext), but the proper fix is to generate the key in `entrypoint.sh` before the fork. Tracked as a follow-up; a race-safe sketch follows this list.
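For the tracked follow-up, the generate-before-fork step can also be made race-safe in-process with an atomic publish. A sketch that writes a standalone key file (the real bootstrap appends a line to `.env`, which this simplifies away; `os.link` semantics assume a POSIX filesystem):

```python
import os
import tempfile
from cryptography.fernet import Fernet

def ensure_master_key(path: str = "brain_repo_master.key") -> str:
    """Publishes a key atomically so racing gunicorn workers agree on one value."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read().strip()
    key = Fernet.generate_key().decode()
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        os.write(fd, key.encode())
    finally:
        os.close(fd)
    try:
        os.link(tmp, path)  # atomic: fails if another worker already published
    except FileExistsError:
        pass  # lost the race — read the winner's key below
    finally:
        os.unlink(tmp)
    with open(path, encoding="utf-8") as fh:
        return fh.read().strip()
```

Because the file is fully written before the `link` publishes it, a losing worker never observes a half-written key.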
## Known limitations / follow-ups

## Test plan
- `.evo-brain` + `manifest.yaml` + README + `.gitignore`
- `.evo-brain` marker detected via Search API
- `/agents`, Oracle banner visible, dismissable
- `/backups` → 3 destination cards + tabs with counts
- … `gh api`)
- `memory/` → auto-sync pushes within ~30s; last_sync timestamp in the UI updates within 60s
- `BRAIN_REPO_MASTER_KEY` + restart → card shows "encryption broken" red banner, sync returns 500 with code `CRYPTO_UNAVAILABLE`
- `/onboarding`