feat(brain-repo): brain repo versioning + onboarding wizard with restore flow #32

Closed
NeritonDias wants to merge 32 commits into evolution-foundation:develop from NeritonDias:feat/brain-repo-onboarding

Conversation

NeritonDias (Contributor) commented Apr 23, 2026

Summary

Three interrelated features that together close the gap between a fresh EvoNexus install and a fully versioned, recoverable workspace:

  • Brain Repo — automatic GitHub versioning of memory/, workspace/, customizations and config via a dedicated private repo (evo-brain-<username>). File watcher with 30s debounce persists every change to GitHub. Daily/weekly/milestone snapshots can be restored via a streaming SSE engine.
  • Onboarding Wizard — post-account-creation flow replacing the cold-drop into Overview. First-time users pick an AI provider (per-provider sub-flows for Anthropic / OpenAI / OpenRouter / Codex OAuth) and optionally connect a brain repo. Returning users can restore from any snapshot.
  • Unified /backups page — single surface for Local ZIPs, S3 and Brain Repo with tabs, import dropdown, contextual restore modal, danger-state when crypto is broken.

31 commits, 64 files changed, +9805 / -239 lines.


Review response

B1 — PAT plaintext fallback 🔴

Fixed end-to-end (commits c43aa2f, 7049d1e, 3a12021, f01619c):

  • connect() returns 500 {code: "CRYPTO_UNAVAILABLE"} when BRAIN_REPO_MASTER_KEY is missing or cryptography is unimportable — never writes plaintext
  • _decrypt_token() returns "" + log.error in every failure branch (never decodes the stored blob as if it were plaintext)
  • start_brain_watcher refuses to start without a master key
  • Import-time _verify_crypto_ready() check emits log.critical (not WARNING) for the 3 broken states
  • /api/brain-repo/status and /api/backups/config expose crypto_ready: bool so the UI renders a danger card with Reconnect CTA when the server cannot encrypt/decrypt
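
For reference, a minimal sketch of the write/read split described above, assuming a Fernet-based helper; the function names and the shape of the typed error are illustrative, not the PR's exact code.

    import logging
    from cryptography.fernet import Fernet, InvalidToken

    log = logging.getLogger(__name__)
    CRYPTO_UNAVAILABLE = {"error": "Token encryption unavailable", "code": "CRYPTO_UNAVAILABLE"}

    def encrypt_token_or_fail(token: str, master_key: str | None):
        """Write path, fail loud: refuse to persist anything when crypto is unavailable."""
        if not master_key:
            log.critical("BRAIN_REPO_MASTER_KEY missing; refusing to store PAT")
            return None, CRYPTO_UNAVAILABLE   # the route layer turns this into a 500
        return Fernet(master_key).encrypt(token.encode("utf-8")), None

    def decrypt_token_or_empty(blob: bytes, master_key: str | None) -> str:
        """Read path, fail safe: return "" on any failure, never decode the blob as plaintext."""
        if not master_key:
            log.error("BRAIN_REPO_MASTER_KEY missing; cannot decrypt stored PAT")
            return ""
        try:
            return Fernet(master_key).decrypt(blob).decode("utf-8")
        except (InvalidToken, ValueError):
            log.error("Fernet decrypt of stored PAT failed")
            return ""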

B2 — restore SSE kwargs 🔴

Fixed in 3aa196b:

  • Call site now matches execute_restore(repo_url, ref, token, install_dir, include_kb, kb_key_matches) 1:1
  • install_dir resolved to WORKSPACE root (where SWAP_DIRS are replaced), not the clone path
  • kb_key_matches comes from the request body (default False — safe)
  • New brain_repo/__init__.py import-time signature check prevents silent drift from recurring
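
A minimal sketch of that import-time contract check, assuming it lives in brain_repo/__init__.py as described; the expected-parameter set is taken from the bullet above, the rest is illustrative.

    import inspect
    import logging

    log = logging.getLogger(__name__)

    EXPECTED_RESTORE_PARAMS = {"repo_url", "ref", "token", "install_dir", "include_kb", "kb_key_matches"}

    def _verify_api_compat() -> None:
        """Log loudly at process start if execute_restore's signature drifts from what the routes call."""
        try:
            from brain_repo import restore   # inside the package this would be `from . import restore`
        except ImportError as exc:
            log.critical("brain_repo.restore not importable: %s", exc)
            return
        actual = set(inspect.signature(restore.execute_restore).parameters)
        if actual != EXPECTED_RESTORE_PARAMS:
            log.critical("execute_restore signature drift: missing=%s extra=%s",
                         EXPECTED_RESTORE_PARAMS - actual, actual - EXPECTED_RESTORE_PARAMS)

    _verify_api_compat()   # runs once, at import time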

What ships

Backend — dashboard/backend/brain_repo/ (11 modules)

Module — Purpose
git_ops.py — Tokenized clone/commit/push (--follow-tags), annotated tags with force. Returns (ok, error) so callers surface real errors.
github_api.py — detect_brain_repos (Search API), create_private_repo, get_repo_info (URL parser), list_snapshots, validate_pat_scopes, get_github_username
github_oauth.py — PATAuthProvider (Fernet encrypt/decrypt), OAuthAuthProvider skeleton behind EVO_NEXUS_GITHUB_CLIENT_ID feature flag
pat_auth.py — Shim re-exporting PATAuthProvider + decrypt_token (kept for callers that import brain_repo.pat_auth)
manifest.py — read/write/validate_schema/initialize_brain_repo — creates .evo-brain, manifest.yaml, README, directory tree
migrations.py — Schema-version migration registry
secrets_scanner.py — 21 regex patterns (AWS, GitHub, Anthropic, OpenAI, Stripe, JWT, SSH, etc.). Applied before every push — matched files are deleted, never leave the machine
watcher.py — watchdog observer, 30s debounce, accepts flask_app explicitly to avoid circular import; refuses to start when crypto broken; persists push result to last_error/last_sync
sync_worker.py — JSON retry queue (60s interval) for push failures; threads push_err into pending-job records
kb_mirror.py — Export KB → markdown + import markdown → KB (Postgres optional)
transcripts_mirror.py — Rolling 30-day window of Claude project transcripts with pruning
restore.py — 10-step streaming generator: clone → validate → migrate → secrets_scan → kb_validate → backup_originals → swap → kb_import → manifest_update → complete
__init__.py — Import-time _verify_api_compat + _verify_crypto_ready — logs CRITICAL on any signature drift or crypto misconfig
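
As an illustration of the secrets_scanner.py gate above (not the PR's actual patterns or API), a pre-push scrub might look roughly like this:

    import re
    from pathlib import Path

    # Two illustrative patterns; the real scanner ships 21 (AWS, GitHub, Anthropic, OpenAI, ...).
    PATTERNS = [
        re.compile(r"ghp_[A-Za-z0-9]{36}"),    # GitHub classic PAT
        re.compile(r"AKIA[0-9A-Z]{16}"),       # AWS access key id
    ]

    def scrub_before_push(repo_dir: Path) -> list[Path]:
        """Delete any file containing a secret-looking string; matched files never leave the machine."""
        removed: list[Path] = []
        for path in repo_dir.rglob("*"):
            if not path.is_file() or ".git" in path.parts:
                continue
            text = path.read_text(errors="ignore")
            if any(p.search(text) for p in PATTERNS):
                path.unlink()
                removed.append(path)
        return removed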

Backend — new/modified routes

  • routes/brain_repo.py — status / validate-token / connect / disconnect / detect / snapshots / restore-SSE / sync-force / tag-milestone. Blueprint-level errorhandler converts abort() to JSON. Every caller of _decrypt_token treats empty return as hard failure.
  • routes/onboarding.py — state / start / complete / skip / provider (writes canonical providers.json schema, not the legacy shape)
  • routes/auth_routes.py — /api/auth/needs-onboarding, /api/auth/mark-agents-visited
  • routes/backups.py — extended /api/backups/config with brain_repo_configured, brain_repo, brain_crypto_ready so the tabs UI is one round-trip
  • app.py — SQLite ALTER TABLE migration for new user columns + brain_repo_configs table + providers.json schema normalization + BRAIN_REPO_MASTER_KEY autogeneration
  • models.py — User.onboarding_state, User.onboarding_completed_agents_visit, BrainRepoConfig (user_id FK)
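
A minimal sketch of the ALTER TABLE migration pattern mentioned for app.py above, assuming Flask-SQLAlchemy's db.engine is available; the helper and the table name are illustrative, the column names are the ones this PR adds.

    from sqlalchemy import text

    def add_column_if_missing(db, table: str, column: str, ddl: str) -> None:
        """SQLite has no ALTER TABLE ... IF NOT EXISTS for columns, so inspect PRAGMA first."""
        with db.engine.begin() as conn:
            existing = {row[1] for row in conn.execute(text(f"PRAGMA table_info({table})"))}
            if column not in existing:
                conn.execute(text(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}"))

    # e.g. at boot (table name assumed):
    # add_column_if_missing(db, "user", "onboarding_state", "TEXT")
    # add_column_if_missing(db, "user", "onboarding_completed_agents_visit", "BOOLEAN DEFAULT 0")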

Frontend — Onboarding wizard (pages/onboarding/)

OnboardingRouter state machine + 7 first-time screens + 5 restore screens:

  • Welcome — animated network canvas, two CTAs (first-time / restore)
  • StepProvider — 2×2 card grid + per-provider sub-steps:
    • Anthropic: "configure via terminal" + no-key flow
    • OpenAI: key + /providers/openai/models validation + model dropdown
    • OpenRouter: key + base URL chip + free-text model
    • Codex: 2 tabs (Browser OAuth, Device flow) — reuses existing /providers/openai/auth-* endpoints
  • StepBrainRepo → StepBrainConnect → StepBrainChoose → StepConfirm — PAT validate → create-new vs existing-repo → review/finish
  • Restore flow: RestoreFlow → RestoreSelectRepo → RestoreSelectSnapshot (HEAD / milestones / weekly / daily) → RestoreConfirm (type-to-confirm) → RestoreExecute (SSE progress)
  • Shared OnboardingHeader renders the EvoNexus logo + step indicator on every inner screen
  • OracleWelcomeBanner on /agents (one-time, dismissed via POST /auth/mark-agents-visited)

Frontend — /backups redesign

  • 3 destination cards at the top (Local / S3 / Brain Repo) with explicit status badges, path/bucket/repo display, and last_error / crypto_ready danger states with Reconnect CTA
  • Unified tabs replacing the stacked local+S3 layout — counts per tab, refresh per tab
  • Brain Repo tab lists snapshots grouped (HEAD / milestones / weekly collapsible / daily collapsible) with Restore + View-on-GitHub actions
  • Import dropdown with 2 sources: upload .zip (existing) + pull from Brain Repo (new)
  • Contextual restore modal — merge/replace for local/S3 ZIPs, SSE progress + include_kb + kb_key_matches checkboxes for Brain Repo
  • 30s visibility-aware polling refreshes /api/backups/config so watcher auto-sync results reach the UI without manual refresh

Frontend — /settings/brain-repo

Fully i18n'd settings page with status card (connected / sync / pending / auto-sync / last_error / crypto danger banner) + Sync now + Create milestone + Disconnect with confirmation.

Frontend — infrastructure

  • lib/api.ts — buildError() extracts JSON error/description/message from non-OK responses; prefix preserved so existing .includes('401') callers keep working
  • hooks/useGlobalNotifications.ts — HTTP health-probe before WS + visibility-guard; eliminates the WebSocket connection failed console spam on pages that do not use the terminal
  • App.tsx — onboarding guard; treats null state as "needs onboarding"

i18n (en-US / pt-BR / es)

~270 new keys across onboarding.* (wizard), restore.* (restore flow), brainRepoSettings.*, backups.* (tabs, destinations, modal, empty states), agents.welcomeBanner.*. All three bundles keep identical key trees with natural translations — no English strings reused as pt-BR/es fallbacks.

CLI (setup.py, backup.py)

  • setup.py — brain-repo prompts after provider selection; PAT collection; auto-generates BRAIN_REPO_MASTER_KEY
  • backup.py — --target github shortcut triggers commit_all + push + milestone tag via the dashboard's own sync code path

Dev environment

  • docker-compose.dev.yml — Windows-friendly dev compose: mounts source code (live reload backend), named volumes for data/config/memory/workspace/claude, port 8080 + 32352
  • Dockerfile.dev — npm install --legacy-peer-deps, installs @anthropic-ai/claude-code + @gitlawb/openclaude@latest globally, sed -i 's/\r//' on entrypoint scripts to survive CRLF from Windows checkouts
  • DEV-SETUP.md — step-by-step test guide

Commit timeline (31 commits, grouped by theme)

Foundation (pre-upstream-merge, base of the PR)

694573b  feat(brain-repo): foundation — BrainRepoConfig model, onboarding columns, watchdog dep
dba17a0  feat(onboarding): backend routes — needs-onboarding, state endpoints
2c80b3f  feat(brain-repo): brain_repo route — status/connect/detect/snapshots/restore/sync
e277b73  feat(brain-repo): github_oauth/api/git_ops/watcher/sync_worker/kb_mirror/transcripts/restore/secrets_scanner
6a34439  feat(onboarding): frontend wizard — first-time + restore + Oracle banner
407a678  chore(setup.py): brain-repo CLI prompts + backup github target + /salve skill
5b22ef3  i18n: 13 brain-repo keys across en-US/pt-BR/es
e2b0dd4  chore: final audit handoff
0ecc781  chore: docker-compose.dev.yml + DEV-SETUP.md
c19698c  fix(dev): Dockerfile.dev with --legacy-peer-deps

Upstream sync

95adb0c  Merge remote-tracking branch 'upstream/develop' into feat/brain-repo-onboarding

Zero conflicts — our modules and upstream's Activity/Topics/Toast work never touched the same lines.

Post-merge fixes (bugs surfaced during QA)

ec393f6  fix(brain-repo): correct PATAuthProvider API + add /validate-token + bootstrap remote repo
fbdd113  fix(onboarding): canonical providers.json schema + boot-time migration
e46a002  fix(api): expose backend JSON error.description in lib/api.ts
ff96d13  fix(onboarding): refreshUser() before navigate to prevent /onboarding redirect loop
f9826cf  feat(onboarding): per-provider sub-flows + use /validate-token instead of /connect
6f29f7b  i18n(onboarding): translate 12 wizard screens + Oracle banner — ~150 keys
26945d8  fix(dev): Dockerfile.dev installs claude-code/openclaude + CRLF fix for entrypoint scripts

Review response (addressing @davidsoncelestino blockers)

3aa196b  fix(brain-repo): correct execute_restore kwargs + import-time API contract check  (B2)
f1ed7d8  fix(brain-repo): start_brain_watcher accepts flask_app, no circular import
6e7537f  chore(notifications): health-check before WS + visibility guard
4bbfa17  feat(backups): unified tabs + import dropdown + SSE restore modal
0e8235a  feat(backups): surface brain-repo last_error + 30s polling
c43aa2f  fix(brain-repo): fail loud on missing crypto — no plaintext PAT fallback            (B1)
7049d1e  fix(brain-repo): guard master_key in watcher startup — fail loud                    (B1)
3a12021  feat(brain-repo): expose crypto_ready + import-time CRITICAL check                  (B1)
f01619c  feat(backups,settings): surface crypto_ready — danger state + Reconnect             (B1)
26c2397  fix(backups): /brain-repo/snapshots uniform shape + frontend undefined guards
a7a679d  fix(brain-repo): commit the git_ops.push tuple API + sync_worker caller update
ebf61c9  feat(onboarding): shared OnboardingHeader with EvoNexus logo on inner steps
141db8c  fix(brain-repo): watcher persists push result to last_error / last_sync

Validation

In-container smoke tests (after every commit)

  • python -c "import ast; ast.parse(...)" → 113/113 files OK
  • npm run build → 0 TypeScript errors
  • inspect.signature(execute_restore) → {repo_url, ref, token, install_dir, include_kb, kb_key_matches}
  • inspect.signature(git_ops.push) → (repo_dir, token, with_tags=True) -> tuple[bool, str]
  • is_crypto_ready() → True with valid key; False (CRITICAL log) with missing/empty/invalid key
  • _decrypt_token with missing key → returns "" (no .decode("utf-8") of the blob)

Real GitHub push validated

$ gh api repos/NeritonDias/evo-brain-neritondias/commits --jq '.[0]'
{"sha": "430dde72...", "message": "manual sync 2026-04-23T21:15:39"}

$ gh api repos/NeritonDias/evo-brain-neritondias/tags --jq '.[].name'
milestone/teste
milestone/manual-2026-04-23-21-15-39
milestone/manual-2026-04-23-21-07
milestone/manual-2026-04-23-20-59
milestone/manual-2026-04-23-20-55

Commits + tags reach GitHub end-to-end; watcher persists results to BrainRepoConfig.last_error/last_sync; UI reflects state within 30s.


Architecture decisions worth flagging

  1. PAT now, OAuth flag later — OAuthAuthProvider is implemented and follows the same GitHubAuthProvider abstract. Flipping EVO_NEXUS_GITHUB_CLIENT_ID enables it; no route/DB changes needed.

  2. Fail loud on write, fail safe on read — connect() returns 500 when crypto is unavailable (never writes plaintext). _decrypt_token() returns "" + log.error (never decodes the blob as if it were plaintext). Every caller already checks if not token: and aborts.

  3. Workspace mirror step — sync_force, tag_milestone and the watcher all copy memory/workspace/customizations/config-safe from workspace → brain_repo clone before git add (sketched after this list). Without it, commit_all saw nothing new and auto-sync silently no-opped (found during QA).

  4. install_dir vs local_path — documented where the confusion came from (restore SSE crash) and resolved. install_dir = workspace root (SWAP_DIRS replaced there), local_path = brain-repo clone location.

  5. Brain Repo != full backup — restore modal is explicit: Brain Repo snapshots do not recover the SQLite database or files outside the 4 watched folders. For full DR, use a .zip backup (local or S3).

  6. kb_key_matches default false — Knowledge Base encrypted with a previous master key cannot be decrypted after rotation. The restore modal asks "I still have the original master key" (checkbox), defaults OFF. When false, KB imports as metadata only.

  7. Master-key bootstrap race — acknowledged limitation: if two gunicorn workers start simultaneously on a fresh install, they may generate different keys before either finishes writing .env. The fail-loud path makes the symptom a visible 500 (not silent plaintext), but the proper fix is to generate the key in entrypoint.sh before the fork. Tracked as a follow-up.
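
A minimal sketch of the workspace-mirror step from decision 3, assuming plain directory copies; the function name and error handling are illustrative.

    import shutil
    from pathlib import Path

    SWAP_DIRS = ("memory", "workspace", "customizations", "config-safe")

    def mirror_workspace_into_clone(workspace_root: Path, clone_dir: Path) -> None:
        """Copy the watched folders into the brain-repo clone so commit_all actually sees changes."""
        for name in SWAP_DIRS:
            src, dst = workspace_root / name, clone_dir / name
            if not src.exists():
                continue
            if dst.exists():
                shutil.rmtree(dst)       # replace wholesale; git diff decides what actually changed
            shutil.copytree(src, dst)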


Known limitations / follow-ups

Test plan

  • Fresh install → setup wizard → onboarding wizard appears
  • Pick Anthropic in wizard → message directs to terminal, no API key prompted
  • Pick OpenAI → key validates → model dropdown populates → save → provider active
  • Pick Codex → OAuth tab flow works end-to-end (Browser + Device)
  • Connect brain repo with new repo name → repo created on GitHub, populated with .evo-brain + manifest.yaml + README + .gitignore
  • Connect brain repo with existing repo → .evo-brain marker detected via Search API
  • Finish onboarding → lands on /agents, Oracle banner visible, dismissable
  • Refresh → banner stays dismissed (persisted in user record)
  • /backups → 3 destination cards + tabs with counts
  • Click Import → dropdown shows Upload + Pull-from-Brain-Repo
  • Sync now / Create milestone → commits + tag land on GitHub (verify with gh api)
  • Edit a file in memory/ → auto-sync pushes within ~30s; last_sync timestamp in UI updates within 60s
  • Restore a milestone → SSE progress bar animates through 10 steps; files match snapshot
  • Unset BRAIN_REPO_MASTER_KEY + restart → card shows "encryption broken" red banner, sync returns 500 with code CRYPTO_UNAVAILABLE
  • Swap PAT for an invalid one → card shows last_error, Reconnect button routes to /onboarding

… columns, BRAIN_REPO_MASTER_KEY, watchdog dep

- Add User.onboarding_state + User.onboarding_completed_agents_visit columns
- Add BrainRepoConfig model (user_id FK, encrypted token, repo metadata, sync state)
- Add needs_onboarding() helper in models.py
- SQLite migration block in app.py (ALTER TABLE pattern)
- BRAIN_REPO_MASTER_KEY auto-generated in .env on first run
- Register onboarding + brain_repo blueprints (try/except — files created in later commits)
- Add /api/auth/needs-onboarding to PUBLIC_PATHS
- Add watchdog>=4.0 to pyproject.toml
- Create dashboard/backend/brain_repo/__init__.py
- Create .handoffs/ with phase-0-handoff, domain-auth, domain-watcher
… endpoints

- Updated auth_routes.py import to include BrainRepoConfig and needs_onboarding
- Added GET /api/auth/needs-onboarding (public, handles unauthenticated + pre-setup)
- Added POST /api/auth/mark-agents-visited (login_required)
- Created routes/onboarding.py with state, start, complete, skip, provider endpoints
…ts, restore SSE, sync

- GET /api/brain-repo/status — returns config dict or {connected: false}
- POST /api/brain-repo/connect — validate PAT, create/validate repo, encrypt+store token
- POST /api/brain-repo/disconnect — clears token and disables sync
- GET /api/brain-repo/detect — find candidate brain repos for a PAT
- GET /api/brain-repo/snapshots — list daily/weekly/milestone/head refs
- POST /api/brain-repo/restore/start — SSE stream via stream_with_context
- POST /api/brain-repo/sync/force — commit+push+milestone tag
- POST /api/brain-repo/tag/milestone — create named milestone tag
All brain_repo sub-module calls use try/except ImportError for graceful fallback
…b_mirror, transcripts, restore, secrets_scanner
…salve skill + .gitignore

- setup.py: add ask_password, ask_choice, _ensure_brain_master_key,
  _save_brain_repo_pat helpers; add Brain Repo wizard block in main()
  after choose_provider() (local mode); add brain_repo_* keys to all
  3 language bundles (en-US, pt-BR, es) in MESSAGES dict
- backup.py: add backup_to_github() function and 'github' --target
  option; dispatches to brain_repo.git_ops for commit+push+tag
- .claude/skills/salve/SKILL.md: new skill that flushes session to
  memory files and relies on Brain Repo file watcher for auto-sync
- .gitignore: add memory/raw-transcripts/
- .handoffs/phase-4-cli-handoff.md: phase summary
Add brainRepo namespace with 14 keys to all three locale files:
brain_repo_enable_prompt, brain_repo_auth_method, brain_repo_defer_to_web,
brain_repo_pat_instructions, brain_repo_pat_prompt, brain_repo_pat_saved,
brain_repo_pat_skipped, brain_repo_configure_later, brain_repo_connect,
brain_repo_disconnect, brain_repo_sync_now, brain_repo_create_milestone,
brain_repo_status_connected, brain_repo_status_disconnected.

Key parity verified: all diffs between bundles are empty sets.
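
A minimal sketch of that parity check, assuming nested JSON locale bundles on disk; paths and bundle names are illustrative.

    import json
    from pathlib import Path

    def key_tree(obj: dict, prefix: str = "") -> set[str]:
        """Flatten a nested locale bundle into a set of dotted key paths."""
        keys: set[str] = set()
        for k, v in obj.items():
            if isinstance(v, dict):
                keys |= key_tree(v, f"{prefix}{k}.")
            else:
                keys.add(f"{prefix}{k}")
        return keys

    bundles = {p.stem: key_tree(json.loads(p.read_text())) for p in Path("locales").glob("*.json")}
    reference = bundles["en-US"]
    for name, keys in bundles.items():
        missing, extra = reference - keys, keys - reference
        assert not missing and not extra, f"{name}: missing={missing} extra={extra}"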

sourcery-ai (Bot) left a comment

Sorry @NeritonDias, your pull request is larger than the review limit of 150000 diff characters

NeritonDias force-pushed the feat/brain-repo-onboarding branch from 6017761 to e2b0dd4 on April 23, 2026 at 17:55
DavidsonGomes (Member)

Code review — PR #32

Change summary: three interlinked features — (1) "Brain Repo" versioning memory/, workspace/ and customizations in a private GitHub repo via PAT, with a file watcher (watchdog, 30s debounce) and a retrying sync worker; (2) post-account-creation onboarding wizard with SSE restore from snapshots (daily/weekly/milestones); (3) OAuth skeleton behind the EVO_NEXUS_GITHUB_CLIENT_ID feature flag. 57 files, +5937/-9.

🔴 Blockers

  1. GitHub PAT stored in plaintext (credential leak) — dashboard/backend/routes/brain_repo.py:29 and :118 import from brain_repo.pat_auth import PATAuthProvider, but the pat_auth module does not exist in this PR (what exists is github_oauth.py, with a PATAuthProvider class whose API is completely different). As a consequence the code always falls into the except ImportError, and the fallback on line 123 does encrypted = token.encode("utf-8") — the PAT is written unencrypted to BrainRepoConfig.github_token_encrypted. On the read side (_decrypt_token), line 35 does config.github_token_encrypted.decode("utf-8") and returns the "encrypted token" as plaintext (because it is plaintext). This is a silent bypass of the Fernet encryption model the whole PR sets out to provide. Beyond the missing import, the two PATAuthProvider classes (the one in github_oauth.py with get_token()/encrypt_token() and the one the routes expect with .encrypt(token)/.decrypt(bytes)) have incompatible APIs — fixing the import alone does not solve it.

  2. Restore SSE crashes on first use due to wrong kwargs — routes/brain_repo.py:239-245 calls restore.execute_restore(token=..., repo_url=..., local_path=..., ref=..., include_kb=...). The real signature in brain_repo/restore.py:28-35 is execute_restore(repo_url, ref, token, install_dir, include_kb, kb_key_matches). local_path does not exist, install_dir (required) is not passed, and neither is kb_key_matches. Any POST /api/brain-repo/restore/start raises TypeError — and since it runs inside the SSE generator, it emits the generic error event and the entire restore feature is broken.

🟡 Suggestions

  1. except ImportError: <graceful fallback> in 5+ places masks blocker #1 — routes/brain_repo.py applies this pattern for pat_auth, github_api, restore, git_ops, kb_mirror. It turns missing-module errors into silent behaviour (exactly the vector of #1). If the module is a requirement, the import should be top-level — if it is missing, the app should not start. "Graceful fallback" here is a trap; I recommend removing all of them.

  2. BRAIN_REPO_MASTER_KEY generated and blindly appended to .env — app.py:2067-2079 does _env_lines.append(...) with no duplicate check, no chmod 600, no explicit log. .env is in .gitignore (confirmed), so there is no immediate risk of committing it, but for a key that decrypts every stored PAT the file permissions matter. Suggestions: idempotency (check whether BRAIN_REPO_MASTER_KEY= is already in the file before appending), chmod 0o600 after writing, and a print warning the user.

  3. SyncWorker is dead code — brain_repo/sync_worker.py (154 lines, a complete class with start/stop/enqueue/_loop) is never instantiated anywhere in the diff. app.py only calls start_brain_watcher. Either remove it now or add the bootstrap.

  4. start_brain_watcher has no multi-worker / hot-reload awareness — app.py:2676 calls it at module startup. Under flask run --debug the module reloads → multiple watchdog observers; in production with multi-worker gunicorn each worker starts its own watcher, and BrainRepoConfig.query.filter_by(sync_enabled=True).first() picks only the first config — in the multi-user model this PR supports, only one user is watched. Guard with WERKZEUG_RUN_MAIN in dev, a lockfile or env var in prod, and iterate over all enabled configs.

  5. No tests at all for 5937 lines — zero test_*.py in the diff. It touches auth, PAT encryption, SSE streaming, git subprocess, watchdog, migrations, the secrets scanner. To leave draft, at minimum: a secrets_scanner unit test (each pattern with a positive and a negative), a PATAuthProvider round-trip (blocker #1 would have been caught), and execute_restore with git_ops mocked (blocker #2 would have been caught immediately).

  6. .handoffs/phase-*.md committed to the repo — 10 files (+817 lines) look like internal phase-by-phase development notes. They add no value in the public repo; I suggest removing them or moving them to workspace/development/features/brain-repo/ (the EvoNexus convention) with a [C] prefix.

  7. Mixed scope — brain repo + onboarding wizard + OAuth scaffold are three features that deserved three separate PRs (easier to review, revert and trace). Not a blocker, but Sourcery gave up on the review precisely because of the size — already a signal.

💭 Nits

  • restore.py:14 — STAGING_DIR = Path("/tmp/brain-restore-staging") is hardcoded; tempfile.mkdtemp(prefix="brain-restore-") avoids multi-user collisions and improves portability (Windows).
  • git_ops.push builds an authenticated URL on every push; prefer configuring credential.helper or remote.origin.url once. The token in argv shows up briefly in ps -ef.
  • github_oauth.OAuthAuthProvider.start_oauth_flow does not include state — GitHub recommends it as a CSRF defence.
  • watcher.py:_on_change creates a new threading.Timer per event — in large I/O bursts thousands of timers are created and cancelled; consider a single "pending" timer with reset instead of recreating it.
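
A minimal sketch of the single-pending-timer debounce this nit suggests (illustrative; not code from the PR):

    import threading, time

    class DebouncedSync:
        """One worker thread and one deadline; bursts of events only move the deadline forward."""
        def __init__(self, delay_s: float, sync_fn):
            self._delay, self._sync_fn = delay_s, sync_fn
            self._deadline = None
            self._cv = threading.Condition()
            threading.Thread(target=self._loop, daemon=True).start()

        def on_change(self, _event=None) -> None:
            with self._cv:
                self._deadline = time.monotonic() + self._delay
                self._cv.notify()

        def _loop(self) -> None:
            while True:
                with self._cv:
                    while self._deadline is None:
                        self._cv.wait()
                    remaining = self._deadline - time.monotonic()
                    if remaining > 0:
                        self._cv.wait(timeout=remaining)
                        continue
                    self._deadline = None
                self._sync_fn()           # fires once, delay_s after the last event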

Next steps

  1. Fix blocker #1 — implement brain_repo/pat_auth.py with the API the routes expect (PATAuthProvider(master_key).encrypt(token) / .decrypt(bytes)) or refactor the routes to use the PATAuthProvider that already exists in github_oauth.py. Remove every except ImportError: <plaintext fallback>.
  2. Fix blocker #2 — align the execute_restore kwargs (install_dir, kb_key_matches) and add a test that exercises the restore SSE endpoint end-to-end against a fake repo.
  3. Minimum tests for the three critical components (PATAuthProvider round-trip, secrets_scanner, mocked execute_restore).
  4. Clean up .handoffs/.
  5. Only then: leave draft and request review.

Thanks for the work — the overall architecture (the PAT→OAuth abstraction, the secrets-scanner gate before the swap in restore, the versioned manifest) is well thought out. The two 🔴 items are fixable and the rest is polish.

…bootstrap remote repo

- routes/brain_repo.py: replace broken PATAuthProvider(key).encrypt(token) calls
  with PATAuthProvider(token, key).encrypt_token(); decrypt now uses module-level
  decrypt_token() helper. Handle missing BRAIN_REPO_MASTER_KEY gracefully.
- Add new POST /api/brain-repo/validate-token endpoint that validates a PAT via
  GitHub /user without persisting anything (used by onboarding to check token
  before showing the repo-choice step).
- Convert blueprint abort() calls to JSON via errorhandler so the SPA can show
  the actual backend message instead of a generic '400 BAD REQUEST'.
- Add friendlier error in connect() when the GitHub repo already exists (422 →
  bilingual message guiding the user to 'Use existing' or pick another name).
- New helper _initialize_remote_brain_repo: after create_private_repo creates an
  empty repo on GitHub, clone-init it locally, drop in the brain-repo skeleton
  via manifest.initialize_brain_repo (.evo-brain marker, manifest.yaml, dirs,
  README, .gitignore), commit and push to origin/main. Without this the repo
  stays empty and detect_brain_repos (code search by .evo-brain) cannot find
  it later — UI rejected such repos as 'incompatible'.
- brain_repo/github_api.py: add get_repo_info(token, repo_url) that parses a
  GitHub URL, fetches the repo, and returns (is_private, info_dict). Replaces
  the broken validate_repo_is_private(token, repo_url) call site that expected
  a tuple but the underlying function returned bool and took 3 args.
- brain_repo/pat_auth.py: re-export shim so existing routes/brain_repo.py
  imports of brain_repo.pat_auth.PATAuthProvider keep working (the actual
  implementation lives in github_oauth.py).
…gration

- routes/onboarding.py: rewrite set_provider so it never writes the legacy
  {<id>: {api_key, enabled}} shape that broke routes/providers.py. Now reuses
  _read_config / _write_config / ALLOWED_ENV_VARS from routes.providers and
  produces the canonical {active_provider, providers: {<id>: {cli_command,
  env_vars, ...}}} schema. Anthropic just flips active_provider (no key);
  OpenAI/OpenRouter merge filtered env_vars; codex / codex_auth is a no-op
  that defers to the existing /api/providers/openai OAuth endpoints.
- Env vars outside ALLOWED_ENV_VARS or containing shell metacharacters are
  silently dropped — same policy as routes/providers.py._sanitize_env_vars.
- app.py: add boot-time providers.json schema-normalization migration.
  If the file exists but is missing active_provider or providers keys, the
  app overwrites it from providers.example.json. Recovers from the broken
  state left by older versions of the onboarding wizard without requiring
  the user to manually reset volumes.
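
For illustration, the canonical shape described above looks roughly like this; field values are placeholders, not the real providers.example.json:

    CANONICAL = {
        "active_provider": "openai",
        "providers": {
            "openai": {
                "cli_command": "<cli command>",                         # placeholder
                "env_vars": {"OPENAI_API_KEY": "<key>"},                # merged, filtered by ALLOWED_ENV_VARS
            },
            "anthropic": {"cli_command": "<cli command>", "env_vars": {}},
        },
    }

    # Legacy shape the wizard used to write (and that broke routes/providers.py):
    LEGACY = {"openai": {"api_key": "<key>", "enabled": True}}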
The api helper used to throw 'new Error("400 BAD REQUEST")' for any non-OK
response, dropping the actual server message — so the UI showed a useless
generic status instead of the specific reason ('repo already exists',
'token required', etc).

New buildError() reads the response body, prefers JSON {error|description|
message} fields, falls back to short text, and prefixes with the status so
existing callers that pattern-match on '401' / '403' keep working.
… redirect loop

Two bugs were producing the same symptom — wizard kept reloading instead of
landing on /agents after completion:

1. App.tsx guard ignored onboarding_state === null. New users have NULL by
   default in the column, so they bypassed the guard and reached /agents
   without ever seeing the wizard. Guard now treats null as 'needs onboarding'
   (only completed/skipped count as done).

2. OnboardingRouter handleComplete/handleSkip called POST /onboarding/complete
   then navigated to /agents — but the React user state still had
   onboarding_state='pending' cached from /auth/me. The guard would re-read
   that cached value and bounce back to /onboarding, where the useEffect saw
   the fresh 'completed' state from the backend and tried to navigate again.
   Infinite loop. Fix: await refreshUser() (from AuthContext) before navigate
   so the guard sees the current state.
…d of /connect

StepProvider was a single generic API-key form for all 4 providers — but
Anthropic, OpenAI, OpenRouter and Codex each need a different flow:

- Anthropic: no API key, user logs in by running 'claude' in the terminal at
  the Agents page. Sub-step shows that instruction + flips active_provider.
- OpenAI: API key input → POST /providers/openai/models validates and lists
  available models → dropdown to pick model → save (env_vars merge).
- OpenRouter: same shape as OpenAI but with read-only Base URL chip
  (https://openrouter.ai/api/v1) and free-text model field.
- Codex: two tabs (Browser OAuth, Device Auth) replicating the Providers.tsx
  modal — calls /providers/openai/auth-start|auth-complete or device-start|
  device-poll. Tab/button colors aligned to the onboarding palette (#00FFA7).

StepBrainConnect.tsx and RestoreSelectRepo.tsx now call POST
/api/brain-repo/validate-token (validates without persisting) instead of
/connect (which requires repo_url|create_repo and was 400-ing on token-only
calls). RestoreSelectRepo also passes the token to /detect as a query param
so the backend doesn't need a stored config to enumerate brain repos.

i18n keys for these screens added in the same pass since the components
were rewritten.
…keys

The whole onboarding wizard (7 first-time screens + 5 restore screens) and
the OracleWelcomeBanner were hardcoded English. Now wired through useTranslation
with new namespaces:

- onboarding.welcome / .stepIndicator / .provider / .brainRepo / .connect /
  .choose / .confirm / .providerLabels / .providerAnthropic / .providerOpenAI /
  .providerOpenRouter / .providerCodex / .backToProviders
- restore.selectRepo / .selectSnapshot / .confirm / .execute
- agents.welcomeBanner

All 3 bundles (en-US, pt-BR, es) have the identical key tree with natural
translations — no English strings reused as fallbacks in pt-BR / es.
…n scripts

- Install @anthropic-ai/claude-code and @gitlawb/openclaude@latest as global
  npm packages (matches Dockerfile.swarm.dashboard). Without these, the
  embedded terminal at /agents and the /providers page report 'CLI missing'.
- Run sed -i 's/\r//' on entrypoint.sh and start-dashboard.sh after COPY so
  Windows CRLF line endings don't crash the container with
  'env: bash\r: No such file or directory' on first boot.
- Update header comment to reflect that the dev image now installs both CLIs.
…tract check

## Bug (flagged in PR evolution-foundation#32 review by @davidsoncelestino)

routes/brain_repo.py:restore_start called execute_restore with keyword args
that don't match the real signature in brain_repo/restore.py. The call
passed `local_path` (not a parameter of that function) and omitted two
required args — `install_dir` and `kb_key_matches`.

A signature mismatch on Python keyword-only calls raises TypeError before
the generator emits any events. Inside the SSE `generate()` it was caught
by the generic `except Exception` branch and turned into a vague error
event, so the feature was 100% broken but the failure mode looked like a
transient restore error rather than an API mismatch.

## Fix

- Call execute_restore with the actual signature:
  (repo_url, ref, token, install_dir, include_kb, kb_key_matches).
- install_dir is resolved to WORKSPACE (project root) — that's where the
  SWAP_DIRS (memory/workspace/customizations/config-safe) are replaced.
  It is NOT the brain-repo clone path, which was the old confusion.
- kb_key_matches is a new body parameter (default False = safe). The
  upcoming frontend restore modal will expose it as a checkbox
  "I still have the original master key". When False, KB import silently
  degrades to metadata-only inside execute_restore (existing behaviour).

## Prevention

Added brain_repo/__init__.py import-time check that runs
inspect.signature on PATAuthProvider.__init__ and execute_restore, logs
a WARNING if they drift. Next time routes and brain_repo modules go out
of sync the mismatch appears in startup logs instead of silently falling
into an `except ImportError` that stored PAT tokens as plaintext (the
exact bypass the reviewer flagged separately).
…lar import

The watcher previously did `from app import app` inside its own startup
function, which fires during app.py module load — SQLAlchemy's init_app
hadn't completed yet, so the half-initialised db instance was the "wrong"
one from the watcher's perspective. Every boot logged:

    start_brain_watcher failed: The current Flask app is not registered
    with this 'SQLAlchemy' instance

and auto-sync stayed permanently off. Users had to hit Sync manually.

Fix:
- start_brain_watcher now accepts `flask_app` as a parameter. app.py
  passes the already-fully-initialised `app` instance when it calls it.
- Legacy fallback preserved: if nobody passes flask_app, it still tries
  the old import (logs error, returns None) so out-of-tree callers keep
  working.
- Bonus: watcher's auto-sync closure now mirrors workspace → brain_repo
  before commit (using _sync_workspace_to_brain_repo from routes). Before,
  auto-sync would commit an empty brain_repo_dir because nobody copied
  the workspace into it — same root cause as the manual Sync bug fixed
  earlier but silently swallowed by the daemon.
Fixes the repeating console noise "WebSocket connection to
'ws://localhost:32352/ws' failed" that some users see on pages that
don't use the terminal at all (e.g. /backups).

Root cause: useGlobalNotifications is mounted once in the app shell and
opens a WS to the terminal-server for agent-finished notifications. When
:32352 is unreachable the ws.onerror → ws.close → ws.onclose chain
triggered reconnect with exponential backoff capped at 30s — but every
failed attempt wrote an error line to the console, so users saw the
message every 1-30s forever.

Two fixes:

1. HTTP health-probe at TS_HTTP/api/health before opening the WS. If it
   fails, back off to 30s and log ONCE ("terminal-server unreachable —
   notifications disabled"). Skips the WS entirely so the browser never
   emits the native "WebSocket connection failed" error.

2. Visibility guard: when the tab is hidden, stop trying. Resume as soon
   as the user focuses the tab. Saves CPU + removes noise while multi-
   tabbing.

Notifications behaviour is unchanged when the server is up — the first
health check passes, the WS opens as before.
… SSE restore modal

Replaces the stacked "Local list + S3 list below" layout with three explicit
tabs: Local / S3 / Brain Repo. Each tab has its own count badge, empty
state, refresh button and table. The Brain Repo tab groups snapshots into
4 sections (HEAD, manual milestones, weekly, daily) — weekly/daily
collapsed by default because they can hit 30+ entries each.

Import button is now a dropdown with 2 sources:
- Upload .zip from disk (existing behaviour)
- Pull snapshot from Brain Repo (new — switches to the Brain tab)

Brain Repo snapshots get their own restore modal, distinct from the
Local/S3 merge|replace flow because semantics differ:
- Warning upfront that brain restore does NOT recover the DB or files
  outside memory/workspace/customizations/config-safe.
- "Include kb-mirror/" checkbox (off by default).
- "I still have the original master key" checkbox, only visible when
  include_kb is on. Maps to the kb_key_matches body param from the
  earlier backend fix — defaults OFF so a wrong key can't decrypt-poison
  the KB.
- SSE progress bar (~10 steps) via POST /api/brain-repo/restore/start.

All text wired through useTranslation — ~40 new keys across en-US,
pt-BR and es under backups.tabs, .s3Tab, .brainTab, .brainSnapshots,
.importMenu, .brainRestore, plus viewOnGithub action.

Lazy-fetches /api/brain-repo/snapshots only when the Brain tab is
opened AND the brain repo is configured (avoids hitting the GitHub
API needlessly on every /backups visit).
…olling

When the watcher auto-sync fails (token expired, remote gone, network
flap) backend writes the reason to BrainRepoConfig.last_error — but the
UI used to ignore it silently. Users had no signal that sync was broken
except noticing commits not appearing on GitHub.

Now the Brain Repo destination card:
- Switches border to danger red when last_error is set.
- Replaces the "connected" pill with a "sync error" pill.
- Shows the error message verbatim in a tight banner below the repo URL.
- Adds a "Reconnect" pill button that routes back through /onboarding
  (clears the stored config + walks through PAT + repo selection again).

Polling: a 30s setInterval on /api/backups/config while the user is on
/backups AND brain is connected. Gated by document.visibilityState so it
pauses in background tabs (saves CPU + GitHub rate limit). Lets the user
see last_error clear within 30s after a successful manual Sync without
requiring a full page refresh.

i18n: +2 keys (syncError, reconnect) across en-US, pt-BR, es.
NeritonDias (Contributor, Author)

@davidsoncelestino thanks for the thorough review — both blockers have been addressed. Here is the consolidated state:

B1 — PAT plaintext: resolved in earlier commits (before this review arrived)

brain_repo/pat_auth.py was added as a shim re-exporting PATAuthProvider and decrypt_token from github_oauth.py. The APIs now match:

  • connect() uses PATAuthProvider(token, master_key).encrypt_token() (the correct signature)
  • _decrypt_token() uses the module-level decrypt_token(encrypted, master_key) function (not the non-existent .decrypt() method)
  • The raw-bytes fallback only triggers when BRAIN_REPO_MASTER_KEY is missing from the environment (the bootstrap did not run); in that case it emits an explicit WARNING in the log.

I confirmed the contracts in-container:

PATAuthProvider.encrypt_token: present
decrypt_token function: present

Relevant commit: ec393f6 fix(brain-repo): correct PATAuthProvider API.

B2 — restore SSE crash: fixed in 3aa196b

The call site now matches the real signature:

restore.execute_restore(
    repo_url=repo_url,
    ref=ref,
    token=token,
    install_dir=install_dir,     # WORKSPACE root — was wrongly `local_path` before
    include_kb=include_kb,
    kb_key_matches=kb_key_matches,  # new, comes from the body (default False)
)

install_dir is resolved to the workspace root (where the SWAP_DIRS are replaced), not the clone directory — that confusion was the origin of the bug. kb_key_matches is a new body parameter, exposed in the frontend as the "I still have the original master key" checkbox (default OFF so a wrong key cannot decrypt-poison the KB).

Bonus — regression prevention

brain_repo/__init__.py now runs inspect.signature at import time against the two critical APIs (PATAuthProvider.__init__ and execute_restore). Any divergence → WARNING in the startup log listing the extra/missing params. The next drift shows up at boot, not in production via an unsafe fallback.

In-container validation:

execute_restore signature: ['include_kb', 'install_dir', 'kb_key_matches', 'ref', 'repo_url', 'token']
PATAuthProvider.encrypt_token: present
ALL API CHECKS PASS

Beyond the blockers — improvements in this round

  • fix(brain-repo): start_brain_watcher now receives flask_app as a parameter instead of the circular from app import app — the "current Flask app is not registered with this SQLAlchemy instance" log on every boot is gone, and auto-sync finally starts. (f1ed7d8)
  • feat(backups): unified Local/S3/Brain Repo tabs on the backups page, with snapshot listing (HEAD/milestones/weekly/daily collapsible), an Import dropdown with 2 sources, and a dedicated SSE restore modal for the brain repo. (4bbfa17)
  • feat(backups): last_error visible on the Brain Repo card + visibility-aware 30s polling, so watcher failures show up in near real time instead of silently. (0e8235a)
  • chore(notifications): terminal-server health check before the WS + visibility guard — eliminates the WebSocket connection failed spam that showed up on pages that do not use the terminal. (6e7537f)

Everything is on the same feat/brain-repo-onboarding branch. Logical commits, already pushed, PR updated.

DavidsonGomes (Member)

Follow-up review — PR #32

@NeritonDias I verified both fixes in the current code (0e8235a).

B2 — restore SSE kwargs: resolved ✅

routes/brain_repo.py:527-534 matches restore.py:28-35 1:1, with install_dir resolved as the workspace root. The _verify_api_compat() in brain_repo/__init__.py as an anti-drift guard is a nice addition.

B1 — PAT plaintext: happy path resolved, unsafe fallback remains 🔴

The happy path now encrypts correctly via PATAuthProvider.encrypt_token(). But the "silent plaintext fallback" pattern — exactly the vector of the original bug — remains in two places:

  • routes/brain_repo.py:391-395 (connect): if _get_master_key() returns None, it writes encrypted = token.encode("utf-8") with only a log.warning. The token goes to the database in plaintext.
  • routes/brain_repo.py:222-228 and :232-237 (_decrypt_token): same logic — missing master key or ImportError → config.github_token_encrypted.decode("utf-8"), returning the blob as if it were the cleartext token.

app.py:39-50 auto-generates the key at boot, so in practice the "no key" path is rare — but if the bootstrap fails for any reason (.env permissions, env override, container without write access), the PAT goes to the database in plaintext and the user never notices. Nobody notices a log.warning in production.

Fix: replace the fallbacks with abort(500, description="BRAIN_REPO_MASTER_KEY not configured") in connect, and return "" + log.error in _decrypt_token. If the key is a prerequisite of the feature, breaking loudly is safer than quietly writing plaintext.

🟡 Note — multi-worker watcher

f1ed7d8 fixed the circular import by passing flask_app, but not the multi-worker gunicorn scenario: app.py:640-641 calls start_brain_watcher at module import, so each worker starts its own observer, and the BrainRepoConfig.query.filter_by(sync_enabled=True).first() in the watcher picks only the first config. In the multi-user model this PR supports, only one user ends up with auto-sync active. Guard with a leader lock or iterate over all enabled configs.

Operational

  • Mergeable = CONFLICTING → needs a rebase on develop before leaving draft.

Summing up: B2 OK, B1 partial (the unsafe fallback still needs to go). Other than that, only the watcher note.

Removes the three silent plaintext fallbacks flagged in PR review — the
exact vector of the original bug.

## connect() — fail loud on write

Previously: master_key missing OR cryptography ImportError fell through
to ``encrypted = token.encode("utf-8")`` with only a log.warning. The
PAT was then written to BrainRepoConfig.github_token_encrypted as
plaintext, the UI said "connected ✓", and sync continued working. Only
way to notice was auditing the database.

Now: both failure modes return 500 with a typed response
``{"error": "...", "code": "CRYPTO_UNAVAILABLE"}`` and escalate to
log.critical. The ``code`` field lets lib/api.ts differentiate this
error from other 500s, so the UI can render a specific actionable
message instead of generic "something went wrong".

Rationale: nobody reads WARNING-level logs in prod. The app.py
bootstrap that auto-generates BRAIN_REPO_MASTER_KEY can fail silently
in several plausible cases (read-only .env mount, cryptography module
partially installed, env var set to empty string by a shell config,
gunicorn worker race during first boot). In any of those, the user
used to store a plaintext credential without knowing. Fail-closed is
the correct security default.

## _decrypt_token() — fail safe on read

Previously: master_key missing OR ImportError fell through to
``config.github_token_encrypted.decode("utf-8")`` — returning the
stored blob verbatim as if it were the plaintext token. Worked only
because the encrypt side had the mirror bug (so blob == plaintext).

Now: every failure path logs at ERROR level and returns "". Every
current caller already treats empty token as hard failure
(``if not token: abort(500, "Could not decrypt stored token")``), so
the user-visible behaviour stays correct and surfaces the issue
instead of pretending the decrypt succeeded.

Response to PR review (davidsoncelestino):
> nobody notices a log.warning in production
Exactly — everything on this path now logs CRITICAL or ERROR and
refuses the operation instead of accepting plaintext. The stored blob
is never decoded as a literal token.
start_brain_watcher used to capture master_key in the debounce closure
without checking whether it was actually available. If BRAIN_REPO_MASTER_KEY
was missing (the bootstrap failed silently), decrypt_token would crash on
every single file-change event, logging the same error every 30s forever
while the UI kept showing "auto-sync enabled ✓".

Now the watcher refuses to start when master_key is empty, with a CRITICAL
log message. The brain repo config stays connected, manual sync/tag still
work (they abort with "Could not decrypt stored token" which is visible to
the user), but the watcher doesn't spam-fail in the background.

Notable: _initialize_remote_brain_repo does NOT need the same guard — it
only needs the token argument (never encrypts/decrypts), and the upstream
connect() endpoint now fails loud BEFORE calling it when crypto is broken.
…import-time CRITICAL check

Adds visibility so the user (and ops) notice when crypto is broken instead
of only finding out after attempting a sync that fails for mysterious
reasons.

- brain_repo/__init__.py:
  - New _verify_crypto_ready() import-time check. Logs CRITICAL (not
    WARNING) when the cryptography module is missing, BRAIN_REPO_MASTER_KEY
    is unset, or the key format is invalid. Runs once at process start and
    shows up red in production log aggregators — directly addresses review
    feedback: 'nobody notices a log.warning in production'.
  - New is_crypto_ready() runtime helper exposed for HTTP endpoints.

- routes/brain_repo.py:status: returns crypto_ready field in every shape
  (connected + disconnected). UI uses it to render a danger-state card.

- routes/backups.py:backup_config: returns brain_crypto_ready so the
  /backups destination card can render the same warning inline without
  making a second round-trip to /brain-repo/status.

No UI changes yet — follow-up commit wires Backups.tsx and
settings/BrainRepo.tsx to these fields.
When the backend reports brain_crypto_ready=false / crypto_ready=false the
UI now treats it as the most urgent state:

- Backups destination card: danger border, "encryption broken" red pill
  (takes priority over "sync error"), inline explanation, Reconnect link.
- Settings /brain-repo: prominent banner above the status card explaining
  the exact failure mode and that ALL sync/tag operations will fail until
  an admin restores BRAIN_REPO_MASTER_KEY.

Why a separate state from last_error: cryptoBroken means stored tokens
can't be decrypted at all, so every sync will fail the same way. last_error
is transient (network hiccup, force-push conflict, etc). Users need to
distinguish "transient — retry" from "server-side, contact admin".

i18n: +6 keys across en-US / pt-BR / es (backups.destinations.cryptoBroken,
cryptoBrokenDesc; brainRepoSettings.cryptoBroken.title, .desc).

Closes the last piece of PR review feedback on silent plaintext fallbacks:
 - fallbacks removed (c43aa2f, 7049d1e)
 - CRITICAL logs instead of WARNING (3a12021)
 - User-visible warning so the failure is impossible to miss (this commit)
NeritonDias (Contributor, Author) commented Apr 24, 2026

@DavidsonGomes I agree with your reading — the silent fallback was exactly the original vector and had no reason to stay. Addressed in 4 commits (c43aa2f..f01619c). Summary:

1. connect() — fail loud on write (c43aa2f)

Removed both paths that wrote token.encode("utf-8") (missing master_key + PATAuthProvider ImportError). Both now return 500 with a typed response:

{ "error": "Token encryption unavailable: ...", "code": "CRYPTO_UNAVAILABLE" }

log.critical instead of log.warning — shows up red in any decent log aggregator. The code field lets lib/api.ts differentiate this error from other 500s.

2. _decrypt_token() — fail safe on read (c43aa2f)

Removed both .decode("utf-8") calls on the stored blob. Every failure branch now:

  • Logs ERROR with the specific reason (master_key missing / cryptography missing / Fernet decrypt failed)
  • Returns ""

Callers already handle it with if not token: abort(500, "Could not decrypt stored token"), so the user-facing behaviour stays correct and explicit instead of a false-positive decrypt.

3. start_brain_watcher guard (7049d1e)

The watcher now refuses to start if get_master_key() returns empty — before, it spammed a decrypt error every 30s inside the closure. log.critical + early return.

4. Import-time check + crypto_ready exposure (3a12021)

brain_repo/__init__.py runs _verify_crypto_ready() at import time, emitting CRITICAL (not WARNING) in 3 scenarios: cryptography missing, BRAIN_REPO_MASTER_KEY unset, key in an invalid format.

GET /api/brain-repo/status and GET /api/backups/config now return crypto_ready: bool / brain_crypto_ready: bool. The UI uses this to render a danger state instead of "connected ✓".

5. UI danger state (f01619c)

  • /backups destination card: red border, "encryption broken" pill, explanatory banner, Reconnect button when crypto_ready=false.
  • /settings/brain-repo: banner at the top of the page with an explanation + "do NOT retry until this is resolved".

In-container validation

I ran 4 smoke tests simulating the problem scenarios:

TEST 1: master_key present → crypto_ready=True                                ✓
TEST 2: master_key empty → crypto_ready=False + CRITICAL log                  ✓
TEST 3: invalid master_key format → crypto_ready=False + CRITICAL log         ✓
TEST 4: _decrypt_token with missing key → returns '' (not .decode blob) + ERROR log ✓

The stored blob is never returned as if it were the cleartext token. The write path never writes plaintext. The read path never decodes as plaintext. Explicit failure on every path.

About the master_key bootstrap race condition (2 Gunicorn workers generating different keys in parallel) — that is an architectural scenario I would rather address in a separate ticket (moving generation to entrypoint.sh before the fork). With the current fail-loud behaviour, the worst case is a 500 instead of silent plaintext, which is a strong relative improvement.

…uards against undefined ref

Clicking the Brain Repo tab turned the page black with
  Cannot read properties of undefined (reading 'replace')
because list_snapshots returned HEAD as {"sha": "..."} with no `ref`
field, while daily/weekly/milestones items had both ref+sha. The
SnapshotRow renderer called s.ref.replace(/^refs\/tags\//, '') on every
item uniformly — HEAD crashed the whole component tree.

Two fixes, defence-in-depth:

1. Backend (brain_repo/github_api.py:list_snapshots): every snapshot
   item now returns the same shape {ref, sha, label}. HEAD uses
   ref="HEAD" / label="HEAD" (not an empty dict), and the caller sees
   `head: null` when no HEAD exists (renamed from {} for clarity).
   `label` is pre-computed by stripping `refs/tags/` so the frontend
   has one string ready to render without parsing git refs.

2. Frontend (Backups.tsx): all .replace calls on snapshot.ref use
   `(snapshot.ref ?? '')` so a future regression of the backend shape
   can't re-black the page. The display string falls back to
   "(unnamed)" when label + ref are both missing.

Validated end-to-end: /api/brain-repo/snapshots now returns
  HEAD: {"ref": "HEAD", "sha": "...", "label": "HEAD"}
  Milestones[0]: {"ref": "refs/tags/milestone/...", "sha": "...", "label": "milestone/..."}
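
A minimal sketch of that uniform shape, assuming the backend builds each snapshot item from a git ref; the helper is illustrative.

    def to_snapshot(ref: str | None, sha: str) -> dict:
        """Every item carries the same keys so the frontend never branches on shape."""
        if not ref or ref == "HEAD":
            return {"ref": "HEAD", "sha": sha, "label": "HEAD"}
        return {"ref": ref, "sha": sha, "label": ref.removeprefix("refs/tags/")}

    # to_snapshot("refs/tags/milestone/teste", "<sha>")
    #   -> {"ref": "refs/tags/milestone/teste", "sha": "<sha>", "label": "milestone/teste"}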
…er update

These changes were made locally in earlier turns but never committed —
they were applied to the working tree after a stash pop during the
upstream merge and slipped past subsequent `git add <specific-files>`
commits.

The callers in routes/brain_repo.py (already committed in c43aa2f and
7049d1e) DO use the new API:

    pushed, push_err = git_ops.push(repo_dir, token, with_tags=True)

But git_ops.push on the branch still had the old signature:

    def push(repo_dir, token) -> bool

So every sync/force and tag/milestone call in the pushed code would
crash with TypeError: push() got an unexpected keyword argument
'with_tags'. Production on the branch was silently broken until now —
smoke tests passed only because I was docker-cp'ing working-tree files
into the container.

Changes now in-tree:

- push(repo_dir, token, with_tags=True) -> tuple[bool, str]. Returns
  (ok, error_message). Uses --follow-tags when with_tags=True so
  annotated tags created by create_tag actually reach GitHub. Masks
  token in stderr output before logging.
- create_tag(repo_dir, tag, msg, force=False). When force=True, uses
  git tag -a -f so re-tagging an existing milestone moves the pointer
  instead of erroring.
- sync_worker._execute_job uses the new tuple return + threads the
  real error into the pending-job JSON.

All callers verified: sync_force, tag_milestone, _initialize_remote_brain_repo
in routes/brain_repo.py already destructure the tuple correctly.
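
A minimal sketch of that tuple-returning push, assuming the remote already carries the tokenized URL; the subprocess handling is illustrative, not the PR's git_ops.py.

    import subprocess

    def push(repo_dir: str, token: str, with_tags: bool = True) -> tuple[bool, str]:
        """Return (ok, error_message); --follow-tags carries annotated tags, token is masked in errors."""
        cmd = ["git", "-C", repo_dir, "push"]
        if with_tags:
            cmd.append("--follow-tags")
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True, ""
        return False, result.stderr.replace(token, "***")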
… steps

Adds the EvoNexus logo above the step indicator on every inner wizard
screen so the brand presence is consistent from Welcome through
Step 3 (not just on the first screen).

New component OnboardingHeader renders:
  [logo EVO_NEXUS.webp h-7]
  [Step N of 3 • ● ● ○]

Used on StepProvider (step1of3), StepBrainRepo (step2of3),
StepBrainConnect (step2aOf3), StepBrainChoose (step2bOf3), StepConfirm
(step3of3). Each step file lost its duplicated step-indicator block —
fewer lines overall.

Welcome.tsx keeps its bespoke header with the larger logo and welcome
title — it's the landing screen, different layout by design.

These file changes were applied to the working tree during an earlier
onboarding polish but never committed — they slipped through the
subsequent `git add <specific-files>` batches.
The auto-sync closure (`_sync_fn` in start_brain_watcher) was calling
git_ops.push() but discarding the (ok, error_message) tuple. Symptoms:

1. Push failures never reached BrainRepoConfig.last_error, so the UI
   card kept showing "connected ✓" while auto-sync quietly failed every
   30 seconds in the background. The user only noticed when commits
   stopped appearing on GitHub.

2. Push successes didn't update last_sync, so the "Last sync: 2m ago"
   stayed frozen at the manual-sync timestamp even while the watcher
   was happily pushing every file change.

Fix: capture the tuple, log ERROR on failure, and persist the result
inside `flask_app.app_context()` (same Flask instance the start function
received as a parameter, so the existing circular-import fix from
f1ed7d8 keeps working). The 30s UI polling from 0e8235a now actually
reflects watcher state.

Found during post-merge audit of all `git_ops.push` call sites — every
other caller was destructuring the tuple already; this one slipped.
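
A minimal sketch of that persistence step, assuming the watcher closure already holds flask_app, db and the BrainRepoConfig row; the helper is illustrative.

    import logging
    from datetime import datetime, timezone

    log = logging.getLogger(__name__)

    def record_push_result(flask_app, db, config, ok: bool, push_err: str) -> None:
        """Write the (ok, error) tuple to the config row so the 30s /backups polling reflects watcher state."""
        with flask_app.app_context():
            if ok:
                config.last_sync = datetime.now(timezone.utc)
                config.last_error = None
            else:
                log.error("brain-repo auto-sync push failed: %s", push_err)
                config.last_error = push_err
            db.session.add(config)
            db.session.commit()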
NeritonDias marked this pull request as ready for review April 24, 2026 04:27

sourcery-ai (Bot) left a comment

Sorry @NeritonDias, your pull request is larger than the review limit of 150000 diff characters

NeritonDias deleted the feat/brain-repo-onboarding branch April 24, 2026 06:09
NeritonDias restored the feat/brain-repo-onboarding branch April 24, 2026 06:51
DavidsonGomes pushed a commit that referenced this pull request Apr 24, 2026
…ore flow (#43)

* feat(brain-repo): foundation — BrainRepoConfig model, user onboarding columns, BRAIN_REPO_MASTER_KEY, watchdog dep

- Add User.onboarding_state + User.onboarding_completed_agents_visit columns
- Add BrainRepoConfig model (user_id FK, encrypted token, repo metadata, sync state)
- Add needs_onboarding() helper in models.py
- SQLite migration block in app.py (ALTER TABLE pattern)
- BRAIN_REPO_MASTER_KEY auto-generated in .env on first run
- Register onboarding + brain_repo blueprints (try/except — files created in later commits)
- Add /api/auth/needs-onboarding to PUBLIC_PATHS
- Add watchdog>=4.0 to pyproject.toml
- Create dashboard/backend/brain_repo/__init__.py
- Create .handoffs/ with phase-0-handoff, domain-auth, domain-watcher

* feat(onboarding): backend routes — needs-onboarding, onboarding state endpoints

- Updated auth_routes.py import to include BrainRepoConfig and needs_onboarding
- Added GET /api/auth/needs-onboarding (public, handles unauthenticated + pre-setup)
- Added POST /api/auth/mark-agents-visited (login_required)
- Created routes/onboarding.py with state, start, complete, skip, provider endpoints

* feat(brain-repo): brain_repo route — status, connect, detect, snapshots, restore SSE, sync

- GET /api/brain-repo/status — returns config dict or {connected: false}
- POST /api/brain-repo/connect — validate PAT, create/validate repo, encrypt+store token
- POST /api/brain-repo/disconnect — clears token and disables sync
- GET /api/brain-repo/detect — find candidate brain repos for a PAT
- GET /api/brain-repo/snapshots — list daily/weekly/milestone/head refs
- POST /api/brain-repo/restore/start — SSE stream via stream_with_context
- POST /api/brain-repo/sync/force — commit+push+milestone tag
- POST /api/brain-repo/tag/milestone — create named milestone tag
All brain_repo sub-module calls use try/except ImportError for graceful fallback
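
The graceful-fallback pattern mentioned on the last line is the usual guarded import. A rough sketch (the endpoint body and error text are placeholders, not the committed code):

    from flask import Blueprint, jsonify

    brain_repo_bp = Blueprint("brain_repo", __name__, url_prefix="/api/brain-repo")

    @brain_repo_bp.get("/status")
    def status():
        # Sub-modules land in later commits, so every call site guards the import.
        try:
            from brain_repo import github_api
        except ImportError:
            return jsonify({"connected": False, "error": "brain_repo modules unavailable"})
        return jsonify({"connected": True, "module": github_api.__name__})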

* feat(brain-repo): github oauth, api, git_ops, watcher, sync_worker, kb_mirror, transcripts, restore, secrets_scanner

* feat(onboarding): frontend wizard — first-time flow, restore flow, Oracle banner, brain-repo settings

* chore(setup.py): brain-repo prompts in CLI + backup github target + /salve skill + .gitignore

- setup.py: add ask_password, ask_choice, _ensure_brain_master_key,
  _save_brain_repo_pat helpers; add Brain Repo wizard block in main()
  after choose_provider() (local mode); add brain_repo_* keys to all
  3 language bundles (en-US, pt-BR, es) in MESSAGES dict
- backup.py: add backup_to_github() function and 'github' --target
  option; dispatches to brain_repo.git_ops for commit+push+tag
- .claude/skills/salve/SKILL.md: new skill that flushes session to
  memory files and relies on Brain Repo file watcher for auto-sync
- .gitignore: add memory/raw-transcripts/
- .handoffs/phase-4-cli-handoff.md: phase summary

* i18n: 14 new brain-repo keys — en-US / pt-BR / es

Add brainRepo namespace with 14 keys to all three locale files:
brain_repo_enable_prompt, brain_repo_auth_method, brain_repo_defer_to_web,
brain_repo_pat_instructions, brain_repo_pat_prompt, brain_repo_pat_saved,
brain_repo_pat_skipped, brain_repo_configure_later, brain_repo_connect,
brain_repo_disconnect, brain_repo_sync_now, brain_repo_create_milestone,
brain_repo_status_connected, brain_repo_status_disconnected.

Key parity verified: all diffs between bundles are empty sets.
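
That parity check boils down to comparing flattened key sets, e.g. something like the snippet below (the locale file paths are an assumption, not the actual layout):

    import json
    from pathlib import Path

    def flatten(tree: dict, prefix: str = "") -> set[str]:
        """Flatten nested i18n objects into dotted key paths."""
        keys: set[str] = set()
        for k, v in tree.items():
            path = f"{prefix}{k}"
            keys |= flatten(v, path + ".") if isinstance(v, dict) else {path}
        return keys

    bundles = {
        lang: json.loads(Path(f"locales/{lang}.json").read_text(encoding="utf-8"))
        for lang in ("en-US", "pt-BR", "es")
    }
    reference = flatten(bundles["en-US"])
    for lang, tree in bundles.items():
        assert flatten(tree) == reference, f"key drift in {lang}"
    print("key parity OK: all diffs are empty sets")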

* chore: final audit handoff — all checks pass, 7 commits, build clean

* chore: add docker-compose.dev.yml + DEV-SETUP.md for local Windows testing

* fix(dev): Dockerfile.dev with --legacy-peer-deps to fix npm install on Windows Docker

* fix(brain-repo): correct PATAuthProvider API + add /validate-token + bootstrap remote repo

- routes/brain_repo.py: replace broken PATAuthProvider(key).encrypt(token) calls
  with PATAuthProvider(token, key).encrypt_token(); decrypt now uses module-level
  decrypt_token() helper. Handle missing BRAIN_REPO_MASTER_KEY gracefully.
- Add new POST /api/brain-repo/validate-token endpoint that validates a PAT via
  GitHub /user without persisting anything (used by onboarding to check token
  before showing the repo-choice step).
- Convert blueprint abort() calls to JSON via errorhandler so the SPA can show
  the actual backend message instead of a generic '400 BAD REQUEST'.
- Add friendlier error in connect() when the GitHub repo already exists (422 →
  bilingual message guiding the user to 'Use existing' or pick another name).
- New helper _initialize_remote_brain_repo: after create_private_repo creates an
  empty repo on GitHub, clone-init it locally, drop in the brain-repo skeleton
  via manifest.initialize_brain_repo (.evo-brain marker, manifest.yaml, dirs,
  README, .gitignore), commit and push to origin/main. Without this the repo
  stays empty and detect_brain_repos (code search by .evo-brain) cannot find
  it later — UI rejected such repos as 'incompatible'.
- brain_repo/github_api.py: add get_repo_info(token, repo_url) that parses a
  GitHub URL, fetches the repo, and returns (is_private, info_dict). Replaces
  the broken validate_repo_is_private(token, repo_url) call site that expected
  a tuple but the underlying function returned bool and took 3 args.
- brain_repo/pat_auth.py: re-export shim so existing routes/brain_repo.py
  imports of brain_repo.pat_auth.PATAuthProvider keep working (the actual
  implementation lives in github_oauth.py).

* fix(onboarding): write canonical providers.json schema + boot-time migration

- routes/onboarding.py: rewrite set_provider so it never writes the legacy
  {<id>: {api_key, enabled}} shape that broke routes/providers.py. Now reuses
  _read_config / _write_config / ALLOWED_ENV_VARS from routes.providers and
  produces the canonical {active_provider, providers: {<id>: {cli_command,
  env_vars, ...}}} schema. Anthropic just flips active_provider (no key);
  OpenAI/OpenRouter merge filtered env_vars; codex / codex_auth is a no-op
  that defers to the existing /api/providers/openai OAuth endpoints.
- Env vars outside ALLOWED_ENV_VARS or containing shell metacharacters are
  silently dropped — same policy as routes/providers.py._sanitize_env_vars.
- app.py: add boot-time providers.json schema-normalization migration.
  If the file exists but is missing active_provider or providers keys, the
  app overwrites it from providers.example.json. Recovers from the broken
  state left by older versions of the onboarding wizard without requiring
  the user to manually reset volumes (rough sketch after this list).
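
A rough sketch of that boot-time normalization, assuming the canonical keys named above; the providers.example.json fallback comes from this commit message, while the function name and paths are illustrative:

    import json
    import shutil
    from pathlib import Path

    def normalize_providers_config(config_path: Path, example_path: Path) -> None:
        """Reset providers.json to the canonical schema when required keys are missing."""
        if not config_path.exists():
            return
        try:
            data = json.loads(config_path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            data = {}
        # Legacy onboarding wrote {<id>: {api_key, enabled}} with no top-level schema keys.
        if "active_provider" in data and "providers" in data:
            return
        shutil.copyfile(example_path, config_path)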

* fix(api): expose backend JSON error.description in lib/api.ts

The api helper used to throw 'new Error("400 BAD REQUEST")' for any non-OK
response, dropping the actual server message — so the UI showed a useless
generic status instead of the specific reason ('repo already exists',
'token required', etc).

New buildError() reads the response body, prefers JSON {error|description|
message} fields, falls back to short text, and prefixes with the status so
existing callers that pattern-match on '401' / '403' keep working.

* fix(onboarding): refreshUser() before navigate to prevent /onboarding redirect loop

Two bugs were producing the same symptom — wizard kept reloading instead of
landing on /agents after completion:

1. App.tsx guard ignored onboarding_state === null. New users have NULL by
   default in the column, so they bypassed the guard and reached /agents
   without ever seeing the wizard. Guard now treats null as 'needs onboarding'
   (only completed/skipped count as done).

2. OnboardingRouter handleComplete/handleSkip called POST /onboarding/complete
   then navigated to /agents — but the React user state still had
   onboarding_state='pending' cached from /auth/me. The guard would re-read
   that cached value and bounce back to /onboarding, where the useEffect saw
   the fresh 'completed' state from the backend and tried to navigate again.
   Infinite loop. Fix: await refreshUser() (from AuthContext) before navigate
   so the guard sees the current state.

* feat(onboarding): per-provider sub-flows + use /validate-token instead of /connect

StepProvider was a single generic API-key form for all 4 providers — but
Anthropic, OpenAI, OpenRouter and Codex each need a different flow:

- Anthropic: no API key, user logs in by running 'claude' in the terminal at
  the Agents page. Sub-step shows that instruction + flips active_provider.
- OpenAI: API key input → POST /providers/openai/models validates and lists
  available models → dropdown to pick model → save (env_vars merge).
- OpenRouter: same shape as OpenAI but with read-only Base URL chip
  (https://openrouter.ai/api/v1) and free-text model field.
- Codex: two tabs (Browser OAuth, Device Auth) replicating the Providers.tsx
  modal — calls /providers/openai/auth-start|auth-complete or device-start|
  device-poll. Tab/button colors aligned to the onboarding palette (#00FFA7).

StepBrainConnect.tsx and RestoreSelectRepo.tsx now call POST
/api/brain-repo/validate-token (validates without persisting) instead of
/connect (which requires repo_url|create_repo and was 400-ing on token-only
calls). RestoreSelectRepo also passes the token to /detect as a query param
so the backend doesn't need a stored config to enumerate brain repos.

i18n keys for these screens added in the same pass since the components
were rewritten.

* i18n(onboarding): translate 12 wizard screens + Oracle banner — ~150 keys

The whole onboarding wizard (7 first-time screens + 5 restore screens) and
the OracleWelcomeBanner were hardcoded English. Now wired through useTranslation
with new namespaces:

- onboarding.welcome / .stepIndicator / .provider / .brainRepo / .connect /
  .choose / .confirm / .providerLabels / .providerAnthropic / .providerOpenAI /
  .providerOpenRouter / .providerCodex / .backToProviders
- restore.selectRepo / .selectSnapshot / .confirm / .execute
- agents.welcomeBanner

All 3 bundles (en-US, pt-BR, es) have the identical key tree with natural
translations — no English strings reused as fallbacks in pt-BR / es.

* fix(dev): Dockerfile.dev installs claude-code/openclaude + fix CRLF in scripts

- Install @anthropic-ai/claude-code and @gitlawb/openclaude@latest as global
  npm packages (matches Dockerfile.swarm.dashboard). Without these, the
  embedded terminal at /agents and the /providers page report 'CLI missing'.
- Run sed -i 's/\r//' on entrypoint.sh and start-dashboard.sh after COPY so
  Windows CRLF line endings don't crash the container with
  'env: bash\r: No such file or directory' on first boot.
- Update header comment to reflect that the dev image now installs both CLIs.

* fix(brain-repo): correct execute_restore kwargs + import-time API contract check

## Bug (flagged in PR #32 review by @davidsoncelestino)

routes/brain_repo.py:restore_start called execute_restore with keyword args
that don't match the real signature in brain_repo/restore.py. The call
passed `local_path` (not a parameter of that function) and omitted two
required args — `install_dir` and `kb_key_matches`.

A signature mismatch on Python keyword-only calls raises TypeError before
the generator emits any events. Inside the SSE `generate()` it was caught
by the generic `except Exception` branch and turned into a vague error
event, so the feature was 100% broken but the failure mode looked like a
transient restore error rather than an API mismatch.

## Fix

- Call execute_restore with the actual signature:
  (repo_url, ref, token, install_dir, include_kb, kb_key_matches).
- install_dir is resolved to WORKSPACE (project root) — that's where the
  SWAP_DIRS (memory/workspace/customizations/config-safe) are replaced.
  It is NOT the brain-repo clone path, which was the old confusion.
- kb_key_matches is a new body parameter (default False = safe). The
  upcoming frontend restore modal will expose it as a checkbox
  "I still have the original master key". When False, KB import silently
  degrades to metadata-only inside execute_restore (existing behaviour).

## Prevention

Added brain_repo/__init__.py import-time check that runs
inspect.signature on PATAuthProvider.__init__ and execute_restore, logs
a WARNING if they drift. Next time routes and brain_repo modules go out
of sync the mismatch appears in startup logs instead of silently falling
into an `except ImportError` that stored PAT tokens as plaintext (the
exact bypass the reviewer flagged separately).
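
A sketch of that import-time contract check, assuming the keyword names listed above. The real check lives in brain_repo/__init__.py; the helper name and log text here are illustrative:

    import inspect
    import logging

    log = logging.getLogger(__name__)

    def _check_signature(func, expected_params: list, label: str) -> None:
        """Warn at startup when routes and brain_repo modules drift apart."""
        actual = list(inspect.signature(func).parameters)
        if actual != expected_params:
            log.warning("API drift in %s: expected %s, found %s", label, expected_params, actual)

    # Example wiring (argument names taken from this commit message):
    # from brain_repo.restore import execute_restore
    # _check_signature(
    #     execute_restore,
    #     ["repo_url", "ref", "token", "install_dir", "include_kb", "kb_key_matches"],
    #     "execute_restore",
    # )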

* fix(brain-repo): start_brain_watcher accepts flask_app, no more circular import

The watcher previously did `from app import app` inside its own startup
function, which fires during app.py module load — SQLAlchemy's init_app
hadn't completed yet, so the half-initialised db instance was the "wrong"
one from the watcher's perspective. Every boot logged:

    start_brain_watcher failed: The current Flask app is not registered
    with this 'SQLAlchemy' instance

and auto-sync stayed permanently off. Users had to hit Sync manually.

Fix:
- start_brain_watcher now accepts `flask_app` as a parameter. app.py
  passes the already-fully-initialised `app` instance when it calls it
  (rough sketch after this list).
- Legacy fallback preserved: if nobody passes flask_app, it still tries
  the old import (logs error, returns None) so out-of-tree callers keep
  working.
- Bonus: watcher's auto-sync closure now mirrors workspace → brain_repo
  before commit (using _sync_workspace_to_brain_repo from routes). Before,
  auto-sync would commit an empty brain_repo_dir because nobody copied
  the workspace into it — same root cause as the manual Sync bug fixed
  earlier but silently swallowed by the daemon.
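
Rough sketch of that startup shape, under the assumptions stated in this commit (observer setup elided; the placeholder return is not the real value):

    import logging

    log = logging.getLogger(__name__)

    def start_brain_watcher(flask_app=None):
        """Prefer the fully initialised app passed by app.py; keep the old import as a fallback."""
        if flask_app is None:
            try:
                from app import app as flask_app  # legacy path for out-of-tree callers
            except Exception as exc:
                log.error("start_brain_watcher: no usable Flask app (%s)", exc)
                return None
        # The watchdog observer and the debounced sync closure are set up here;
        # DB reads and writes run inside flask_app.app_context() so the right
        # SQLAlchemy instance is used.
        with flask_app.app_context():
            pass
        return flask_app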

* chore(notifications): health-check before WS + visibility guard

Fixes the repeating console noise "WebSocket connection to
'ws://localhost:32352/ws' failed" that some users see on pages that
don't use the terminal at all (e.g. /backups).

Root cause: useGlobalNotifications is mounted once in the app shell and
opens a WS to the terminal-server for agent-finished notifications. When
:32352 is unreachable the ws.onerror → ws.close → ws.onclose chain
triggered reconnect with exponential backoff capped at 30s — but every
failed attempt wrote an error line to the console, so users saw the
message every 1-30s forever.

Two fixes:

1. HTTP health-probe at TS_HTTP/api/health before opening the WS. If it
   fails, back off to 30s and log ONCE ("terminal-server unreachable —
   notifications disabled"). Skips the WS entirely so the browser never
   emits the native "WebSocket connection failed" error.

2. Visibility guard: when the tab is hidden, stop trying. Resume as soon
   as the user focuses the tab. Saves CPU + removes noise while multi-
   tabbing.

Notifications behaviour is unchanged when the server is up — the first
health check passes, the WS opens as before.

* feat(backups): unified tabs (local/s3/brain-repo) + import dropdown + SSE restore modal

Replaces the stacked "Local list + S3 list below" layout with three explicit
tabs: Local / S3 / Brain Repo. Each tab has its own count badge, empty
state, refresh button and table. The Brain Repo tab groups snapshots into
4 sections (HEAD, manual milestones, weekly, daily) — weekly/daily
collapsed by default because they can hit 30+ entries each.

Import button is now a dropdown with 2 sources:
- Upload .zip from disk (existing behaviour)
- Pull snapshot from Brain Repo (new — switches to the Brain tab)

Brain Repo snapshots get their own restore modal, distinct from the
Local/S3 merge|replace flow because semantics differ:
- Warning upfront that brain restore does NOT recover the DB or files
  outside memory/workspace/customizations/config-safe.
- "Include kb-mirror/" checkbox (off by default).
- "I still have the original master key" checkbox, only visible when
  include_kb is on. Maps to the kb_key_matches body param from the
  earlier backend fix — defaults OFF so a wrong key can't decrypt-poison
  the KB.
- SSE progress bar (~10 steps) via POST /api/brain-repo/restore/start.

All text wired through useTranslation — ~40 new keys across en-US,
pt-BR and es under backups.tabs, .s3Tab, .brainTab, .brainSnapshots,
.importMenu, .brainRestore, plus viewOnGithub action.

Lazy-fetches /api/brain-repo/snapshots only when the Brain tab is
opened AND the brain repo is configured (avoids hitting the GitHub
API needlessly on every /backups visit).

* feat(backups): surface brain-repo last_error + 30s visibility-aware polling

When the watcher auto-sync fails (token expired, remote gone, network
flap) backend writes the reason to BrainRepoConfig.last_error — but the
UI used to ignore it silently. Users had no signal that sync was broken
except noticing commits not appearing on GitHub.

Now the Brain Repo destination card:
- Switches border to danger red when last_error is set.
- Replaces the "connected" pill with a "sync error" pill.
- Shows the error message verbatim in a tight banner below the repo URL.
- Adds a "Reconnect" pill button that routes back through /onboarding
  (clears the stored config + walks through PAT + repo selection again).

Polling: a 30s setInterval on /api/backups/config while the user is on
/backups AND brain is connected. Gated by document.visibilityState so it
pauses in background tabs (saves CPU + GitHub rate limit). Lets the user
see last_error clear within 30s after a successful manual Sync without
requiring a full page refresh.

i18n: +2 keys (syncError, reconnect) across en-US, pt-BR, es.

* fix(brain-repo): fail loud on missing crypto — no plaintext PAT fallback

Removes the three silent plaintext fallbacks flagged in PR review — the
exact vector of the original bug.

## connect() — fail loud on write

Previously: master_key missing OR cryptography ImportError fell through
to ``encrypted = token.encode("utf-8")`` with only a log.warning. The
PAT was then written to BrainRepoConfig.github_token_encrypted as
plaintext, the UI said "connected ✓", and sync continued working. Only
way to notice was auditing the database.

Now: both failure modes return 500 with a typed response
``{"error": "...", "code": "CRYPTO_UNAVAILABLE"}`` and escalate to
log.critical. The ``code`` field lets lib/api.ts differentiate this
error from other 500s, so the UI can render a specific actionable
message instead of generic "something went wrong".

Rationale: nobody reads WARNING-level logs in prod. The app.py
bootstrap that auto-generates BRAIN_REPO_MASTER_KEY can fail silently
in several plausible cases (read-only .env mount, cryptography module
partially installed, env var set to empty string by a shell config,
gunicorn worker race during first boot). In any of those, the user
used to store a plaintext credential without knowing. Fail-closed is
the correct security default.
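
In route terms, the fail-closed write path looks roughly like the helper below. This is a sketch only: the real code goes through PATAuthProvider, and the helper name, response wording and direct Fernet call here are assumptions:

    import logging
    import os

    from flask import jsonify

    log = logging.getLogger(__name__)

    def _encrypt_or_fail(token: str):
        """Return (encrypted_bytes, error_response); never fall back to plaintext."""
        master_key = os.environ.get("BRAIN_REPO_MASTER_KEY", "")
        try:
            from cryptography.fernet import Fernet
        except ImportError:
            log.critical("cryptography is not importable; refusing to store PAT")
            return None, (jsonify(error="Encryption unavailable", code="CRYPTO_UNAVAILABLE"), 500)
        if not master_key:
            log.critical("BRAIN_REPO_MASTER_KEY is missing; refusing to store PAT")
            return None, (jsonify(error="Encryption unavailable", code="CRYPTO_UNAVAILABLE"), 500)
        return Fernet(master_key.encode()).encrypt(token.encode("utf-8")), None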

## _decrypt_token() — fail safe on read

Previously: master_key missing OR ImportError fell through to
``config.github_token_encrypted.decode("utf-8")`` — returning the
stored blob verbatim as if it were the plaintext token. Worked only
because the encrypt side had the mirror bug (so blob == plaintext).

Now: every failure path logs at ERROR level and returns "". Every
current caller already treats empty token as hard failure
(``if not token: abort(500, "Could not decrypt stored token")``), so
the user-visible behaviour stays correct and surfaces the issue
instead of pretending the decrypt succeeded.
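
The read side mirrors it: every branch logs ERROR and returns "" rather than decoding the blob. A sketch only; the real helper takes the config row and pulls the master key from the app environment:

    import logging

    log = logging.getLogger(__name__)

    def _decrypt_token(encrypted_blob: bytes, master_key: str) -> str:
        """Fail safe: any problem yields "" so callers abort instead of using a bogus token."""
        if not master_key:
            log.error("BRAIN_REPO_MASTER_KEY missing; cannot decrypt stored token")
            return ""
        try:
            from cryptography.fernet import Fernet, InvalidToken
        except ImportError:
            log.error("cryptography not importable; cannot decrypt stored token")
            return ""
        try:
            return Fernet(master_key.encode()).decrypt(encrypted_blob).decode("utf-8")
        except (InvalidToken, ValueError) as exc:
            log.error("stored token failed to decrypt: %s", exc)
            return ""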

Response to PR review (davidsoncelestino):
> log.warning is not noticed by anyone in production
Exactly — everything on this path now logs CRITICAL or ERROR and
refuses the operation instead of accepting plaintext. The stored blob
is never decoded as a literal token.

* fix(brain-repo): guard master_key in watcher startup — fail loud

start_brain_watcher used to capture master_key in the debounce closure
without checking whether it was actually available. If BRAIN_REPO_MASTER_KEY
was missing (the bootstrap failed silently), decrypt_token would crash on
every single file-change event, logging the same error every 30s forever
while the UI kept showing "auto-sync enabled ✓".

Now the watcher refuses to start when master_key is empty, with a CRITICAL
log message. The brain repo config stays connected, manual sync/tag still
work (they abort with "Could not decrypt stored token" which is visible to
the user), but the watcher doesn't spam-fail in the background.

Notable: _initialize_remote_brain_repo does NOT need the same guard — it
only needs the token argument (never encrypts/decrypts), and the upstream
connect() endpoint now fails loud BEFORE calling it when crypto is broken.
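
The guard itself is tiny, roughly as below; the wrapper name here is hypothetical, in-tree it is an early return inside start_brain_watcher:

    import logging
    import os

    log = logging.getLogger(__name__)

    def _watcher_can_start() -> bool:
        """Refuse to start the watcher when the master key is unavailable."""
        if not os.environ.get("BRAIN_REPO_MASTER_KEY"):
            log.critical("BRAIN_REPO_MASTER_KEY missing; brain watcher will not start")
            return False
        return True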

* feat(brain-repo): expose crypto_ready in /status + /backups/config + import-time CRITICAL check

Adds visibility so the user (and ops) notice when crypto is broken instead
of only finding out after attempting a sync that fails for mysterious
reasons.

- brain_repo/__init__.py:
  - New _verify_crypto_ready() import-time check. Logs CRITICAL (not
    WARNING) when the cryptography module is missing, BRAIN_REPO_MASTER_KEY
    is unset, or the key format is invalid. Runs once at process start and
    shows up red in production log aggregators — directly addresses review
    feedback: 'log.warning is not noticed by anyone in production'.
  - New is_crypto_ready() runtime helper exposed for HTTP endpoints.

- routes/brain_repo.py:status: returns crypto_ready field in every shape
  (connected + disconnected). UI uses it to render a danger-state card.

- routes/backups.py:backup_config: returns brain_crypto_ready so the
  /backups destination card can render the same warning inline without
  making a second round-trip to /brain-repo/status.

No UI changes yet — follow-up commit wires Backups.tsx and
settings/BrainRepo.tsx to these fields.
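
A minimal sketch of what is_crypto_ready() amounts to, covering the three broken states named above; the function name comes from this commit, the body is illustrative:

    import os

    def is_crypto_ready() -> bool:
        """True only when stored tokens can actually be encrypted and decrypted."""
        key = os.environ.get("BRAIN_REPO_MASTER_KEY", "")
        if not key:
            return False
        try:
            from cryptography.fernet import Fernet
            Fernet(key.encode())  # raises ValueError on a malformed key
        except (ImportError, ValueError):
            return False
        return True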

* feat(backups,settings): surface crypto_ready — danger state + Reconnect

When the backend reports brain_crypto_ready=false / crypto_ready=false the
UI now treats it as the most urgent state:

- Backups destination card: danger border, "encryption broken" red pill
  (takes priority over "sync error"), inline explanation, Reconnect link.
- Settings /brain-repo: prominent banner above the status card explaining
  the exact failure mode and that ALL sync/tag operations will fail until
  an admin restores BRAIN_REPO_MASTER_KEY.

Why a separate state from last_error: cryptoBroken means stored tokens
can't be decrypted at all, so every sync will fail the same way. last_error
is transient (network hiccup, force-push conflict, etc). Users need to
distinguish "transient — retry" from "server-side, contact admin".

i18n: +6 keys across en-US / pt-BR / es (backups.destinations.cryptoBroken,
cryptoBrokenDesc; brainRepoSettings.cryptoBroken.title, .desc).

Closes the last piece of PR review feedback on silent plaintext fallbacks:
 - fallbacks removed (c43aa2f, 7049d1e)
 - CRITICAL logs instead of WARNING (3a12021)
 - User-visible warning so the failure is impossible to miss (this commit)

* fix(backups): /brain-repo/snapshots returns uniform shape; frontend guards against undefined ref

Clicking the Brain Repo tab turned the page black with
  Cannot read properties of undefined (reading 'replace')
because list_snapshots returned HEAD as {"sha": "..."} with no `ref`
field, while daily/weekly/milestones items had both ref+sha. The
SnapshotRow renderer called s.ref.replace(/^refs\/tags\//, '') on every
item uniformly — HEAD crashed the whole component tree.

Two fixes, defence-in-depth:

1. Backend (brain_repo/github_api.py:list_snapshots): every snapshot
   item now returns the same shape {ref, sha, label}. HEAD uses
   ref="HEAD" / label="HEAD" (not an empty dict), and the caller sees
   `head: null` when no HEAD exists (renamed from {} for clarity).
   `label` is pre-computed by stripping `refs/tags/` so the frontend
   has one string ready to render without parsing git refs.

2. Frontend (Backups.tsx): all .replace calls on snapshot.ref use
   `(snapshot.ref ?? '')` so a future regression of the backend shape
   can't re-black the page. The display string falls back to
   "(unnamed)" when label + ref are both missing.

Validated end-to-end: /api/brain-repo/snapshots now returns
  HEAD: {"ref": "HEAD", "sha": "...", "label": "HEAD"}
  Milestones[0]: {"ref": "refs/tags/milestone/...", "sha": "...", "label": "milestone/..."}
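
Shape-wise the normalisation is straightforward: every item carries ref/sha/label and HEAD is just another item. A sketch; the GitHub API calls that produce head_sha and tag_refs, plus the weekly/daily grouping, are elided:

    def _snapshot_item(ref: str, sha: str) -> dict:
        """Uniform shape for the frontend: label is pre-stripped of refs/tags/."""
        return {"ref": ref, "sha": sha, "label": ref.removeprefix("refs/tags/")}

    def shape_snapshots(head_sha, tag_refs) -> dict:
        """head is None (not {}) when the repo has no commits yet."""
        return {
            "head": _snapshot_item("HEAD", head_sha) if head_sha else None,
            "milestones": [
                _snapshot_item(ref, sha)
                for ref, sha in tag_refs
                if ref.startswith("refs/tags/milestone/")
            ],
        }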

* fix(brain-repo): commit the git_ops.push tuple API + sync_worker caller update

These changes were made locally in earlier turns but never committed —
they were applied to the working tree after a stash pop during the
upstream merge and slipped past subsequent `git add <specific-files>`
commits.

The callers in routes/brain_repo.py (already committed in c43aa2f and
7049d1e) DO use the new API:

    pushed, push_err = git_ops.push(repo_dir, token, with_tags=True)

But git_ops.push on the branch still had the old signature:

    def push(repo_dir, token) -> bool

So every sync/force and tag/milestone call in the pushed code would
crash with TypeError: push() got an unexpected keyword argument
'with_tags'. Production on the branch was silently broken until now —
smoke tests passed only because I was docker-cp'ing working-tree files
into the container.

Changes now in-tree:

- push(repo_dir, token, with_tags=True) -> tuple[bool, str]. Returns
  (ok, error_message). Uses --follow-tags when with_tags=True so
  annotated tags created by create_tag actually reach GitHub. Masks
  token in stderr output before logging.
- create_tag(repo_dir, tag, msg, force=False). When force=True, uses
  git tag -a -f so re-tagging an existing milestone moves the pointer
  instead of erroring.
- sync_worker._execute_job uses the new tuple return + threads the
  real error into the pending-job JSON.

All callers verified: sync_force, tag_milestone, _initialize_remote_brain_repo
in routes/brain_repo.py already destructure the tuple correctly.

* feat(onboarding): shared OnboardingHeader with EvoNexus logo on inner steps

Adds the EvoNexus logo above the step indicator on every inner wizard
screen so the brand presence is consistent from Welcome through
Step 3 (not just on the first screen).

New component OnboardingHeader renders:
  [logo EVO_NEXUS.webp h-7]
  [Step N of 3 • ● ● ○]

Used on StepProvider (step1of3), StepBrainRepo (step2of3),
StepBrainConnect (step2aOf3), StepBrainChoose (step2bOf3), StepConfirm
(step3of3). Each step file lost its duplicated step-indicator block —
fewer lines overall.

Welcome.tsx keeps its bespoke header with the larger logo and welcome
title — it's the landing screen, different layout by design.

These file changes were applied to the working tree during an earlier
onboarding polish but never committed — they slipped through the
subsequent `git add <specific-files>` batches.

* fix(brain-repo): watcher persists push result to last_error / last_sync

The auto-sync closure (`_sync_fn` in start_brain_watcher) was calling
git_ops.push() but discarding the (ok, error_message) tuple. Symptoms:

1. Push failures never reached BrainRepoConfig.last_error, so the UI
   card kept showing "connected ✓" while auto-sync quietly failed every
   30 seconds in the background. The user only noticed when commits
   stopped appearing on GitHub.

2. Push successes didn't update last_sync, so the "Last sync: 2m ago"
   stayed frozen at the manual-sync timestamp even while the watcher
   was happily pushing every file change.

Fix: capture the tuple, log ERROR on failure, and persist the result
inside `flask_app.app_context()` (same Flask instance the start function
received as a parameter, so the existing circular-import fix from
f1ed7d8 keeps working). The 30s UI polling from 0e8235a now actually
reflects watcher state.

Found during post-merge audit of all `git_ops.push` call sites — every
other caller was destructuring the tuple already; this one slipped.
jbmendonca pushed a commit to jbmendonca/evo-nexus that referenced this pull request May 15, 2026
…ore flow (evolution-foundation#43)
