Skip to content

fix(security): block serverless re-register bypass of admin revocation (#374)#375

Merged
AbirAbbas merged 6 commits intomainfrom
fix/revocation-bypass-serverless-register
Apr 9, 2026
Merged

fix(security): block serverless re-register bypass of admin revocation (#374)#375
AbirAbbas merged 6 commits intomainfrom
fix/revocation-bypass-serverless-register

Conversation

@santoshkumarradha
Copy link
Copy Markdown
Member

Summary

Closes #374 — follow-up to the TC-034 deep review on #373.

RegisterServerlessAgentHandler (control-plane/internal/handlers/nodes.go) unconditionally set LifecycleStatus = AgentStatusReady on every call and wrote straight to storage via RegisterAgent, bypassing the admin-revocation preservation logic used by the standard registration path at nodes.go:540-556 and the 503 block at nodes.go:1051-1056.

An admin-revoked serverless agent (lifecycle pending_approval, ApprovedTags cleared) could clear its own revocation by simply calling POST /api/v1/nodes/register-serverless again. This defeats the revocation mechanism that #373 specifically landed to protect: once re-registration clears revocation, all downstream call paths (execute, reasoner, skill, DID) permit calls again.

See the full threat analysis in #374.

Changes

  • control-plane/internal/handlers/nodes.go

    • After GetAgent returns an existingNode, detect admin revocation (existingNode.LifecycleStatus == pending_approval && len(existingNode.ApprovedTags) == 0) and reject re-registration with 503 Service Unavailable and {"error": "agent_pending_approval", "message": "..."}. Matches the stable contract used by reasoners.go, skills.go, the permission middleware, and the fix: block revoked-tag agent calls + friendly empty logs state (TC-034, TC-035) #373 fix.
    • For non-revoked re-registrations, preserve existingNode.LifecycleStatus and existingNode.ApprovedTags onto newNode before calling RegisterAgent, so the UPSERT does not clobber approval state.
    • First registrations (no existingNode) are unchanged — new serverless agents still land at LifecycleStatus = Ready.
  • control-plane/internal/handlers/nodes_serverless_revocation_test.go (new)

    • TestRegisterServerlessAgentHandler_BlocksReregisterOfRevokedAgent — seeds a revoked serverless agent, calls the handler, asserts 503 + stable {error, message} response shape and that no overwrite happened.
    • TestRegisterServerlessAgentHandler_PreservesApprovedTagsOnReregister — seeds an approved serverless agent, calls the handler, asserts the persisted record still has ApprovedTags and LifecycleStatus = Ready.
    • TestRegisterServerlessAgentHandler_FirstRegistrationUnchanged — baseline: empty storage still yields 201 + Ready.

Why a separate PR

Kept isolated from #373 to keep that PR focused on TC-034 / TC-035 and to make this security-critical change individually reviewable, auditable, and revertable. This PR depends on nothing in #373 and targets main directly.

Test plan

  • go build ./... passes
  • go test ./internal/handlers/ -run "TestRegisterServerlessAgentHandler" -v passes (3 new tests + 7 pre-existing)
  • go test ./internal/handlers/ -count=1 passes (full package regression)
  • Manual: register serverless agent → admin-revoke → re-register → expect 503, lifecycle still pending_approval
  • Manual: register serverless agent with approved tags → re-register → expect 201, ApprovedTags preserved

cc @AbirAbbas — this pairs with the review on #373. Merging order is flexible; neither PR blocks the other.

#374)

RegisterServerlessAgentHandler unconditionally set LifecycleStatus=ready
on re-registration and called RegisterAgent directly, bypassing the
admin-revocation preservation logic used by the standard registration
path (nodes.go:540-556) and the 503 block at nodes.go:1051-1056.

An admin-revoked serverless agent could clear its own pending_approval
state by simply calling POST /api/v1/nodes/register-serverless again —
defeating the revocation mechanism that #373 landed.

- Reject re-registration of admin-revoked agents with 503
  {"error": "agent_pending_approval", "message": "..."}, matching the
  stable contract used elsewhere in the handler.
- Preserve existingNode.LifecycleStatus and ApprovedTags on non-revoked
  re-registrations so the UPSERT does not clobber approval state.
- Leave first-registration behavior unchanged.

Adds regression tests covering the revoked-reject path and the
approved-tags-preservation path.

Closes #374.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 9, 2026

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 86%, aggregate ≥ 88%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

Surface Current Baseline Δ
control-plane 87.30% 87.30% → +0.00 pp 🟡
sdk-go 90.70% 90.70% → +0.00 pp 🟢
sdk-python 93.63% 93.63% ↑ +0.00 pp 🟢
sdk-typescript 92.56% 92.56% → +0.00 pp 🟢
web-ui 90.02% 90.01% ↑ +0.01 pp 🟢
aggregate 89.02% 89.01% ↑ +0.01 pp 🟡

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 9, 2026

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

Surface Touched lines Patch coverage Status
control-plane 44 100.00%
sdk-go 0 ➖ no changes
sdk-python 0 ➖ no changes
sdk-typescript 0 ➖ no changes
web-ui 0 ➖ no changes

✅ Patch gate passed

Every surface whose lines were touched by this PR has patch coverage at or above the threshold.

…ose patch coverage gap

Two fixes to make the re-registration approval-preservation path in
RegisterServerlessAgentHandler work correctly and pass the patch
coverage gate:

1. Discovery parser was dropping Tags from skills.
   The anonymous Skills struct in the serverless discovery payload
   decoder had no `Tags` field, and the SkillDefinition conversion
   didn't copy tags either. As a result newNode.Skills[i].Tags was
   always empty in production, which meant the re-register
   preservation loop at the end of the same handler was silently a
   no-op for skills: it would iterate zero tags and clear
   ApprovedTags rather than filter it against existingNode's
   approved set. Reasoners were already handled correctly; this
   brings skills to parity.

2. Revert a cosmetic indent change unrelated to the security fix.
   The auto-discovery block in RegisterNodeHandler was dedented by
   one tab in the original PR, which is unrelated to blocking
   revoked-tag re-registration. Reverting that block keeps this PR
   focused on the security fix and removes it from diff-cover's
   touched-lines set.

3. Extend TestRegisterServerlessAgentHandler_PreservesApprovedTagsOnReregister
   to send a skill with tags in the discover response and assert
   per-reasoner and per-skill ApprovedTags filtering. This was the
   only previously-uncovered block in the security fix (nodes.go
   ~1473-1480, the Skills preservation loop) and now exercises the
   real filtering path end-to-end.

Verified locally:
  (cd control-plane && go test -run 'TestRegisterServerlessAgentHandler_' \
     ./internal/handlers/...)  # all 3 tests pass
…ration

The re-registration preservation block (introduced earlier in this PR
in both RegisterNodeHandler and RegisterServerlessAgentHandler) was
preserving existingNode.LifecycleStatus alongside ApprovedTags. This
broke docs_quick_start_execution_webhook_contract functional test
because:

1. test_docs_quick_start_flow registers "my-agent", runs an execution,
   then the agent server exits. The DB row lands in stopping/offline.
2. test_docs_quick_start_execution_webhook_contract re-registers the
   same node_id. The preservation code pulled the stale terminal
   status (stopping) into the fresh registration.
3. The UPSERT persisted "my-agent" in a mid-shutdown state even
   though the agent process was running fine.
4. The reasoner executed successfully (reasoner logs confirm, and
   the synchronous /execute HTTP response returned status=succeeded),
   but downstream status inference in the webhook dispatcher read
   the stale agent state and the execution record's Status ended up
   as "failed". determineWebhookEvent() thus returned
   "execution.failed" instead of "execution.completed", breaking the
   test assertion at test_quick_start.py:185.

Fix: remove the `newNode.LifecycleStatus = existingNode.LifecycleStatus`
line from both handlers.

- RegisterNodeHandler: the fallback below already resets empty/offline
  status to AgentStatusStarting, which is the correct initial state
  for a re-registering agent. The lifecycle state machine takes it
  from there.
- RegisterServerlessAgentHandler: the freshly constructed newNode
  already has LifecycleStatus = AgentStatusReady set during discovery,
  which is the correct state for a serverless agent that just
  completed its discover handshake.

The ApprovedTags preservation (which is what the code comment
actually claims this block is for) remains intact. Per-reasoner and
per-skill approved-tag filtering also remain intact. The admin-
revocation rejection path (the security fix) is unaffected.

All 3 serverless revocation tests still pass, and the broader
handlers package tests still pass:
  (cd control-plane && go test ./internal/handlers/)  # 4.4s, ok
@santoshkumarradha santoshkumarradha added this pull request to the merge queue Apr 9, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Apr 9, 2026
@AbirAbbas AbirAbbas added this pull request to the merge queue Apr 9, 2026
@AbirAbbas AbirAbbas enabled auto-merge April 9, 2026 23:17
@AbirAbbas AbirAbbas added this pull request to the merge queue Apr 9, 2026
Merged via the queue into main with commit a977370 Apr 9, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

security: RegisterServerlessAgentHandler bypasses admin-revocation preservation (re-registration clears pending_approval)

2 participants