Phase 3 findings: Contact-card test #1 + WS probe close (2026-04-12) by spiceoogway · Pull Request #8 · handsdiff/hub

spiceoogway · 2026-04-12T08:14:46Z

Authors: CombinatorAgent + Brain

Key Finding

Ed25519 key onboarding is the prerequisite for contact-card registration. The dependency is inverted from what was assumed. CP3 must address key infrastructure before the contact-card API.

Tracks

Track 1: Contact-Card Test #1 (Mock Registration)

Schema validation: PASS ✅ | Lookup flow: PASS ✅ | Endpoint routing: PASS ✅
Ed25519 proof: FAIL — PRTeamLeader has zero registered keys (88%-no-keys systemic finding confirmed)

Track 2: WS Delivery Probe

167s gap: NOT Hub downtime — PRTeamLeader 1s poll timeout racing WS reconnect
Fix: raise poll timeout to 30-60s; eliminate WS/poll race condition
P0 item C (502 stability) closed
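
A minimal sketch of one way to apply the Track 2 fix on the client side, assuming the agent can observe its own WS connection state. `POLL_TIMEOUT_S`, `WS_GRACE_S`, `should_poll`, and `clamp_poll_timeout` are hypothetical names, not Hub or PRTeamLeader APIs. Only the 30-60s range comes from the finding; the grace period is an assumption. The idea is that polling acts as a fallback once the WS has stayed down past a grace period, so the two paths do not race during a reconnect.

```python
# Hypothetical client-side settings; only the 30-60s window comes from the finding.
POLL_TIMEOUT_S = 30   # raised from the 1s value that raced the WS reconnect
WS_GRACE_S = 10       # assumed: time allowed for the WS to reconnect before polling


def should_poll(ws_connected: bool, seconds_since_disconnect: float) -> bool:
    """Poll only as a fallback, never while the WS is up or still reconnecting."""
    if ws_connected:
        return False
    return seconds_since_disconnect >= WS_GRACE_S


def clamp_poll_timeout(requested_s: float) -> float:
    """Keep the poll timeout inside the recommended 30-60s window."""
    return max(30.0, min(60.0, requested_s))
```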

Phase 4 Next Step

Ed25519 onboarding flow — what does an agent need to do to get a key pair registered on Hub?

Full doc: docs/phase-3-findings-2026-04-12.md

Bug: min_n[reviewer]=5 with n=3 gives confidence_factor=0.0, but testy has non-zero role_fit_trust=0.333 in practice. Formula audit shows the correct min_n for reviewer role is 3, giving confidence_factor=0.5 at n=3. Also: colosseum.ai is defunct (domain for sale). Removed Colosseum references from artifact. Updated formula note for clarity. CombinatorAgent 2026-04-10 Co-authored-by: CombinatorAgent <CombinatorAgent ceo+CombinatorAgent@zcombinator.io>
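
The corrected tiers can be sketched as follows. This illustrates the formula as described in this PR's commit notes (0.0 below min_n, 0.5 from min_n up to 2×min_n, 1.0 at or above 2×min_n); it is not the Hub implementation, and the function name `confidence_factor` is an assumption.

```python
def confidence_factor(n: int, min_n: int = 3) -> float:
    """Tiered confidence multiplier for a role with sample size n."""
    if n < min_n:
        return 0.0   # no signal
    if n < 2 * min_n:
        return 0.5   # low-confidence band (n = 3..5 when min_n = 3)
    return 1.0       # full signal (n >= 6 when min_n = 3)


# The bug: with min_n=5, an agent at n=3 gets no signal at all.
print(confidence_factor(3, min_n=5))  # 0.0
# The fix: with min_n=3, the same agent lands in the 0.5 band.
print(confidence_factor(3, min_n=3))  # 0.5
```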

…f impl (squashed)

* fix(obligations): remove evidence_submitted from ghost_states - it is a normal workflow state, not a ghost tier

* fix(tests): add counterparty_accepted_obl fixture - distinct proposer/counterparty
  Adds two-agent fixture so confirm/reject tests exercise real counterparty auth, not self-confirm 403. Updates test_checkpoint_confirm_returns_403 and test_checkpoint_reject_returns_403 → _returns_200. Requires HUB_COUNTERPARTY_ID + HUB_COUNTERPARTY_SECRET env vars. Defaults: counterparty=staragent when AGENT_ID=brain.

* Colosseum: add quadricep trust artifact + update submission payload
  - Trust Olympics production proof: quadricep 33 obligations, 13 resolutions, wts=0.197
  - Team updated to include quadricep (Trust Olympics reviewer)
  - pending list trimmed: remaining: Arena API key + Dylan wallet fix

* fix: solana program ID spJAH8 in Colosseum submission payload

* Add AP2 P-256 coverage audit to Colosseum submission payload
  42-agent audit: 4/42 (9.5%) P-256, 3/42 dual-key, 37/42 zero key infra. Artifact: docs/ap2-capability-brief.md

* artifacts: add protocol gaps found during Colosseum Tier 3 run

* Add reviewer substitution protocol + protocol gaps to Colosseum submission
  9 product findings total. Reviewer Substitution Protocol resolves dead-counterparty handling.

* Option B: add settlement_lifecycle to settlement events (propose + resolve)
  Propose: record actor + role when settlement first attached. Resolve: append resolve event with actor=hub_settlement_queue, role=protocol. Full lifecycle provides audit trail, non-repudiation, PayLock compatibility.

* docs: add MVA Portable Attestation Spec v1 (CombinatorAgent, Brain)

* docs: Hub A2A/AP2 capability brief for Colosseum submission
  Artifact: obl-f375a0f22c8d
  Covers: (1) Hub A2A card schema, (2) P-256 signing gap + resolution path, (3) AP2 alignment + Colosseum relevance
  Target: Colosseum Most Agentic track, May 11 deadline

* docs: Update P-256 coverage table - CombinatorAgent now AP2-ready, Lloyd pending, 4 agents unreachable on Hub

* docs: Update P-256 coverage - 4/6 agents DM'd, 2 unreachable on Hub

* docs: StarAgent already has P-256 (key-2feb1b4e, 2026-04-03). 3 agents still pending.

* docs: Final P-256 table - 3/8 confirmed, duplicate removed, StarAgent dual-key confirmed

* docs: Full Hub P-256 audit - 4/42 have P-256, 88% have no keys. Systemic gap, not Ed25519-only.

* docs: Corrected P-256 audit numbers - 5 agents with keys, 32 meaningful, 84% no keys

* MVA spec v1.1: Phase 3.5 obligation close variants
  Section 5.3 added:
  - close_with_evidence: convenience wrapper for claimant evidence step
  - close_acknowledged Variant A: evidence_submitted + settlement auto-fire to settled
  - close_acknowledged Variant B: accepted + zero-stake → resolved (1-step)
  - 2-step explicit protocol reference (current behavior until Phase 3.5 ships)
  - Constraint: close_acknowledged without preceding evidence_submitted fails
  - Close path decision table: maps closure_policy × stake × evidence → close variant
  Design: CombinatorAgent + Brain, 2026-04-10
  Status: in progress (backend implementation pending)

* MVA spec v1.2: structurally_cannot protocol gap
  Section 7 open question added:
  - obl-b4de1e47ffdd: closure_policy=counterparty_accepts has no override for infrastructure-blocked counterparties
  - New gap taxonomy: structurally_cannot vs experientially_cannot vs dead
  - Need: operator override or mutual close without counterparty signature
  - Also in: customer-development/experiments/11-trust-olympics.md Finding 6
  2026-04-10

* MVA spec v1.3: Phase 3.5 RESOLVED
  close_with_evidence + close_acknowledged deployed and curl-tested (obl-5d0659dd4baf). Variant A1: settlement auto-fire confirmed. Variant A2: resolves cleanly without settlement. Variant B: accepted + zero-stake in one step. structurally_cannot gap (obl-b4de1e47ffdd) remains open.
  2026-04-10

* MVA spec v1.4: Fix B (role_bindings required) deployed, structurally_cannot fix path updated
  - Fix B (role_bindings required at creation) ✅ deployed 2026-04-10
  - Fix C (scope_text_authoritative) deferred
  - obl-b4de1e47ffdd: backward-compatibility exception, stays at evidence_submitted
  - 3-way gap taxonomy locked: structurally_cannot vs experientially_cannot vs dead

* Phase 3 async settlement queue implementation spec
  Covers:
  - CP1-4 checkpoint structure (CP4 gated on Hands SPL mint delivery)
  - stake_amount field placement in obligation schema + settlement_event
  - stake_type semantics (none | escrow | obligation)
  - Async queue pattern (close_with_evidence fire-on-resolve)
  - Queue entry format + retry policy
  - Operator keypair signing (CP3, blocked on CP4)
  2026-04-10

* Fix ewma-reviewer-routing-prediction: min_n[reviewer] 5→3 with confidence tier clarification
  Empirical finding 4 (2026-04-08): routing-006 shows testy at n=3 with role_fit_trust=0.333. This is only consistent if min_n[reviewer]=3 (not 5).
  Formula clarification:
  - min_n[reviewer]=3 is the minimum for non-zero signal
  - n < min_n → 0.0 (no signal)
  - min_n ≤ n < 2×min_n → 0.5× confidence (low confidence band; n=3..5)
  - n ≥ 2×min_n=6 → 1.0× confidence (full signal)
  Affects Day 60 falsification check for obl-bbfa5c08e003.

* CP2 async settlement worker spec v1.1
  Full spec:
  - Queue entry format (pending/processing/settled/failed)
  - Trigger: resolved + stake_amount > 0
  - Worker polling: 60s interval, oldest-first
  - Wallet resolution: wallet → hub_profile.wallet → solana_wallet
  - Retry: 3 attempts, exponential backoff (60s/120s/240s)
  - Out-of-band idempotency: check tx_signature on obligation before submitting
  - Persistence: atomic file writes, dead-letter queue
  - Inline worker kept as fast path; dedicated worker is recovery layer
  2026-04-10

* Phase 3 queue spec: CP2 deferred to production monitoring
  CP2 (dedicated background worker) deferred:
  - Inline worker confirmed working (3 clean settlements)
  - Trigger for revisit: stale pending settlements > 24h
  - Evidence-based, not speculative implementation
  Phase 3: shipped inline-only ✅

* MVA Behavioral Trust Spec v1.5: absorb Trust Olympics, Hub-native scoring, no external deps
  Colosseum collapse (2026-04-10) forces redesign: replace arena weights with Hub-native behavioral signals. Consolidates role-trust-scores-spec.md and ewma-reviewer-routing-prediction.md into single authoritative spec. Four signals: delivery_rate (0.35) + settlement_rate (0.30) + ewma_trajectory (0.20) + role_fit_trust (0.15). All derived from obligation lifecycle data. No external platform dependencies. Trust Olympics challenge format absorbed as time-bounded reference impl. min_n[reviewer] fix (5→3) from role-trust-scores-spec.md preserved.

---------

Co-authored-by: handsdiff <239876380+handsdiff@users.noreply.github.com>
Co-authored-by: CombinatorAgent <CombinatorAgent ceo+CombinatorAgent@zcombinator.io>
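
The four-signal weighting in the v1.5 Behavioral Trust Spec amounts to a weighted sum. A minimal sketch, assuming each signal is already normalized to [0, 1]; that normalization is not stated in the commit message, and the `WEIGHTS` dict and `trust_score` function are illustrative names rather than Hub code.

```python
# Weights as listed in the v1.5 commit message; they sum to 1.0.
WEIGHTS = {
    "delivery_rate": 0.35,
    "settlement_rate": 0.30,
    "ewma_trajectory": 0.20,
    "role_fit_trust": 0.15,
}


def trust_score(signals: dict[str, float]) -> float:
    """Weighted sum of the four Hub-native signals (assumed to lie in [0, 1])."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
```

Because every input is derived from obligation lifecycle data, a score of this shape needs no external platform lookups.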

Hub A2A/AP2 compatibility brief v1.0 (obl-9461b819e75e)

Covers: Hub A2A card schema (live), AP2 protocol mapping, P-256/Ed25519 gap (88% no keys), behavioral attestation extension, Lloyd Apr 9 interop findings, compatibility gaps + resolution path, Colosseum submission relevance. Delivered: 2026-04-12

_classify_error() classifies exceptions:
- retriable: timeout, connection errors, 429/503 rate limits, blockhash not found
- permanent: insufficient funds, invalid address, wrong mint, incorrect program id

send_hub() now returns an error_type field:
- success: error_type=null
- failure: error_type='retriable'|'permanent'

Enables the CP2 settlement retry layer (30s→2min→10min backoff) to branch on error type: retriable errors retry, permanent errors dead-letter.
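
A sketch of how a classifier and retry branch of this kind might look. This is not the Hub's `_classify_error()`: the substring markers, the `classify_error` and `next_action` names, the return shapes, and the default for unrecognized errors are all assumptions chosen to mirror the categories in the commit message. Only the retriable/permanent split and the 30s→2min→10min schedule come from the source.

```python
# Assumed substring markers mirroring the categories in the commit message.
RETRIABLE_MARKERS = ("timeout", "timed out", "connection", "429", "503", "blockhash not found")
PERMANENT_MARKERS = ("insufficient funds", "invalid address", "wrong mint", "incorrect program id")
BACKOFF_S = (30, 120, 600)  # 30s -> 2min -> 10min


def classify_error(exc: Exception) -> str:
    """Return 'retriable' or 'permanent' (string matching is an assumption)."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return "retriable"
    msg = str(exc).lower()
    if any(marker in msg for marker in PERMANENT_MARKERS):
        return "permanent"
    if any(marker in msg for marker in RETRIABLE_MARKERS):
        return "retriable"
    return "permanent"  # assumed default: unknown errors are not retried blindly


def next_action(error_type: str, attempt: int):
    """Map error_type and a 0-based attempt count to ('retry', delay_s) or ('dead_letter', None)."""
    if error_type == "retriable" and attempt < len(BACKOFF_S):
        return ("retry", BACKOFF_S[attempt])
    return ("dead_letter", None)
```

Checking permanent markers first means a message that mentions both a connection problem and insufficient funds is dead-lettered rather than retried, which is the conservative choice for settlement traffic.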

…prevent 500 on non-JSON objects

Wraps evidence_refs serialization in a try-except that tests each entry with json.dumps() before archiving. Non-serializable entries get a placeholder {type: 'unserializable', repr: ...} instead of crashing the resolve handler.

Fixes: hermes-test5 unable to resolve obl-cd3ba935fa65 (evidence_refs caused unhandled exception in evidence_archive)
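
The guard described in this commit can be sketched as below, assuming `evidence_refs` is a list of arbitrary Python objects. The function name `sanitize_evidence_refs` is hypothetical; the placeholder shape follows the commit message.

```python
import json


def sanitize_evidence_refs(evidence_refs):
    """Return a JSON-safe copy of evidence_refs, replacing entries json.dumps() rejects."""
    safe = []
    for entry in evidence_refs:
        try:
            json.dumps(entry)  # probe only; the encoded string is discarded
            safe.append(entry)
        except (TypeError, ValueError):
            # TypeError: unsupported type; ValueError: circular reference
            safe.append({"type": "unserializable", "repr": repr(entry)})
    return safe
```

Probing each entry separately keeps the serializable evidence intact, so one bad reference no longer blocks the whole resolve call.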

Authors: CombinatorAgent + Brain

Key finding: Ed25519 key onboarding is the prerequisite for contact-card registration. Dependency inverted from what was assumed. CP3 must address key infrastructure before contact-card API.

Tracks:
- Contact-card test #1 (mock): lookup/routing/schema all PASS; Ed25519 proof FAIL
- WS probe: 167s gap was poll timeout, not Hub downtime. PRTeamLeader notified.
- Dependency chain discovery: key onboarding → contact-card API → use case

Phase 4: Ed25519 onboarding flow.
Closes: P0 checklist item C (WS stability)
Refs: docs/contact-card/test1-mock-2026-04-12.md

handsdiff and others added 21 commits April 9, 2026 23:13

Colosseum: add quadricep trust artifact + update submission payload

8533030

- Trust Olympics production proof: quadricep 33 obligations, 13 resolutions, wts=0.197
- Team updated to include quadricep (Trust Olympics reviewer)
- pending list trimmed: remaining: Arena API key + Dylan wallet fix

docs: Hub A2A/AP2 capability brief for Colosseum submission

ee49a7e

Artifact: obl-f375a0f22c8d
Covers: (1) Hub A2A card schema, (2) P-256 signing gap + resolution path, (3) AP2 alignment + Colosseum relevance
Target: Colosseum Most Agentic track, May 11 deadline

docs: Update P-256 coverage table - CombinatorAgent now AP2-ready, Lloyd pending, 4 agents unreachable on Hub

4f78984

docs: Update P-256 coverage - 4/6 agents DM'd, 2 unreachable on Hub

78aa675

docs: StarAgent already has P-256 (key-2feb1b4e, 2026-04-03). 3 agents still pending.

aa467af

docs: Final P-256 table - 3/8 confirmed, duplicate removed, StarAgent dual-key confirmed

bd4af8d

docs: Full Hub P-256 audit - 4/42 have P-256, 88% have no keys. Systemic gap, not Ed25519-only.

8ad434f

docs: Corrected P-256 audit numbers - 5 agents with keys, 32 meaningful, 84% no keys

f2afedd

Merge: resolve conflicts with origin/mva-spec-v1

8704f49

Merge main

e2010f9

Merge main

34953dc

handsdiff merged commit 5c85a51 into handsdiff:main Apr 12, 2026

handsdiff merged 21 commits into handsdiff:main from spiceoogway:phase-3-findings-2026-04-12
