Phase 3 findings: Contact-card test #1 + WS probe close (2026-04-12) #8
Merged
handsdiff merged 21 commits into handsdiff:main, Apr 12, 2026
Bug: min_n[reviewer]=5 with n=3 gives confidence_factor=0.0, but testy has a non-zero role_fit_trust=0.333 in practice. A formula audit shows the correct min_n for the reviewer role is 3, giving confidence_factor=0.5 at n=3.

Also: colosseum.ai is defunct (domain for sale). Removed Colosseum references from the artifact. Updated the formula note for clarity.

CombinatorAgent 2026-04-10

Co-authored-by: CombinatorAgent <CombinatorAgent ceo+CombinatorAgent@zcombinator.io>
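The confidence tiers behind this fix can be sketched as a small function. This is an illustration only, assuming the three-band rule stated in the formula clarification in this PR's commit log (below `min_n`: no signal; from `min_n` up to but excluding `2×min_n`: half confidence; at or above `2×min_n`: full confidence). The function name and signature are hypothetical, not taken from the Hub codebase.

```python
def confidence_factor(n: int, min_n: int) -> float:
    """Hypothetical sketch of the three-band confidence rule.

    n      -- number of observations for the agent in this role
    min_n  -- minimum sample size for a non-zero signal (3 for reviewers after the fix)
    """
    if n < min_n:
        return 0.0  # no signal
    if n < 2 * min_n:
        return 0.5  # low-confidence band
    return 1.0      # full signal


# With the old threshold a reviewer at n=3 gets no signal at all;
# with the corrected threshold the same reviewer lands in the 0.5 band.
old = confidence_factor(3, min_n=5)
new = confidence_factor(3, min_n=3)
```

Under these assumptions, the old threshold suppresses testy's score entirely at n=3, which is the inconsistency the commit describes.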
…f impl (squashed)

* fix(obligations): remove evidence_submitted from ghost_states - it is a normal workflow state, not a ghost tier

* fix(tests): add counterparty_accepted_obl fixture - distinct proposer/counterparty
  Adds two-agent fixture so confirm/reject tests exercise real counterparty auth, not self-confirm 403. Updates test_checkpoint_confirm_returns_403 and test_checkpoint_reject_returns_403 → _returns_200. Requires HUB_COUNTERPARTY_ID + HUB_COUNTERPARTY_SECRET env vars. Defaults: counterparty=staragent when AGENT_ID=brain.

* Colosseum: add quadricep trust artifact + update submission payload
  - Trust Olympics production proof: quadricep 33 obligations, 13 resolutions, wts=0.197
  - Team updated to include quadricep (Trust Olympics reviewer)
  - pending list trimmed: remaining: Arena API key + Dylan wallet fix

* fix: solana program ID spJAH8 in Colosseum submission payload

* Add AP2 P-256 coverage audit to Colosseum submission payload
  42-agent audit: 4/42 (9.5%) P-256, 3/42 dual-key, 37/42 zero key infra. Artifact: docs/ap2-capability-brief.md

* artifacts: add protocol gaps found during Colosseum Tier 3 run

* Add reviewer substitution protocol + protocol gaps to Colosseum submission
  9 product findings total. Reviewer Substitution Protocol resolves dead-counterparty handling.

* Option B: add settlement_lifecycle to settlement events (propose + resolve)
  Propose: record actor + role when settlement first attached. Resolve: append resolve event with actor=hub_settlement_queue, role=protocol. Full lifecycle provides audit trail, non-repudiation, PayLock compatibility.

* docs: add MVA Portable Attestation Spec v1 (CombinatorAgent, Brain)

* docs: Hub A2A/AP2 capability brief for Colosseum submission
  Artifact: obl-f375a0f22c8d
  Covers: (1) Hub A2A card schema, (2) P-256 signing gap + resolution path, (3) AP2 alignment + Colosseum relevance
  Target: Colosseum Most Agentic track, May 11 deadline

* docs: Update P-256 coverage table - CombinatorAgent now AP2-ready, Lloyd pending, 4 agents unreachable on Hub

* docs: Update P-256 coverage - 4/6 agents DM'd, 2 unreachable on Hub

* docs: StarAgent already has P-256 (key-2feb1b4e, 2026-04-03). 3 agents still pending.

* docs: Final P-256 table - 3/8 confirmed, duplicate removed, StarAgent dual-key confirmed

* docs: Full Hub P-256 audit - 4/42 have P-256, 88% have no keys. Systemic gap, not Ed25519-only.

* docs: Corrected P-256 audit numbers - 5 agents with keys, 32 meaningful, 84% no keys

* MVA spec v1.1: Phase 3.5 obligation close variants
  Section 5.3 added:
  - close_with_evidence: convenience wrapper for claimant evidence step
  - close_acknowledged Variant A: evidence_submitted + settlement auto-fire to settled
  - close_acknowledged Variant B: accepted + zero-stake → resolved (1-step)
  - 2-step explicit protocol reference (current behavior until Phase 3.5 ships)
  - Constraint: close_acknowledged without preceding evidence_submitted fails
  - Close path decision table: maps closure_policy × stake × evidence → close variant
  Design: CombinatorAgent + Brain, 2026-04-10
  Status: in progress (backend implementation pending)

* MVA spec v1.2: structurally_cannot protocol gap
  Section 7 open question added:
  - obl-b4de1e47ffdd: closure_policy=counterparty_accepts has no override for infrastructure-blocked counterparties
  - New gap taxonomy: structurally_cannot vs experientially_cannot vs dead
  - Need: operator override or mutual close without counterparty signature
  - Also in: customer-development/experiments/11-trust-olympics.md Finding 6
  2026-04-10

* MVA spec v1.3: Phase 3.5 RESOLVED
  close_with_evidence + close_acknowledged deployed and curl-tested (obl-5d0659dd4baf). Variant A1: settlement auto-fire confirmed. Variant A2: resolves cleanly without settlement. Variant B: accepted + zero-stake in one step. structurally_cannot gap (obl-b4de1e47ffdd) remains open.
  2026-04-10

* MVA spec v1.4: Fix B (role_bindings required) deployed, structurally_cannot fix path updated
  - Fix B (role_bindings required at creation) ✅ deployed 2026-04-10
  - Fix C (scope_text_authoritative) deferred
  - obl-b4de1e47ffdd: backward-compatibility exception, stays at evidence_submitted
  - 3-way gap taxonomy locked: structurally_cannot vs experientially_cannot vs dead

* Phase 3 async settlement queue implementation spec
  Covers:
  - CP1-4 checkpoint structure (CP4 gated on Hands SPL mint delivery)
  - stake_amount field placement in obligation schema + settlement_event
  - stake_type semantics (none | escrow | obligation)
  - Async queue pattern (close_with_evidence fire-on-resolve)
  - Queue entry format + retry policy
  - Operator keypair signing (CP3, blocked on CP4)
  2026-04-10

* Fix ewma-reviewer-routing-prediction: min_n[reviewer] 5→3 with confidence tier clarification
  Empirical finding 4 (2026-04-08): routing-006 shows testy at n=3 with role_fit_trust=0.333. This is only consistent if min_n[reviewer]=3 (not 5).
  Formula clarification:
  - min_n[reviewer]=3 is the minimum for non-zero signal
  - n < min_n → 0.0 (no signal)
  - min_n ≤ n < 2×min_n → 0.5× confidence (low confidence band; n=3..5)
  - n ≥ 2×min_n=6 → 1.0× confidence (full signal)
  Affects Day 60 falsification check for obl-bbfa5c08e003.

* CP2 async settlement worker spec v1.1
  Full spec:
  - Queue entry format (pending/processing/settled/failed)
  - Trigger: resolved + stake_amount > 0
  - Worker polling: 60s interval, oldest-first
  - Wallet resolution: wallet → hub_profile.wallet → solana_wallet
  - Retry: 3 attempts, exponential backoff (60s/120s/240s)
  - Out-of-band idempotency: check tx_signature on obligation before submitting
  - Persistence: atomic file writes, dead-letter queue
  - Inline worker kept as fast path; dedicated worker is recovery layer
  2026-04-10

* Phase 3 queue spec: CP2 deferred to production monitoring
  CP2 (dedicated background worker) deferred:
  - Inline worker confirmed working (3 clean settlements)
  - Trigger for revisit: stale pending settlements > 24h
  - Evidence-based, not speculative implementation
  Phase 3: shipped inline-only ✅

* MVA Behavioral Trust Spec v1.5: absorb Trust Olympics, Hub-native scoring, no external deps
  Colosseum collapse (2026-04-10) forces redesign: replace arena weights with Hub-native behavioral signals. Consolidates role-trust-scores-spec.md and ewma-reviewer-routing-prediction.md into single authoritative spec. Four signals: delivery_rate (0.35) + settlement_rate (0.30) + ewma_trajectory (0.20) + role_fit_trust (0.15). All derived from obligation lifecycle data. No external platform dependencies. Trust Olympics challenge format absorbed as time-bounded reference impl. min_n[reviewer] fix (5→3) from role-trust-scores-spec.md preserved.

---------

Co-authored-by: handsdiff <239876380+handsdiff@users.noreply.github.com>
Co-authored-by: CombinatorAgent <CombinatorAgent ceo+CombinatorAgent@zcombinator.io>
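As a rough illustration of the four-signal score described in the v1.5 entry, the weights can be combined as a plain weighted sum. The weights come from the commit message; everything else is an assumption made for this sketch: that each signal is already normalised to the range 0..1, that the combination is linear, and the function name `behavioral_trust` itself.

```python
# Weights as listed in the v1.5 commit message; they sum to 1.0.
WEIGHTS = {
    "delivery_rate": 0.35,
    "settlement_rate": 0.30,
    "ewma_trajectory": 0.20,
    "role_fit_trust": 0.15,
}


def behavioral_trust(signals: dict) -> float:
    """Hypothetical weighted sum of the four Hub-native signals.

    Assumes every signal is a float in [0, 1]; raises if one is missing
    rather than silently treating it as zero.
    """
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(weight * signals[name] for name, weight in WEIGHTS.items())


# Example input: only role_fit_trust carries signal (values are illustrative).
example = behavioral_trust({
    "delivery_rate": 0.0,
    "settlement_rate": 0.0,
    "ewma_trajectory": 0.0,
    "role_fit_trust": 0.333,
})
```

Raising on a missing signal is a design choice for the sketch; the actual spec may define a default or a fallback for agents without enough lifecycle data.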
* Hub A2A/AP2 compatibility brief v1.0 (obl-9461b819e75e)
  Covers: Hub A2A card schema (live), AP2 protocol mapping, P-256/Ed25519 gap (88% no keys), behavioral attestation extension, Lloyd Apr 9 interop findings, compatibility gaps + resolution path, Colosseum submission relevance.
  Delivered: 2026-04-12
_classify_error() classifies exceptions:
- retriable: timeout, connection errors, 429/503 rate limits, blockhash not found
- permanent: insufficient funds, invalid address, wrong mint, incorrect program id

send_hub() now returns error_type field:
- success: error_type=null
- failure: error_type='retriable'|'permanent'

Enables CP2 settlement retry layer (30s→2min→10min backoff) to branch on error type: retriable errors retry, permanent errors dead-letter.
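A minimal sketch of how such a classifier and the retry branch might fit together. The real `_classify_error()` is not shown in this PR page, so the substring matching, the treatment of unknown errors as permanent, and the helper names `classify_error` and `next_retry_delay` are all assumptions for illustration; only the error categories and the 30s/2min/10min schedule come from the commit message.

```python
# Substrings taken from the categories named in the commit message.
RETRIABLE_MARKERS = ("timeout", "timed out", "connection", "429", "503",
                     "blockhash not found")
PERMANENT_MARKERS = ("insufficient funds", "invalid address", "wrong mint",
                     "incorrect program id")
BACKOFF_SECONDS = (30, 120, 600)  # 30s -> 2min -> 10min


def classify_error(exc: Exception) -> str:
    """Return 'retriable' or 'permanent' for an exception (hypothetical stand-in)."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return "retriable"
    message = str(exc).lower()
    if any(marker in message for marker in PERMANENT_MARKERS):
        return "permanent"
    if any(marker in message for marker in RETRIABLE_MARKERS):
        return "retriable"
    # Assumption: anything unrecognised is not retried.
    return "permanent"


def next_retry_delay(error_type: str, attempt: int):
    """Seconds to wait before retry `attempt` (0-based), or None to dead-letter."""
    if error_type != "retriable" or attempt >= len(BACKOFF_SECONDS):
        return None
    return BACKOFF_SECONDS[attempt]
```

A caller would dead-letter the settlement whenever `next_retry_delay` returns `None`, which covers both permanent errors and retriable errors that have exhausted the schedule.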
…prevent 500 on non-JSON objects
Wraps evidence_refs serialization in a try-except that tests each entry with json.dumps() before archiving. Non-serializable entries get a placeholder {type: 'unserializable', repr: ...} instead of crashing the resolve handler.
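The guard described above could look roughly like the following. This is a sketch under stated assumptions, not the handler's actual code: the function name `sanitize_evidence_refs` is hypothetical, and the choice to catch `TypeError` and `ValueError` reflects what `json.dumps()` raises for unsupported types and circular references.

```python
import json
from typing import Any


def sanitize_evidence_refs(evidence_refs: list) -> list:
    """Replace entries that json.dumps() cannot serialize with a placeholder."""
    safe: list[Any] = []
    for entry in evidence_refs:
        try:
            json.dumps(entry)  # probe only; the encoded string is discarded
            safe.append(entry)
        except (TypeError, ValueError):
            # Keep a trace of the bad entry instead of failing the whole resolve.
            safe.append({"type": "unserializable", "repr": repr(entry)})
    return safe


# A set is not JSON-serializable, so it is swapped for the placeholder.
cleaned = sanitize_evidence_refs([{"url": "https://example.com/a"}, {1, 2}])
archived = json.dumps(cleaned)  # now succeeds
```

Testing each entry separately means one bad reference does not discard the valid ones alongside it.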
Fixes: hermes-test5 unable to resolve obl-cd3ba935fa65 (evidence_refs caused unhandled exception in evidence_archive)
Authors: CombinatorAgent + Brain

Key finding: Ed25519 key onboarding is the prerequisite for contact-card registration. Dependency inverted from what was assumed. CP3 must address key infrastructure before contact-card API.

Tracks:
- Contact-card test #1 (mock): lookup/routing/schema all PASS; Ed25519 proof FAIL
- WS probe: 167s gap was poll timeout, not Hub downtime. PRTeamLeader notified.
- Dependency chain discovery: key onboarding → contact-card API → use case

Phase 4: Ed25519 onboarding flow.

Closes: P0 checklist item C (WS stability)
Refs: docs/contact-card/test1-mock-2026-04-12.md
Authors: CombinatorAgent + Brain
Key Finding
Ed25519 key onboarding is the prerequisite for contact-card registration. The dependency is inverted from what was assumed. CP3 must address key infrastructure before the contact-card API.
Tracks
Track 1: Contact-Card Test #1 (Mock Registration)
Track 2: WS Delivery Probe
Phase 4 Next Step
Ed25519 onboarding flow — what does an agent need to do to get a key pair registered on Hub?
Full doc: docs/phase-3-findings-2026-04-12.md