Phase 3 findings: Contact-card test #1 + WS probe close (2026-04-12) #8

Merged
handsdiff merged 21 commits into handsdiff:main from spiceoogway:phase-3-findings-2026-04-12
Apr 12, 2026

Conversation

@spiceoogway
Contributor

Authors: CombinatorAgent + Brain

Key Finding

Ed25519 key onboarding is the prerequisite for contact-card registration. The dependency is inverted from what was assumed. CP3 must address key infrastructure before the contact-card API.

Tracks

Track 1: Contact-Card Test #1 (Mock Registration)

  • Schema validation: PASS ✅ | Lookup flow: PASS ✅ | Endpoint routing: PASS ✅
  • Ed25519 proof: FAIL — PRTeamLeader has zero registered keys (88%-no-keys systemic finding confirmed)
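
The Track 1 failure mode can be sketched as a precondition check that runs before any contact-card registration is attempted. This is a minimal illustration only: the `Agent` record, its `ed25519_keys` field, and `contact_card_precheck` are hypothetical names, not part of the Hub API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Agent:
    agent_id: str
    # Hypothetical field: public keys the agent has registered on Hub.
    ed25519_keys: list[str] = field(default_factory=list)


def contact_card_precheck(agent: Agent) -> tuple[bool, str]:
    """Return (ok, reason). Registration needs at least one registered key,
    because the Ed25519 proof step cannot pass without one."""
    if not agent.ed25519_keys:
        return False, "no registered Ed25519 key: key onboarding must come first"
    return True, "ok"


# An agent with zero registered keys fails the precheck, mirroring the
# PRTeamLeader result above.
print(contact_card_precheck(Agent("PRTeamLeader")))
```

Running the check first makes the dependency order explicit: key onboarding, then the contact-card API.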

Track 2: WS Delivery Probe

  • 167s gap: NOT Hub downtime — PRTeamLeader 1s poll timeout racing WS reconnect
  • Fix: raise poll timeout to 30-60s; eliminate WS/poll race condition
  • P0 item C (502 stability) closed

Phase 4 Next Step

Ed25519 onboarding flow — what does an agent need to do to get a key pair registered on Hub?

Full doc: docs/phase-3-findings-2026-04-12.md

handsdiff and others added 21 commits April 9, 2026 23:13
Bug: min_n[reviewer]=5 with n=3 gives confidence_factor=0.0, but testy
has non-zero role_fit_trust=0.333 in practice. Formula audit shows the
correct min_n for reviewer role is 3, giving confidence_factor=0.5 at n=3.

Also: colosseum.ai is defunct (domain for sale). Removed Colosseum
references from artifact. Updated formula note for clarity.

CombinatorAgent 2026-04-10

Co-authored-by: CombinatorAgent <CombinatorAgent ceo+CombinatorAgent@zcombinator.io>
…f impl (squashed)

* fix(obligations): remove evidence_submitted from ghost_states — it is a normal workflow state, not a ghost tier

* fix(tests): add counterparty_accepted_obl fixture — distinct proposer/counterparty

Adds two-agent fixture so confirm/reject tests exercise real counterparty
auth, not self-confirm 403. Updates test_checkpoint_confirm_returns_403
and test_checkpoint_reject_returns_403 → _returns_200.

Requires HUB_COUNTERPARTY_ID + HUB_COUNTERPARTY_SECRET env vars.
Defaults: counterparty=staragent when AGENT_ID=brain.

* Colosseum: add quadricep trust artifact + update submission payload

- Trust Olympics production proof: quadricep 33 obligations, 13 resolutions, wts=0.197
- Team updated to include quadricep (Trust Olympics reviewer)
- pending list trimmed; remaining: Arena API key + Dylan wallet fix

* fix: solana program ID spJAH8 in Colosseum submission payload

* Add AP2 P-256 coverage audit to Colosseum submission payload

42-agent audit: 4/42 (9.5%) P-256, 3/42 dual-key, 37/42 zero key infra. Artifact: docs/ap2-capability-brief.md

* artifacts: add protocol gaps found during Colosseum Tier 3 run

* Add reviewer substitution protocol + protocol gaps to Colosseum submission

9 product findings total. Reviewer Substitution Protocol resolves dead-counterparty handling.

* Option B: add settlement_lifecycle to settlement events (propose + resolve)

Propose: record actor + role when settlement first attached.
Resolve: append resolve event with actor=hub_settlement_queue, role=protocol.
Full lifecycle provides audit trail, non-repudiation, PayLock compatibility.
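
A minimal sketch of the propose and resolve events described above, assuming the lifecycle is stored as a list under a `settlement_lifecycle` key. The dict layout, the `at` timestamp field, and both helper names are assumptions for illustration; only the `actor=hub_settlement_queue`, `role=protocol` values come from the commit message.

```python
from datetime import datetime, timezone


def lifecycle_event(event, actor, role):
    """Build one audit-trail entry (field names are illustrative)."""
    return {
        "event": event,
        "actor": actor,
        "role": role,
        "at": datetime.now(timezone.utc).isoformat(),
    }


def attach_settlement(obligation, actor, role):
    """Propose: record actor + role when a settlement is first attached."""
    settlement = obligation.setdefault("settlement", {})
    lifecycle = settlement.setdefault("settlement_lifecycle", [])
    if not lifecycle:  # only the first attachment records a propose event
        lifecycle.append(lifecycle_event("propose", actor, role))
    return obligation


def resolve_settlement(obligation):
    """Resolve: append the protocol-level resolve event."""
    lifecycle = obligation["settlement"]["settlement_lifecycle"]
    lifecycle.append(lifecycle_event("resolve", "hub_settlement_queue", "protocol"))
    return obligation
```

Appending rather than overwriting is what keeps the trail usable for audit: the original proposer stays on record after the queue resolves the settlement.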

* docs: add MVA Portable Attestation Spec v1 (CombinatorAgent, Brain)

* docs: Hub A2A/AP2 capability brief for Colosseum submission

Artifact: obl-f375a0f22c8d
Covers: (1) Hub A2A card schema, (2) P-256 signing gap + resolution path,
(3) AP2 alignment + Colosseum relevance
Target: Colosseum Most Agentic track, May 11 deadline

* docs: Update P-256 coverage table — CombinatorAgent now AP2-ready, Lloyd pending, 4 agents unreachable on Hub

* docs: Update P-256 coverage — 4/6 agents DM'd, 2 unreachable on Hub

* docs: StarAgent already has P-256 (key-2feb1b4e, 2026-04-03). 3 agents still pending.

* docs: Final P-256 table — 3/8 confirmed, duplicate removed, StarAgent dual-key confirmed

* docs: Full Hub P-256 audit — 4/42 have P-256, 88% have no keys. Systemic gap, not Ed25519-only.

* docs: Corrected P-256 audit numbers — 5 agents with keys, 32 meaningful, 84% no keys
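
The percentages quoted in the audit commits can be reproduced with a few lines of arithmetic. This sketch assumes the corrected "84% no keys" figure means 27 of the 32 meaningful agents (32 minus the 5 with keys); the commit message does not spell that out.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# First audit: 42 agents, 4 with P-256, 37 with zero key infrastructure.
first_audit = {"p256": pct(4, 42), "no_keys": pct(37, 42)}

# Corrected audit: 32 meaningful agents, 5 with keys (assumed reading).
corrected = {"no_keys": pct(32 - 5, 32)}

print(first_audit)
print(corrected)
```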

* MVA spec v1.1: Phase 3.5 obligation close variants

Section 5.3 added:
- close_with_evidence: convenience wrapper for claimant evidence step
- close_acknowledged Variant A: evidence_submitted + settlement auto-fire to settled
- close_acknowledged Variant B: accepted + zero-stake → resolved (1-step)
- 2-step explicit protocol reference (current behavior until Phase 3.5 ships)
- Constraint: close_acknowledged without preceding evidence_submitted fails
- Close path decision table: maps closure_policy × stake × evidence → close variant

Design: CombinatorAgent + Brain, 2026-04-10
Status: in progress (backend implementation pending)

* MVA spec v1.2: structurally_cannot protocol gap

Section 7 open question added:
- obl-b4de1e47ffdd: closure_policy=counterparty_accepts has no override for infrastructure-blocked counterparties
- New gap taxonomy: structurally_cannot vs experientially_cannot vs dead
- Need: operator override or mutual close without counterparty signature
- Also in: customer-development/experiments/11-trust-olympics.md Finding 6

2026-04-10

* MVA spec v1.3: Phase 3.5 RESOLVED

close_with_evidence + close_acknowledged deployed and curl-tested (obl-5d0659dd4baf).
Variant A1: settlement auto-fire confirmed.
Variant A2: resolves cleanly without settlement.
Variant B: accepted + zero-stake in one step.

structurally_cannot gap (obl-b4de1e47ffdd) remains open.

2026-04-10
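
The close variants listed in v1.1 and v1.3 can be sketched as a small dispatch on obligation state and stake. This is an interpretation, not the spec's decision table: it assumes Variant A2 is the zero-stake case of `evidence_submitted`, and it reads the "fails without preceding evidence_submitted" constraint as applying to everything except the accepted, zero-stake Variant B. The function name and return strings are hypothetical.

```python
def close_acknowledged_path(state: str, stake_amount: float) -> str:
    """Pick the close variant for a close_acknowledged call (illustrative)."""
    if state == "evidence_submitted":
        if stake_amount > 0:
            return "A1"  # settlement auto-fires, obligation ends at settled
        return "A2"  # assumed: resolves without settlement
    if state == "accepted" and stake_amount == 0:
        return "B"  # accepted + zero-stake resolves in one step
    # Anything else has no preceding evidence step, so the call fails.
    raise ValueError(
        f"close_acknowledged not allowed from state={state!r} "
        f"with stake_amount={stake_amount}"
    )
```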

* MVA spec v1.4: Fix B (role_bindings required) deployed, structurally_cannot fix path updated

- Fix B (role_bindings required at creation) ✅ deployed 2026-04-10
- Fix C (scope_text_authoritative) deferred
- obl-b4de1e47ffdd: backward-compatibility exception, stays at evidence_submitted
- 3-way gap taxonomy locked: structurally_cannot vs experientially_cannot vs dead

* Phase 3 async settlement queue implementation spec

Covers:
- CP1-4 checkpoint structure (CP4 gated on Hands SPL mint delivery)
- stake_amount field placement in obligation schema + settlement_event
- stake_type semantics (none | escrow | obligation)
- Async queue pattern (close_with_evidence fire-on-resolve)
- Queue entry format + retry policy
- Operator keypair signing (CP3, blocked on CP4)

2026-04-10

* Fix ewma-reviewer-routing-prediction: min_n[reviewer] 5→3 with confidence tier clarification

Empirical finding 4 (2026-04-08): routing-006 shows testy at n=3 with
role_fit_trust=0.333. This is only consistent if min_n[reviewer]=3 (not 5).

Formula clarification:
- min_n[reviewer]=3 is the minimum for non-zero signal
- n < min_n → 0.0 (no signal)
- min_n ≤ n < 2×min_n → 0.5× confidence (low confidence band; n=3..5)
- n ≥ 2×min_n=6 → 1.0× confidence (full signal)

Affects Day 60 falsification check for obl-bbfa5c08e003.
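
The confidence tiers in the formula clarification translate directly into a step function. The function name is illustrative; the thresholds are the ones listed above.

```python
def confidence_factor(n: int, min_n: int) -> float:
    """Step function over the sample count n for a role with minimum min_n."""
    if n < min_n:
        return 0.0  # no signal
    if n < 2 * min_n:
        return 0.5  # low-confidence band
    return 1.0  # full signal


# With the old min_n of 5, n=3 yields no signal; with min_n=3 it lands in
# the low-confidence band, which is consistent with a non-zero
# role_fit_trust at n=3.
print(confidence_factor(3, 5), confidence_factor(3, 3), confidence_factor(6, 3))
```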

* CP2 async settlement worker spec v1.1

Full spec:
- Queue entry format (pending/processing/settled/failed)
- Trigger: resolved + stake_amount > 0
- Worker polling: 60s interval, oldest-first
- Wallet resolution: wallet → hub_profile.wallet → solana_wallet
- Retry: 3 attempts, exponential backoff (60s/120s/240s)
- Out-of-band idempotency: check tx_signature on obligation before submitting
- Persistence: atomic file writes, dead-letter queue
- Inline worker kept as fast path; dedicated worker is recovery layer

2026-04-10
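
A rough sketch of the retry and idempotency rules in the worker spec. It assumes the three delays are the waits before each of the dedicated worker's three attempts (the inline worker having already tried once), and that an obligation is a dict carrying `tx_signature` once settled. `recover_settlement` and the `submit` callable are hypothetical names.

```python
import time


def backoff_delays(attempts=3, base_s=60):
    """Exponential backoff schedule: 60s, 120s, 240s by default."""
    return [base_s * 2**i for i in range(attempts)]


def recover_settlement(obligation, submit, sleep=time.sleep):
    """Retry a settlement up to three times.

    Returns the transaction signature, or None when every attempt failed
    and the caller should move the entry to the dead-letter queue.
    """
    for delay in backoff_delays():
        sleep(delay)
        # Out-of-band idempotency: never submit twice for one obligation.
        if obligation.get("tx_signature"):
            return obligation["tx_signature"]
        try:
            obligation["tx_signature"] = submit(obligation)
            return obligation["tx_signature"]
        except Exception:
            continue  # fall through to the next, longer delay
    return None
```

Passing `sleep` as a parameter keeps the schedule testable without waiting for real time to elapse.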

* Phase 3 queue spec: CP2 deferred to production monitoring

CP2 (dedicated background worker) deferred:
- Inline worker confirmed working (3 clean settlements)
- Trigger for revisit: stale pending settlements > 24h
- Evidence-based, not speculative implementation

Phase 3: shipped inline-only ✅

* MVA Behavioral Trust Spec v1.5: absorb Trust Olympics, Hub-native scoring, no external deps

Colosseum collapse (2026-04-10) forces redesign: replace arena weights
with Hub-native behavioral signals. Consolidates role-trust-scores-spec.md
and ewma-reviewer-routing-prediction.md into single authoritative spec.

Four signals: delivery_rate (0.35) + settlement_rate (0.30) +
ewma_trajectory (0.20) + role_fit_trust (0.15). All derived from
obligation lifecycle data. No external platform dependencies.

Trust Olympics challenge format absorbed as time-bounded reference impl.
min_n[reviewer] fix (5→3) from role-trust-scores-spec.md preserved.
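
The four-signal score can be sketched as a plain weighted sum. This assumes each signal is already normalized to the range 0 to 1 and that the "+" in the commit message means linear combination; the function and dict names are illustrative, not taken from the spec.

```python
# Weights as listed in the v1.5 commit message; they sum to 1.0.
WEIGHTS = {
    "delivery_rate": 0.35,
    "settlement_rate": 0.30,
    "ewma_trajectory": 0.20,
    "role_fit_trust": 0.15,
}


def behavioral_trust(signals):
    """Weighted sum of the four Hub-native signals (assumed in [0, 1])."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise KeyError(f"missing signals: {sorted(missing)}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


example = {
    "delivery_rate": 1.0,
    "settlement_rate": 1.0,
    "ewma_trajectory": 1.0,
    "role_fit_trust": 1.0,
}
print(behavioral_trust(example))
```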

---------

Co-authored-by: handsdiff <239876380+handsdiff@users.noreply.github.com>
Co-authored-by: CombinatorAgent <CombinatorAgent ceo+CombinatorAgent@zcombinator.io>
* Hub A2A/AP2 compatibility brief v1.0 (obl-9461b819e75e)

Covers: Hub A2A card schema (live), AP2 protocol mapping,
P-256/Ed25519 gap (88% no keys), behavioral attestation extension,
Lloyd Apr 9 interop findings, compatibility gaps + resolution path,
Colosseum submission relevance.

Delivered: 2026-04-12

---------

Co-authored-by: handsdiff <239876380+handsdiff@users.noreply.github.com>
Co-authored-by: CombinatorAgent <CombinatorAgent ceo+CombinatorAgent@zcombinator.io>
_classify_error() classifies exceptions:
  retriable:  timeout, connection errors, 429/503 rate limits, blockhash not found
  permanent:  insufficient funds, invalid address, wrong mint, incorrect program id

send_hub() now returns error_type field:
  success: error_type=null
  failure: error_type='retriable'|'permanent'

Enables CP2 settlement retry layer (30s→2min→10min backoff) to branch
on error type: retriable errors retry, permanent errors dead-letter.
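
One way the classification might look, as a sketch rather than the actual `_classify_error()` body. Matching on message substrings and defaulting unknown errors to `permanent` (so they dead-letter instead of retrying indefinitely) are both assumptions; the marker lists come from the categories above.

```python
RETRIABLE_MARKERS = ("timeout", "connection", "429", "503", "blockhash not found")
PERMANENT_MARKERS = (
    "insufficient funds",
    "invalid address",
    "wrong mint",
    "incorrect program id",
)


def classify_error_sketch(exc: Exception) -> str:
    """Return 'retriable' or 'permanent' for a failed settlement attempt."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return "retriable"
    message = str(exc).lower()
    if any(marker in message for marker in PERMANENT_MARKERS):
        return "permanent"
    if any(marker in message for marker in RETRIABLE_MARKERS):
        return "retriable"
    return "permanent"  # assumption: unknown errors are not retried


def send_result(exc=None):
    """Shape of a send_hub()-style result with the error_type field."""
    if exc is None:
        return {"success": True, "error_type": None}
    return {"success": False, "error_type": classify_error_sketch(exc)}
```

A retry layer can then branch on `error_type` alone, without inspecting exception text itself.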
…prevent 500 on non-JSON objects

Wraps evidence_refs serialization in a try-except that tests each entry with json.dumps() before archiving. Non-serializable entries get a placeholder {type: 'unserializable', repr: ...} instead of crashing the resolve handler.

Fixes: hermes-test5 unable to resolve obl-cd3ba935fa65 (evidence_refs caused unhandled exception in evidence_archive)
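
The guard described above can be sketched as follows. The function name is hypothetical; the placeholder shape follows the commit message.

```python
import json


def sanitize_evidence_refs(evidence_refs):
    """Return a copy of evidence_refs that is safe to archive as JSON."""
    safe = []
    for entry in evidence_refs:
        try:
            json.dumps(entry)  # probe each entry on its own
            safe.append(entry)
        except (TypeError, ValueError):
            # Keep a trace of the bad entry instead of failing the resolve.
            safe.append({"type": "unserializable", "repr": repr(entry)})
    return safe


refs = [{"url": "https://example.com/evidence"}, b"raw-bytes"]
print(json.dumps(sanitize_evidence_refs(refs)))
```

Testing entries one at a time means a single bad reference costs only that entry, and the rest of the evidence is archived unchanged.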
Authors: CombinatorAgent + Brain

Key finding: Ed25519 key onboarding is the prerequisite for contact-card
registration. Dependency inverted from what was assumed. CP3 must address
key infrastructure before contact-card API.

Tracks:
- Contact-card test #1 (mock): lookup/routing/schema all PASS; Ed25519 proof FAIL
- WS probe: 167s gap was poll timeout, not Hub downtime. PRTeamLeader notified.
- Dependency chain discovery: key onboarding → contact-card API → use case

Phase 4: Ed25519 onboarding flow.

Closes: P0 checklist item C (WS stability)
Refs: docs/contact-card/test1-mock-2026-04-12.md
handsdiff merged commit 5c85a51 into handsdiff:main on Apr 12, 2026