Phase 1-2 Implementation: Multi-subreddit platform with agents, tools, and synthesis #2

Merged
sunitj merged 25 commits into main from claude/agent-social-platform-8YF93 on Feb 13, 2026


sunitj (Owner) commented on Feb 13, 2026

Summary

This PR implements Phase 1-2 of the Emergent Deliberation Platform specification, transforming Colloquip from a single-thread deliberation engine into a multi-subreddit social platform for AI expert panels. Scientists can now submit hypotheses to topic-specific communities and receive structured, multi-perspective assessments with cited evidence in 10-15 minutes.

Key Changes

Platform Architecture

  • Subreddit system: Communities with configurable participation models, output templates, and agent pools
  • Agent registry: Global pool of persistent agents with expertise-based recruitment; new agents created only when expertise gaps exist
  • 10 curated personas: Medicinal chemistry, clinical development, regulatory affairs, ADMET, computational biology, molecular biology, protein engineering, synthetic biology, and two red team variants (general + biology-specific)
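
The find-or-create recruitment described for the agent registry could look roughly like the sketch below. It is an illustration only, not the code in registry.py: the `Agent` dataclass, the coverage-based `score` function, the `min_score` cutoff, and the generated agent name are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical stand-in for a registry entry; not the PR's actual model.
    name: str
    expertise_tags: set[str] = field(default_factory=set)


def score(agent: Agent, needed: set[str]) -> float:
    """Fraction of the needed expertise tags that this agent covers."""
    if not needed:
        return 0.0
    return len(agent.expertise_tags & needed) / len(needed)


def find_or_create(pool: list[Agent], needed: set[str], min_score: float = 0.5) -> Agent:
    """Reuse the best-matching agent, or create one when nobody is close enough."""
    best = max(pool, key=lambda a: score(a, needed), default=None)
    if best is not None and score(best, needed) >= min_score:
        return best
    # Expertise gap: add a new persistent agent to the global pool.
    created = Agent(name="-".join(sorted(needed)), expertise_tags=set(needed))
    pool.append(created)
    return created
```

With a pool containing only a medicinal chemistry agent, a request for `{"chemistry"}` reuses it, while a request for `{"regulatory"}` adds a second agent to the pool.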

Synthesis & Output

  • Template-driven synthesis: Four thinking types (Assessment, Review, Analysis, Ideation) with structured sections
  • Audit chains: Every claim linked to source posts and citations for transparency
  • Citation verification: Automated PubMed PMID validation with fallback to manual review

Tools & Evidence

  • PubMed search: NCBI E-utilities integration for literature discovery
  • Company docs search: Internal documentation retrieval
  • Web/academic search: Semantic Scholar API integration
  • Tool registry: Per-subreddit tool configuration and availability

Cost & Governance

  • Cost tracking: Per-thread token usage and USD cost estimation with budget kill switches
  • Human participation models: Explicit, implicit, and none modes for scientist involvement
  • Audit trails: Complete record of agent contributions, phase transitions, and synthesis decisions
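
A budget kill switch of the kind listed under cost tracking might be sketched as follows. The class name `ThreadCost`, the `BudgetExceeded` exception, and the per-token prices are placeholders chosen for the example, not values or names taken from cost_tracker.py.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a thread's estimated spend passes its budget."""


@dataclass
class ThreadCost:
    # Prices are placeholders (USD per 1M tokens), not real model pricing.
    budget_usd: float
    input_price_per_mtok: float = 3.0
    output_price_per_mtok: float = 15.0
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def estimated_usd(self) -> float:
        """Estimated spend so far, derived from the running token counts."""
        return (
            self.input_tokens * self.input_price_per_mtok
            + self.output_tokens * self.output_price_per_mtok
        ) / 1_000_000

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add usage from one LLM call and stop the thread if over budget."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        if self.estimated_usd > self.budget_usd:
            raise BudgetExceeded(
                f"estimated ${self.estimated_usd:.4f} exceeds budget ${self.budget_usd:.4f}"
            )
```

In this sketch the caller would catch `BudgetExceeded` in the deliberation loop and end the thread early; how the PR actually surfaces the stop is not shown here.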

Memory & Watchers (Phase 3-4 Foundation)

  • Synthesis-level memory: Store and retrieve past deliberations via vector similarity (mock embeddings for now)
  • Watcher infrastructure: Event detection (literature, scheduled, webhook) with triage pipeline
  • Notification system: Structured notifications from watcher events with action tracking
  • Cross-reference detection: Identify when findings in one subreddit are relevant to another

API & Infrastructure

  • Platform routes: /api/subreddits, /api/agents, /api/threads for community management
  • Memory routes: /api/memory for retrieval and annotation
  • Watcher routes: /api/watchers for event management
  • Feedback routes: /api/feedback for outcome tracking and agent calibration
  • Export routes: JSON/CSV export of deliberation results
  • External API: Programmatic hypothesis submission and polling

Database & Deployment

  • Alembic migrations: Four-phase schema evolution (baseline, Phase 3 memory, Phase 4 watchers, Phase 5 cross-refs)
  • PostgreSQL support: pgvector-ready schema for future embedding storage
  • Docker infrastructure: Multi-stage builds, dev/monitoring compose overrides, health checks
  • Monitoring: Prometheus metrics, Grafana dashboards, structured JSON logging
  • Configuration: Environment-based settings with production/staging/dev overrides

Testing

  • Comprehensive test suite: 313+ tests covering routes, tools, memory, watchers, feedback, and integration scenarios
  • Test strategy document: Guidelines for coverage targets and conventions
  • Mock implementations: In-memory stores and mock embedding provider for isolated testing

Implementation Details

  • No breaking changes: Existing deliberation engine (observer, energy, triggers) remains unchanged; new platform layer wraps it
  • Backward compatible: SQLite still supported for development; PostgreSQL optional for production
  • Incremental design: Phase 3-5 features (memory decay, event-driven triggers, cross-references) designed to plug in without refactoring core
  • Configuration-driven: Retrieval limits, decay rates, triage thresholds all externalized to config files
  • Async throughout: FastAPI + asyncio for all I/O; WebSocket streaming for real-time deliberation viewing

Files Added/Modified

  • Core platform: registry.py, synthesis.py, platform_manager.py, output_templates.py
  • Tools: pubmed.py, company_docs.py, web_search.py,

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

claude and others added 25 commits February 11, 2026 18:01
Describes the architecture for evolving Colloquip from a single-thread
deliberation engine into a Reddit-like agent social platform with:
- Subreddits (communities defining agent types)
- Persistent agent identities across sessions
- Agent memory/learning system (post-deliberation extraction + recall)
- Cross-subreddit membership
- New API surface for communities, agents, and memories

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
Three major additions to the social platform plan:
- Agent pool/registry: agents selected from existing pool, new ones
  created only when no matching expertise exists
- Mandatory red team: every subreddit always has at least one
  topic-specific red team agent (cannot be removed)
- Literature search tools: PubMed, company docs, and web search
  via Anthropic's native tool-use API, configured per subreddit

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
Comprehensive plan to incrementally transform the working deliberation
engine into the full social platform described in the Phase 1-2 and
Phase 3-5 specs, without a ground-up rewrite.

Key decisions:
- Keep SQLite + SQLAlchemy (PostgreSQL deferred to Phase 3 for pgvector)
- Keep src/colloquip/ structure (not backend/app/)
- 10 curated personas: 8 from spec + protein engineering + synthetic biology
- 7 sprints: models → registry → tools → synthesis → prompts → API → integration
- All 181 existing tests must pass at every step
- Phase 3+ hooks designed now, built later

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
Evolve the Colloquip deliberation system toward a Reddit-like social platform
for AI agents, implementing Phases 1-2 of the implementation spec.

Core additions:
- 10 curated YAML agent personas (molecular biology, medicinal chemistry,
  ADMET, clinical, regulatory, computational biology, protein engineering,
  synthetic biology, 2 red team agents) with weighted evaluation criteria
  and phase-specific mandates
- Agent registry with expertise-based recruitment scoring, find-or-create
  pattern, and mandatory red team enforcement per subreddit
- Tool system: PubMed (NCBI E-utilities), company docs (local search),
  web search (Semantic Scholar), citation verifier — all with mock
  implementations for testing
- 4 structured output templates (Assessment, Review, Analysis, Ideation)
  with named sections and metadata fields
- Template-driven synthesis generator with audit chains linking claims
  to posts and citations
- Per-thread cost tracking with budget enforcement
- Prompt builder v3: layered assembly (persona -> subreddit context ->
  role -> phase mandate -> citation/tool instructions)
- Platform manager orchestrating subreddit creation and agent recruitment
- REST API: subreddit CRUD, agent listing, thread creation, cost endpoints
- Extended DB schema: subreddits, agent identities, memberships,
  synthesis, cost records

All 188 existing tests pass unchanged. 111 new tests added (299 total).
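
For illustration, the layered prompt assembly (persona, then subreddit context, role, phase mandate, and tool instructions) could be sketched like this. The function name `build_prompt` and its signature are assumptions for the example; the real prompt builder is not reproduced here.

```python
from typing import List, Union


def build_prompt(
    persona: str,
    subreddit_context: str,
    role: str,
    phase_mandate: str,
    tool_descriptions: Union[str, List[str]] = "",
) -> str:
    """Assemble prompt layers in a fixed order, skipping empty ones."""
    # Accept either a preformatted string or a list of tool descriptions.
    if isinstance(tool_descriptions, list):
        tool_descriptions = "\n".join(f"- {t}" for t in tool_descriptions)
    layers = [persona, subreddit_context, role, phase_mandate, tool_descriptions]
    return "\n\n".join(layer.strip() for layer in layers if layer and layer.strip())
```

Keeping the order in one list makes it easy to see which layer overrides which, and empty layers drop out without leaving blank sections in the prompt.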

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
Round 1 (Critical/High):
- synthesis: filter stop words in audit chain matching, raise overlap
  threshold from 0.2 to 0.3 to prevent false claim-post links
- synthesis: fix _parse_metadata to only match lines starting with
  field names, preventing false matches inside section prose
- tools/registry: handle None tool_configs gracefully
- persona_loader: validate non-empty expertise_tags and domain_keywords
- pubmed: pass email to efetch requests (NCBI API compliance)
- citation_verifier: use actual error from _verify_pmid in flagged detail
- cost_tracker: remove unused uuid4 import
- synthesis: remove unused StructuredCitation import
- platform_manager: simplify thread storage with setdefault

Round 2 (High):
- prompts: fix tool_descriptions to accept Union[str, List[str]],
  join list items for proper formatting in prompt
- synthesis: rewrite _parse_synthesis_sections to use exact heading
  matches (longest-first sort) preventing partial name collisions
- platform_routes: validate _initialized flag in _get_platform helper
- platform_routes: add UUID format validation on thread cost endpoint

Round 3 (High):
- registry: add max_agents parameter to recruit_for_subreddit,
  reserve slot for red team when recruiting optional expertise
- platform_manager: wire CostTracker into get_thread_costs instead
  of returning placeholder zeros

Round 4 (Medium/Low):
- synthesis: replace chr(10) with readable string formatting
- synthesis: move uuid import to module level, add type hint
- synthesis: add fallback validation for empty raw_text
- output_templates: add descriptive ValueError for missing template
- prompts: use proper Union type annotation instead of string literal
- web_search: simplify redundant fallback on externalIds
- pubmed: add null-safety on XML element .text access
- company_docs: log when file content is truncated for search
- platform_routes: add TYPE_CHECKING imports and type hints on helpers
- registry: downgrade duplicate agent_type log from warning to debug
- persona_loader: wrap persona_to_agent_identity in try/except for
  descriptive KeyError messages

All 299 tests pass.
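
The audit chain fix in Round 1 (stop-word filtering plus a 0.3 overlap threshold) can be pictured with the sketch below. Only the threshold value and the stop-word idea come from the commit message; the overlap definition, the tiny stop-word list, and the function names are assumptions.

```python
import re

# A deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "that", "for", "with"}


def content_words(text: str) -> set[str]:
    """Lowercased word set with stop words removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def overlap(claim: str, post: str) -> float:
    """Share of the claim's content words that also appear in the post."""
    claim_words = content_words(claim)
    if not claim_words:
        return 0.0
    return len(claim_words & content_words(post)) / len(claim_words)


def link_claim(claim: str, posts: dict[str, str], threshold: float = 0.3) -> list[str]:
    """Return IDs of posts whose overlap with the claim meets the threshold."""
    return [pid for pid, text in posts.items() if overlap(claim, text) >= threshold]
```

Without the stop-word filter, two unrelated sentences can share enough words like "the" and "is" to cross a low threshold, which is the false-link problem the commit describes.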

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
Bug fix:
- Fix enum string comparison in synthesis audit chains
  (post.stance.value == "critical" → post.stance == AgentStance.CRITICAL)

Principle #2 (testable without LLM):
- Extract parse_synthesis() standalone function from SynthesisGenerator
- Parsing logic now testable directly without LLM calls

Principle #3 (interfaces first):
- Replace AgentTool Protocol with ABC BaseSearchTool
- tool_schema and execute() are now @abstractmethod
- Keep AgentTool as backward-compatible alias

Principle #4 (configuration-driven):
- Add ScoringWeights dataclass for configurable expertise matching
- Make audit chain params configurable (max_chains, overlap_threshold,
  min_claim_words)

Principle #5 (minimal dependencies):
- Mock tool classes now inherit from real classes (PubMedTool,
  WebSearchTool, CompanyDocsTool), eliminating duplicated tool_schema
  properties
- Convert VerificationReport to Pydantic BaseModel for consistency

Simplification:
- Extract _subreddit_common() helper to DRY response builders
- 14 new tests (313 total, all passing)
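
As a rough picture of the interface change under Principles #3 and #5, the sketch below shows an ABC with abstract `tool_schema` and `execute()`, a mock that inherits its schema from the real class, and the backward-compatible alias. The method signatures, the async `execute`, and the schema contents are assumptions, not the PR's actual definitions.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseSearchTool(ABC):
    """Interface each search tool implements (shape assumed for illustration)."""

    @property
    @abstractmethod
    def tool_schema(self) -> dict[str, Any]:
        """Schema describing the tool to the LLM."""

    @abstractmethod
    async def execute(self, query: str) -> list[dict[str, Any]]:
        """Run the search and return result records."""


class PubMedTool(BaseSearchTool):
    @property
    def tool_schema(self) -> dict[str, Any]:
        # Placeholder schema; the real one is defined in pubmed.py.
        return {"name": "pubmed_search", "input": {"query": "string"}}

    async def execute(self, query: str) -> list[dict[str, Any]]:
        raise NotImplementedError("network call omitted in this sketch")


class MockPubMedTool(PubMedTool):
    """Inherits tool_schema from the real class and overrides only execute()."""

    async def execute(self, query: str) -> list[dict[str, Any]]:
        return [{"pmid": "00000000", "title": f"Mock result for {query}"}]


# Kept as an alias so existing imports of AgentTool keep working.
AgentTool = BaseSearchTool
```

Because the mock subclasses the real tool, a schema change in `PubMedTool` is picked up by tests automatically instead of drifting in a duplicated property.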

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
Comprehensive plan covering 13 feature sprints and 5 infrastructure sprints:
- Phase 3: Institutional memory (embedding interface, synthesis RAG, pgvector, human corrections)
- Phase 4: Event-driven triggers (watchers, triage agent, notifications, auto-deliberation)
- Phase 5: Cross-subreddit references, outcome tracking, agent calibration, export/external API
- Deployment: Docker multi-stage builds, docker-compose (Postgres+Redis), CI/CD pipelines,
  Alembic migrations, production config, Prometheus monitoring

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
Add embedding infrastructure (mock + OpenAI providers), in-memory
vector store with cosine similarity search, synthesis-to-memory
extraction pipeline, RAG prompt integration, human memory corrections
via annotations, and API routes for memory management.

Key components:
- EmbeddingProvider ABC with MockEmbeddingProvider and OpenAIEmbeddingProvider
- MemoryStore ABC with InMemoryStore (brute-force cosine search)
- SynthesisMemoryExtractor: pure text parsing, no LLM calls
- MemoryRetriever: arena-scoped + cross-subreddit retrieval
- Memory annotations: outdated, correction, confirmed, context types
- API routes: /api/memories, /api/memories/{id}/annotate
- DB tables: synthesis_memories, memory_annotations
- Phase 3b typed memory models (reserved for future use)

Review fixes applied:
- UUID validation in API routes with proper 400 responses
- list_all() used instead of private _memories access
- ValueError (not KeyError) for missing memories in annotate()
- IndexError safety check in annotation response
- Missing template_type update in repository save
- OpenAI API error handling with logging

104 new tests (313 → 417), all passing.
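
A brute-force cosine search like the one attributed to InMemoryStore can be sketched in a few lines. The class below is a simplified stand-in that stores bare vectors; the method names `add` and `search` are assumptions and the real store also holds memory records.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical minimal store: id -> embedding, scanned on every query."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, memory_id: str, embedding: list[float]) -> None:
        self._vectors[memory_id] = embedding

    def search(self, query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        """Score every stored vector against the query and return the best top_k."""
        scored = [(mid, cosine(query, vec)) for mid, vec in self._vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

The scan is linear in the number of memories, which is fine for tests and small deployments; an indexed store such as pgvector would replace it at scale.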

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
Add watcher infrastructure for monitoring external sources, triage
agent for evaluating event relevance, notification system, and
auto-deliberation policy with earned automation.

Key components:
- BaseWatcher ABC with WatcherRegistry for managing watcher instances
- LiteratureWatcher: PubMed monitoring with PMID deduplication
- ScheduledWatcher: time-based triggers with interval/day/hour constraints
- WebhookWatcher: external event ingestion via HTTP POST
- WatcherManager: async polling loop with error isolation per watcher
- MockTriageAgent: keyword heuristic triage (novelty/relevance/urgency)
- InMemoryNotificationStore: notification CRUD with status tracking
- AutoDeliberationPolicy: earned automation (20+ events, >70% useful,
  human approval, rate limits, budget sharing)
- API routes: watchers CRUD, notifications, webhook endpoint
- DB tables: watchers, watcher_events, notifications
- Models: WatcherType, WatcherEvent, TriageDecision, Notification

81 new tests (417 → 498), all passing.
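
The earned-automation rule (20+ events, more than 70% useful, human approval, rate limits) might reduce to a gate like the one below. The record fields, the function name, and the daily rate limit of 3 are assumptions for the example; budget sharing is left out.

```python
from dataclasses import dataclass


@dataclass
class WatcherTrackRecord:
    # Hypothetical counters; the PR does not show the actual fields.
    events_reviewed: int
    events_marked_useful: int
    human_approved: bool
    auto_runs_today: int


def may_auto_deliberate(
    record: WatcherTrackRecord,
    min_events: int = 20,
    min_useful_ratio: float = 0.70,
    max_runs_per_day: int = 3,  # assumed rate limit, not from the PR
) -> bool:
    """A watcher earns automation only after a sufficient, mostly useful history."""
    if record.events_reviewed < min_events:
        return False
    useful_ratio = record.events_marked_useful / record.events_reviewed
    return (
        useful_ratio > min_useful_ratio
        and record.human_approved
        and record.auto_runs_today < max_runs_per_day
    )
```

Checking the event count first also avoids dividing by zero for a brand-new watcher.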

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
- Fix list_subreddit_watchers: remove 'or True' filter bug, use
  subreddit_name stored in watcher config for proper filtering
- Fix create_watcher: inject pubmed_tool from app state for
  LiteratureWatcher creation so watchers are functional
- Store subreddit_name in watcher config dict for lookup
- Add source_metadata column to DBWatcherEvent to prevent data loss
- Add defensive None check for PubMed tool result

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
…ts 18-20)

- Cross-subreddit reference detection with triple criteria (similarity + shared entities + actionability)
- Entity extraction (PMIDs, gene names, compound IDs) for cross-reference matching
- Deliberation differ for comparing syntheses over time
- Outcome tracking system for real-world result recording
- Agent calibration with accuracy, domain-specific metrics, and bias detection
- External API with API key authentication for programmatic access
- Export system (Markdown and JSON formats)
- DB tables for cross-references and outcome reports
- 41 new tests (539 total), all passing
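
As one small piece of the entity extraction step, PMID matching could be sketched as below. The regular expression and function names are assumptions; gene names and compound IDs would need their own patterns and are not covered.

```python
import re

# Matches forms like "PMID: 12345678" or "pmid 987"; up to 8 digits is assumed.
PMID_PATTERN = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)


def extract_pmids(text: str) -> set[str]:
    """Return the set of PMID strings mentioned in the text."""
    return set(PMID_PATTERN.findall(text))


def shared_pmids(text_a: str, text_b: str) -> set[str]:
    """PMIDs cited in both texts, one possible 'shared entity' signal."""
    return extract_pmids(text_a) & extract_pmids(text_b)
```

A shared PMID on its own would not trigger a cross-reference under the triple criteria; similarity and actionability still have to hold.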

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
- Register all Phase 3-5 route modules (memory, watcher, export, external,
  feedback) in FastAPI app — previously all these endpoints returned 404
- Fix self-referential comparison in calibration bias detection: compare
  domain accuracy against overall accuracy instead of itself
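
The calibration fix compares each domain's accuracy with the agent's overall accuracy. A minimal sketch of that comparison, with a hypothetical function name and an assumed 0.15 reporting gap, is:

```python
def domain_bias(
    outcomes: list[tuple[str, bool]], min_gap: float = 0.15
) -> dict[str, float]:
    """Return domains whose accuracy differs from overall accuracy by at least min_gap.

    `outcomes` holds (domain, prediction_was_correct) pairs.
    """
    if not outcomes:
        return {}
    overall = sum(correct for _, correct in outcomes) / len(outcomes)
    by_domain: dict[str, list[bool]] = {}
    for domain, correct in outcomes:
        by_domain.setdefault(domain, []).append(correct)
    gaps: dict[str, float] = {}
    for domain, results in by_domain.items():
        # Compare the domain against the overall figure, not against itself.
        gap = sum(results) / len(results) - overall
        if abs(gap) >= min_gap:
            gaps[domain] = gap
    return gaps
```

A self-referential comparison would always yield a gap of zero, so no bias could ever be flagged, which is consistent with the bug described above.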

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
D1 - Containerization:
  - Multi-stage Dockerfile (Python deps, Node frontend, production image)
  - docker-compose.yml with app, PostgreSQL 16 + pgvector, Redis
  - docker-compose.dev.yml with hot-reload and dev tools
  - .dockerignore for clean builds

D2 - CI/CD Pipeline:
  - GitHub Actions: ci.yml (lint + test + build), deploy.yml (GHCR push)
  - db-migration.yml for PR migration validation
  - Python 3.11/3.12 matrix, codecov integration

D3 - Production Configuration:
  - Settings module with env var validation (database, LLM, embedding, memory, watchers, deployment)
  - Structured JSON logging for production, text for development
  - Request ID tracking, sensitive field redaction
  - Production and staging YAML config overrides
  - .env.example with documented variables

D4 - Database Migrations:
  - Alembic setup with env.py and migration template
  - 4 migration files: baseline, Phase 3 memory, Phase 4 watchers, Phase 5 cross-refs
  - All migrations reversible with downgrade()

D5 - Monitoring:
  - Prometheus metrics: deliberations, cost, memory retrieval, watchers, LLM usage
  - /api/metrics endpoint
  - docker-compose.monitoring.yml with Prometheus + Grafana
  - Pre-built Grafana dashboard

26 new tests (565 total), all passing.

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
- Fix 138 ruff lint errors: remove unused imports (F401), fix line length
  (E501), rename ambiguous variables (E741), prefix unused vars (F841)
- Auto-format all 83 files with ruff format
- Configure ruff rules in pyproject.toml: select E/F/I, ignore E402,
  per-file ignores for tests (F841) and forward refs (F821)
- Add pre-commit hook: ruff check + ruff format --check + pytest (fast)
- Add install-hooks.sh for easy setup

All 565 tests pass. ruff check and ruff format --check both clean.

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
- Commit updated uv.lock (was stale, missing new dependencies from
  Phases 3-5 which caused --frozen sync failures in CI)
- Use actions/setup-python@v5 for Python version matrix instead of
  passing python-version to setup-uv (unsupported parameter)
- Add --frozen to all uv sync calls for reproducible CI builds
- Remove Docker build job from CI (requires Docker daemon; deploy
  workflow handles real builds on tag push)
- Add coverage.xml to .gitignore

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
Tests used absolute /home/user/Colloquip/ paths that only work locally.
Replaced with paths computed relative to the project root using
os.path.dirname(os.path.abspath(__file__)), which works on any CI runner.
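
A small helper in the spirit of that change is sketched below. The name `project_root` and the assumption that test files sit one directory below the repository root are illustrative.

```python
import os


def project_root(test_file: str, levels_up: int = 1) -> str:
    """Return the directory `levels_up` above the folder containing test_file."""
    path = os.path.dirname(os.path.abspath(test_file))
    for _ in range(levels_up):
        path = os.path.dirname(path)
    return path
```

Inside a test module this would be called as `project_root(__file__)`, and fixture paths are then built with `os.path.join` from the result.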

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
- tests/TEST_STRATEGY.md: Test conventions, fixtures, patterns, and checklist
  for writing effective tests (reference before writing new tests)
- tests/test_api_routes.py: 51 tests covering all route handlers (export,
  external, feedback, memory, watcher routes) with happy paths, validation,
  404/400/503 error paths
- tests/test_tools.py: 26 tests for citation_verifier, company_docs,
  web_search tools with mocked external APIs
- tests/test_infrastructure.py: 18 tests for db/engine, display, CLI

Key coverage improvements:
  export_routes: 20% → 95%+
  external_routes: 51% → 95%+
  feedback_routes: 62% → 95%+
  memory_routes: 51% → 95%+
  watcher_routes: 45% → 95%
  citation_verifier: 47% → 95%
  company_docs: 31% → 90%
  web_search: 43% → 95%
  db/engine: 38% → 100%

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
Replace the static string confidence_level with Beta distribution
parameters (alpha, beta) on SynthesisMemory. Retrieval scoring becomes
similarity * confidence * decay instead of similarity-only.

Key changes:
- SynthesisMemory carries confidence_alpha/confidence_beta fields
  initialized from synthesis metadata (high→3:1, moderate→2:1.5, low→1:2)
- compute_confidence() returns clamped posterior mean [0.10, 0.95]
- temporal_decay() applies exponential decay with 120-day half-life
- composite_score() multiplies similarity × confidence × decay
- Annotations auto-update confidence: confirmed +2α, correction +3β,
  outdated +2β, context no change
- Outcome reports (confirmed/contradicted) update linked memory confidence
- Retrieval logging records every memory retrieval with similarity,
  confidence, decay factor, and composite score for future calibration
- DB schema, repository, and API responses updated for new fields
- 47 new tests covering all Bayesian math, decay, scoring, annotation
  wiring, retrieval logging, and prompt formatting
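
The scoring described in this commit can be written out as a short sketch. The function names, priors, clamp range, half-life, and annotation increments come from the commit message; the signatures, the `PRIORS` and `ANNOTATION_UPDATES` tables, and `apply_annotation` are assumptions about how they might be arranged.

```python
# (alpha, beta) priors keyed by the synthesis confidence label.
PRIORS = {"high": (3.0, 1.0), "moderate": (2.0, 1.5), "low": (1.0, 2.0)}

# (delta_alpha, delta_beta) applied when an annotation is added.
ANNOTATION_UPDATES = {
    "confirmed": (2.0, 0.0),
    "correction": (0.0, 3.0),
    "outdated": (0.0, 2.0),
    "context": (0.0, 0.0),
}

HALF_LIFE_DAYS = 120.0


def compute_confidence(alpha: float, beta: float) -> float:
    """Posterior mean of Beta(alpha, beta), clamped to [0.10, 0.95]."""
    mean = alpha / (alpha + beta)
    return min(0.95, max(0.10, mean))


def temporal_decay(age_days: float) -> float:
    """Exponential decay that halves every HALF_LIFE_DAYS."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def composite_score(similarity: float, alpha: float, beta: float, age_days: float) -> float:
    """similarity x confidence x decay."""
    return similarity * compute_confidence(alpha, beta) * temporal_decay(age_days)


def apply_annotation(alpha: float, beta: float, kind: str) -> tuple[float, float]:
    """Return updated Beta parameters after an annotation of the given kind."""
    d_alpha, d_beta = ANNOTATION_UPDATES[kind]
    return alpha + d_alpha, beta + d_beta
```

For example, a "high" memory starts at a confidence of 3 / (3 + 1) = 0.75; at 120 days old with similarity 0.8 its composite score is 0.8 × 0.75 × 0.5 = 0.3, and a single correction moves it to Beta(3, 4), a confidence of about 0.43.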

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
Full frontend rebuild with TanStack Router/Query, Zustand, Tailwind v4,
and shadcn-style primitives. Includes route structure for communities,
threads, agents, memories, notifications, and settings. Migrates existing
deliberation components with Tailwind restyling. Fixes Memory type to
match Bayesian confidence model (confidence/alpha/beta fields, correct
citations_used and confidence_level types).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename App.css -> app.css to match the import in main.tsx on
case-sensitive filesystems. Remove unused legacy index.css.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap UI primitives (Button, Card, Badge, Dialog, Tooltip, Skeleton) with
  HeroUI v3 beta components for consistent styling and accessibility
- Fix Dialog component: conditionally render Modal to prevent backdrop from
  blocking pointer events when closed
- Dockerfile: add alembic/ and alembic.ini to production image for migrations
- Dockerfile.dev: replace broad COPY with targeted copies, use uv pip install
- docker-compose.yml: add start_period to app healthcheck
- docker-compose.dev.yml: mount alembic directory for dev migrations
- Rewrite web/README.md to document actual tech stack (HeroUI, TanStack,
  TailwindCSS v4, Zustand) replacing default Vite boilerplate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The alembic package is in the db-pg optional dependency group, not db.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SQLite does not support ALTER-based create_unique_constraint. Move the
three constraints (consensus_maps, subreddit_memberships, syntheses)
into their respective create_table calls as sa.UniqueConstraint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sunitj merged commit fd37797 into main on Feb 13, 2026
9 checks passed