Phase 1-2 Implementation: Multi-subreddit platform with agents, tools, and synthesis by sunitj · Pull Request #2 · sunitj/Colloquip

sunitj · 2026-02-13T16:37:50Z

Summary

This PR implements Phase 1-2 of the Emergent Deliberation Platform specification, transforming Colloquip from a single-thread deliberation engine into a multi-subreddit social platform for AI expert panels. Scientists can now submit hypotheses to topic-specific communities and receive structured, multi-perspective assessments with cited evidence in 10-15 minutes.

Key Changes

Platform Architecture

Subreddit system: Communities with configurable participation models, output templates, and agent pools
Agent registry: Global pool of persistent agents with expertise-based recruitment; new agents created only when expertise gaps exist
10 curated personas: Medicinal chemistry, clinical development, regulatory affairs, ADMET, computational biology, molecular biology, protein engineering, synthetic biology, and two red team variants (general + biology-specific)

Synthesis & Output

Template-driven synthesis: Four thinking types (Assessment, Review, Analysis, Ideation) with structured sections
Audit chains: Every claim linked to source posts and citations for transparency
Citation verification: Automated PubMed PMID validation with fallback to manual review

Tools & Evidence

PubMed search: NCBI E-utilities integration for literature discovery
Company docs search: Internal documentation retrieval
Web/academic search: Semantic Scholar API integration
Tool registry: Per-subreddit tool configuration and availability

Cost & Governance

Cost tracking: Per-thread token usage and USD cost estimation with budget kill switches
Human participation models: Explicit, implicit, and none modes for scientist involvement
Audit trails: Complete record of agent contributions, phase transitions, and synthesis decisions

Memory & Watchers (Phase 3-4 Foundation)

Synthesis-level memory: Store and retrieve past deliberations via vector similarity (mock embeddings for now)
Watcher infrastructure: Event detection (literature, scheduled, webhook) with triage pipeline
Notification system: Structured notifications from watcher events with action tracking
Cross-reference detection: Identify when findings in one subreddit are relevant to another

API & Infrastructure

Platform routes: /api/subreddits, /api/agents, /api/threads for community management
Memory routes: /api/memory for retrieval and annotation
Watcher routes: /api/watchers for event management
Feedback routes: /api/feedback for outcome tracking and agent calibration
Export routes: JSON/CSV export of deliberation results
External API: Programmatic hypothesis submission and polling

Database & Deployment

Alembic migrations: Four migration files tracking schema evolution (baseline, Phase 3 memory, Phase 4 watchers, Phase 5 cross-refs)
PostgreSQL support: pgvector-ready schema for future embedding storage
Docker infrastructure: Multi-stage builds, dev/monitoring compose overrides, health checks
Monitoring: Prometheus metrics, Grafana dashboards, structured JSON logging
Configuration: Environment-based settings with production/staging/dev overrides

Testing

Comprehensive test suite: 313+ tests covering routes, tools, memory, watchers, feedback, and integration scenarios
Test strategy document: Guidelines for coverage targets and conventions
Mock implementations: In-memory stores and mock embedding provider for isolated testing

Implementation Details

No breaking changes: Existing deliberation engine (observer, energy, triggers) remains unchanged; new platform layer wraps it
Backward compatible: SQLite still supported for development; PostgreSQL optional for production
Incremental design: Phase 3-5 features (memory decay, event-driven triggers, cross-references) designed to plug in without refactoring core
Configuration-driven: Retrieval limits, decay rates, triage thresholds all externalized to config files
Async throughout: FastAPI + asyncio for all I/O; WebSocket streaming for real-time deliberation viewing

Files Added/Modified

Core platform: registry.py, synthesis.py, platform_manager.py, output_templates.py
Tools: pubmed.py, company_docs.py, web_search.py,

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

Describes the architecture for evolving Colloquip from a single-thread deliberation engine into a Reddit-like agent social platform with:
- Subreddits (communities defining agent types)
- Persistent agent identities across sessions
- Agent memory/learning system (post-deliberation extraction + recall)
- Cross-subreddit membership
- New API surface for communities, agents, and memories

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

Three major additions to the social platform plan:
- Agent pool/registry: agents selected from existing pool, new ones created only when no matching expertise exists
- Mandatory red team: every subreddit always has at least one topic-specific red team agent (cannot be removed)
- Literature search tools: PubMed, company docs, and web search via Anthropic's native tool-use API, configured per subreddit

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

Comprehensive plan to incrementally transform the working deliberation engine into the full social platform described in the Phase 1-2 and Phase 3-5 specs, without a ground-up rewrite.

Key decisions:
- Keep SQLite + SQLAlchemy (PostgreSQL deferred to Phase 3 for pgvector)
- Keep src/colloquip/ structure (not backend/app/)
- 10 curated personas: 8 from spec + protein engineering + synthetic biology
- 7 sprints: models → registry → tools → synthesis → prompts → API → integration
- All 181 existing tests must pass at every step
- Phase 3+ hooks designed now, built later

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

Evolve the Colloquip deliberation system toward a Reddit-like social platform for AI agents, implementing Phases 1-2 of the implementation spec.

Core additions:
- 10 curated YAML agent personas (molecular biology, medicinal chemistry, ADMET, clinical, regulatory, computational biology, protein engineering, synthetic biology, 2 red team agents) with weighted evaluation criteria and phase-specific mandates
- Agent registry with expertise-based recruitment scoring, find-or-create pattern, and mandatory red team enforcement per subreddit
- Tool system: PubMed (NCBI E-utilities), company docs (local search), web search (Semantic Scholar), citation verifier, all with mock implementations for testing
- 4 structured output templates (Assessment, Review, Analysis, Ideation) with named sections and metadata fields
- Template-driven synthesis generator with audit chains linking claims to posts and citations
- Per-thread cost tracking with budget enforcement
- Prompt builder v3: layered assembly (persona -> subreddit context -> role -> phase mandate -> citation/tool instructions)
- Platform manager orchestrating subreddit creation and agent recruitment
- REST API: subreddit CRUD, agent listing, thread creation, cost endpoints
- Extended DB schema: subreddits, agent identities, memberships, synthesis, cost records

All 188 existing tests pass unchanged. 111 new tests added (299 total).

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

Round 1 (Critical/High):
- synthesis: filter stop words in audit chain matching, raise overlap threshold from 0.2 to 0.3 to prevent false claim-post links
- synthesis: fix _parse_metadata to only match lines starting with field names, preventing false matches inside section prose
- tools/registry: handle None tool_configs gracefully
- persona_loader: validate non-empty expertise_tags and domain_keywords
- pubmed: pass email to efetch requests (NCBI API compliance)
- citation_verifier: use actual error from _verify_pmid in flagged detail
- cost_tracker: remove unused uuid4 import
- synthesis: remove unused StructuredCitation import
- platform_manager: simplify thread storage with setdefault

Round 2 (High):
- prompts: fix tool_descriptions to accept Union[str, List[str]], join list items for proper formatting in prompt
- synthesis: rewrite _parse_synthesis_sections to use exact heading matches (longest-first sort) preventing partial name collisions
- platform_routes: validate _initialized flag in _get_platform helper
- platform_routes: add UUID format validation on thread cost endpoint

Round 3 (High):
- registry: add max_agents parameter to recruit_for_subreddit, reserve slot for red team when recruiting optional expertise
- platform_manager: wire CostTracker into get_thread_costs instead of returning placeholder zeros

Round 4 (Medium/Low):
- synthesis: replace chr(10) with readable string formatting
- synthesis: move uuid import to module level, add type hint
- synthesis: add fallback validation for empty raw_text
- output_templates: add descriptive ValueError for missing template
- prompts: use proper Union type annotation instead of string literal
- web_search: simplify redundant fallback on externalIds
- pubmed: add null-safety on XML element .text access
- company_docs: log when file content is truncated for search
- platform_routes: add TYPE_CHECKING imports and type hints on helpers
- registry: downgrade duplicate agent_type log from warning to debug
- persona_loader: wrap persona_to_agent_identity in try/except for descriptive KeyError messages

All 299 tests pass.

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
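The Round 1 audit-chain fix (stop-word filtering plus a 0.3 overlap threshold) can be illustrated with a minimal sketch. The stop-word list, the helper names, and the definition of overlap (share of the claim's content words found in the post) are assumptions for illustration; the PR does not show the project's actual matching code.

```python
import re

# Illustrative stop-word list (assumption); the project's real list is not shown in this PR.
STOP_WORDS = frozenset(
    {"a", "an", "and", "are", "as", "in", "is", "of", "on", "that", "the", "to", "with"}
)


def content_words(text: str) -> set[str]:
    """Lowercased word tokens with stop words removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def overlap_ratio(claim: str, post: str) -> float:
    """Fraction of the claim's content words that also occur in the post."""
    claim_words = content_words(claim)
    if not claim_words:
        return 0.0
    return len(claim_words & content_words(post)) / len(claim_words)


def links_to(claim: str, post: str, threshold: float = 0.3) -> bool:
    """Link a claim to a post only when the overlap reaches the threshold."""
    return overlap_ratio(claim, post) >= threshold


claim = "The compound is toxic to the liver"
related = "Liver toxicity of the compound was observed in rats"
unrelated = "The assay is in the appendix to the report"

print(links_to(claim, related))    # shares "compound" and "liver"
print(links_to(claim, unrelated))  # shares only stop words, which are ignored
```

Without the stop-word filter, the unrelated post would share "the", "is", and "to" with the claim, which is the kind of false claim-post link the commit describes.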

Bug fix:
- Fix enum string comparison in synthesis audit chains (post.stance.value == "critical" → post.stance == AgentStance.CRITICAL)

Principle #2 (testable without LLM):
- Extract parse_synthesis() standalone function from SynthesisGenerator
- Parsing logic now testable directly without LLM calls

Principle #3 (interfaces first):
- Replace AgentTool Protocol with ABC BaseSearchTool
- tool_schema and execute() are now @abstractmethod
- Keep AgentTool as backward-compatible alias

Principle #4 (configuration-driven):
- Add ScoringWeights dataclass for configurable expertise matching
- Make audit chain params configurable (max_chains, overlap_threshold, min_claim_words)

Principle #5 (minimal dependencies):
- Mock tool classes now inherit from real classes (PubMedTool, WebSearchTool, CompanyDocsTool), eliminating duplicated tool_schema properties
- Convert VerificationReport to Pydantic BaseModel for consistency

Simplification:
- Extract _subreddit_common() helper to DRY response builders
- 14 new tests (313 total, all passing)

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
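As a rough sketch of the interfaces-first change, the shape below shows an abstract base with an abstract tool_schema property and execute() method, the AgentTool alias, and a mock that inherits the real tool's schema. The class names come from the commit message; the method signatures, the async execute(), and the schema contents are assumptions.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any


class BaseSearchTool(ABC):
    """Abstract base: subclasses must provide tool_schema and execute()."""

    @property
    @abstractmethod
    def tool_schema(self) -> dict[str, Any]:
        ...

    @abstractmethod
    async def execute(self, **kwargs: Any) -> dict[str, Any]:
        ...


# Backward-compatible alias, as described in the commit message.
AgentTool = BaseSearchTool


class PubMedTool(BaseSearchTool):
    @property
    def tool_schema(self) -> dict[str, Any]:
        # Hypothetical schema contents, for illustration only.
        return {"name": "pubmed_search", "input_schema": {"query": "string"}}

    async def execute(self, **kwargs: Any) -> dict[str, Any]:
        raise NotImplementedError("network call omitted in this sketch")


class MockPubMedTool(PubMedTool):
    """Inherits tool_schema from the real tool; only execute() is replaced."""

    async def execute(self, **kwargs: Any) -> dict[str, Any]:
        return {"query": kwargs.get("query", ""), "results": []}


print(asyncio.run(MockPubMedTool().execute(query="kras")))
```

Because the mock subclasses the real tool, the schema is defined once, and any change to it is picked up by tests automatically.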

Comprehensive plan covering 13 feature sprints and 5 infrastructure sprints:
- Phase 3: Institutional memory (embedding interface, synthesis RAG, pgvector, human corrections)
- Phase 4: Event-driven triggers (watchers, triage agent, notifications, auto-deliberation)
- Phase 5: Cross-subreddit references, outcome tracking, agent calibration, export/external API
- Deployment: Docker multi-stage builds, docker-compose (Postgres+Redis), CI/CD pipelines, Alembic migrations, production config, Prometheus monitoring

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

Add embedding infrastructure (mock + OpenAI providers), in-memory vector store with cosine similarity search, synthesis-to-memory extraction pipeline, RAG prompt integration, human memory corrections via annotations, and API routes for memory management.

Key components:
- EmbeddingProvider ABC with MockEmbeddingProvider and OpenAIEmbeddingProvider
- MemoryStore ABC with InMemoryStore (brute-force cosine search)
- SynthesisMemoryExtractor: pure text parsing, no LLM calls
- MemoryRetriever: arena-scoped + cross-subreddit retrieval
- Memory annotations: outdated, correction, confirmed, context types
- API routes: /api/memories, /api/memories/{id}/annotate
- DB tables: synthesis_memories, memory_annotations
- Phase 3b typed memory models (reserved for future use)

Review fixes applied:
- UUID validation in API routes with proper 400 responses
- list_all() used instead of private _memories access
- ValueError (not KeyError) for missing memories in annotate()
- IndexError safety check in annotation response
- Missing template_type update in repository save
- OpenAI API error handling with logging

104 new tests (313 → 417), all passing.

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
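To make "brute-force cosine search" concrete, here is a minimal sketch of an in-memory store that scores every stored embedding against the query and returns the top matches. The class name InMemoryStore appears in the commit message, but the add() and search() methods and their signatures are assumptions, not the project's API.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class InMemoryStore:
    """Hypothetical store: maps memory IDs to embedding vectors."""

    _embeddings: dict[str, list[float]] = field(default_factory=dict)

    def add(self, memory_id: str, embedding: list[float]) -> None:
        self._embeddings[memory_id] = embedding

    def search(self, query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        # Brute force: compare the query against every stored vector.
        scored = [
            (memory_id, cosine_similarity(query, emb))
            for memory_id, emb in self._embeddings.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = InMemoryStore()
store.add("a", [1.0, 0.0])
store.add("b", [0.0, 1.0])
store.add("c", [1.0, 1.0])
print(store.search([1.0, 0.0], top_k=2))
```

A linear scan like this costs O(n) per query, which is fine for tests and small deployments; the PR description positions pgvector as the later replacement.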

Add watcher infrastructure for monitoring external sources, triage agent for evaluating event relevance, notification system, and auto-deliberation policy with earned automation.

Key components:
- BaseWatcher ABC with WatcherRegistry for managing watcher instances
- LiteratureWatcher: PubMed monitoring with PMID deduplication
- ScheduledWatcher: time-based triggers with interval/day/hour constraints
- WebhookWatcher: external event ingestion via HTTP POST
- WatcherManager: async polling loop with error isolation per watcher
- MockTriageAgent: keyword heuristic triage (novelty/relevance/urgency)
- InMemoryNotificationStore: notification CRUD with status tracking
- AutoDeliberationPolicy: earned automation (20+ events, >70% useful, human approval, rate limits, budget sharing)
- API routes: watchers CRUD, notifications, webhook endpoint
- DB tables: watchers, watcher_events, notifications
- Models: WatcherType, WatcherEvent, TriageDecision, Notification

81 new tests (417 → 498), all passing.

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
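The "earned automation" gate can be sketched as a simple predicate over a watcher's track record. Only the thresholds (20+ events, more than 70% useful, human approval) come from the commit message; the WatcherTrackRecord type and has_earned_automation() function are hypothetical, and the rate-limit and budget-sharing checks are left out.

```python
from dataclasses import dataclass


@dataclass
class WatcherTrackRecord:
    """Hypothetical summary of a watcher's triage history."""

    events_triaged: int
    events_useful: int
    human_approved: bool


def has_earned_automation(
    record: WatcherTrackRecord,
    min_events: int = 20,
    min_useful_rate: float = 0.70,
) -> bool:
    """True only if a human approved, there is enough history, and it was mostly useful."""
    if not record.human_approved:
        return False
    if record.events_triaged < min_events:
        return False
    useful_rate = record.events_useful / record.events_triaged
    return useful_rate > min_useful_rate


print(has_earned_automation(WatcherTrackRecord(25, 20, True)))   # 80% useful, approved
print(has_earned_automation(WatcherTrackRecord(10, 10, True)))   # too little history
```

Checking approval and history size before computing the rate also avoids a division by zero for watchers that have not triaged any events yet.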

- Fix list_subreddit_watchers: remove 'or True' filter bug, use subreddit_name stored in watcher config for proper filtering
- Fix create_watcher: inject pubmed_tool from app state for LiteratureWatcher creation so watchers are functional
- Store subreddit_name in watcher config dict for lookup
- Add source_metadata column to DBWatcherEvent to prevent data loss
- Add defensive None check for PubMed tool result

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
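The 'or True' bug is easy to reproduce in isolation: a trailing "or True" makes the filter condition always true, so every watcher is returned regardless of subreddit. The data shapes and function names below are hypothetical; only the subreddit_name key stored in the watcher config comes from the commit message.

```python
# Hypothetical watcher records; each config stores the owning subreddit's name.
watchers = [
    {"name": "pubmed-kras", "config": {"subreddit_name": "oncology"}},
    {"name": "weekly-digest", "config": {"subreddit_name": "synbio"}},
]


def list_buggy(items: list[dict], subreddit_name: str) -> list[dict]:
    # "<comparison> or True" is always True, so nothing is filtered out.
    return [w for w in items if w["config"].get("subreddit_name") == subreddit_name or True]


def list_fixed(items: list[dict], subreddit_name: str) -> list[dict]:
    # Filter on the subreddit_name stored in the watcher's config.
    return [w for w in items if w["config"].get("subreddit_name") == subreddit_name]


print([w["name"] for w in list_buggy(watchers, "oncology")])
print([w["name"] for w in list_fixed(watchers, "oncology")])
```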

…ts 18-20)
- Cross-subreddit reference detection with triple criteria (similarity + shared entities + actionability)
- Entity extraction (PMIDs, gene names, compound IDs) for cross-reference matching
- Deliberation differ for comparing syntheses over time
- Outcome tracking system for real-world result recording
- Agent calibration with accuracy, domain-specific metrics, and bias detection
- External API with API key authentication for programmatic access
- Export system (Markdown and JSON formats)
- DB tables for cross-references and outcome reports
- 41 new tests (539 total), all passing

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

- Register all Phase 3-5 route modules (memory, watcher, export, external, feedback) in FastAPI app; previously all these endpoints returned 404
- Fix self-referential comparison in calibration bias detection: compare domain accuracy against overall accuracy instead of itself

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

D1 - Containerization:
- Multi-stage Dockerfile (Python deps, Node frontend, production image)
- docker-compose.yml with app, PostgreSQL 16 + pgvector, Redis
- docker-compose.dev.yml with hot-reload and dev tools
- .dockerignore for clean builds

D2 - CI/CD Pipeline:
- GitHub Actions: ci.yml (lint + test + build), deploy.yml (GHCR push)
- db-migration.yml for PR migration validation
- Python 3.11/3.12 matrix, codecov integration

D3 - Production Configuration:
- Settings module with env var validation (database, LLM, embedding, memory, watchers, deployment)
- Structured JSON logging for production, text for development
- Request ID tracking, sensitive field redaction
- Production and staging YAML config overrides
- .env.example with documented variables

D4 - Database Migrations:
- Alembic setup with env.py and migration template
- 4 migration files: baseline, Phase 3 memory, Phase 4 watchers, Phase 5 cross-refs
- All migrations reversible with downgrade()

D5 - Monitoring:
- Prometheus metrics: deliberations, cost, memory retrieval, watchers, LLM usage
- /api/metrics endpoint
- docker-compose.monitoring.yml with Prometheus + Grafana
- Pre-built Grafana dashboard

26 new tests (565 total), all passing.

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

- Fix 138 ruff lint errors: remove unused imports (F401), fix line length (E501), rename ambiguous variables (E741), prefix unused vars (F841)
- Auto-format all 83 files with ruff format
- Configure ruff rules in pyproject.toml: select E/F/I, ignore E402, per-file ignores for tests (F841) and forward refs (F821)
- Add pre-commit hook: ruff check + ruff format --check + pytest (fast)
- Add install-hooks.sh for easy setup

All 565 tests pass. ruff check and ruff format --check both clean.

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

…led) https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

- Commit updated uv.lock (was stale, missing new dependencies from Phases 3-5 which caused --frozen sync failures in CI)
- Use actions/setup-python@v5 for Python version matrix instead of passing python-version to setup-uv (unsupported parameter)
- Add --frozen to all uv sync calls for reproducible CI builds
- Remove Docker build job from CI (requires Docker daemon; deploy workflow handles real builds on tag push)
- Add coverage.xml to .gitignore

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

Tests used absolute /home/user/Colloquip/ paths that only work locally. Replaced with paths computed relative to the project root using os.path.dirname(os.path.abspath(__file__)), which works on any CI runner. https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
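A minimal sketch of the path fix, assuming test modules sit in a tests/ directory one level below the project root. The helper names project_root() and project_path() are hypothetical; the commit only states that paths are now derived from os.path.dirname(os.path.abspath(__file__)).

```python
import os


def project_root(test_file: str, levels_up: int = 1) -> str:
    """Directory `levels_up` levels above the directory containing `test_file`."""
    path = os.path.dirname(os.path.abspath(test_file))
    for _ in range(levels_up):
        path = os.path.dirname(path)
    return path


def project_path(test_file: str, *parts: str) -> str:
    """Join path components onto the project root derived from `test_file`."""
    return os.path.join(project_root(test_file), *parts)


# Inside a test module this would be called with __file__, for example:
#   config_dir = project_path(__file__, "config")
example_test_file = os.path.join(os.sep, "repo", "tests", "test_example.py")
print(project_path(example_test_file, "config"))
```

Because everything is computed from the test file's own location, the same test resolves correctly in a local checkout and on a CI runner with a different working directory.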

- tests/TEST_STRATEGY.md: Test conventions, fixtures, patterns, and checklist for writing effective tests (reference before writing new tests)
- tests/test_api_routes.py: 51 tests covering all route handlers (export, external, feedback, memory, watcher routes) with happy paths, validation, 404/400/503 error paths
- tests/test_tools.py: 26 tests for citation_verifier, company_docs, web_search tools with mocked external APIs
- tests/test_infrastructure.py: 18 tests for db/engine, display, CLI

Key coverage improvements:
- export_routes: 20% → 95%+
- external_routes: 51% → 95%+
- feedback_routes: 62% → 95%+
- memory_routes: 51% → 95%+
- watcher_routes: 45% → 95%
- citation_verifier: 47% → 95%
- company_docs: 31% → 90%
- web_search: 43% → 95%
- db/engine: 38% → 100%

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

Replace the static string confidence_level with Beta distribution parameters (alpha, beta) on SynthesisMemory. Retrieval scoring becomes similarity * confidence * decay instead of similarity-only.

Key changes:
- SynthesisMemory carries confidence_alpha/confidence_beta fields initialized from synthesis metadata (high→3:1, moderate→2:1.5, low→1:2)
- compute_confidence() returns clamped posterior mean [0.10, 0.95]
- temporal_decay() applies exponential decay with 120-day half-life
- composite_score() multiplies similarity × confidence × decay
- Annotations auto-update confidence: confirmed +2α, correction +3β, outdated +2β, context no change
- Outcome reports (confirmed/contradicted) update linked memory confidence
- Retrieval logging records every memory retrieval with similarity, confidence, decay factor, and composite score for future calibration
- DB schema, repository, and API responses updated for new fields
- 47 new tests covering all Bayesian math, decay, scoring, annotation wiring, retrieval logging, and prompt formatting

https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo
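The scoring described in this commit can be written out as a short sketch. The function names, priors, clamp range, half-life, and annotation increments are taken from the commit message; the signatures, the apply_annotation() helper, and the use of age in days as the decay input are assumptions.

```python
# Beta priors by stated confidence level, as (alpha, beta), from the commit message.
PRIORS = {"high": (3.0, 1.0), "moderate": (2.0, 1.5), "low": (1.0, 2.0)}

# Annotation effects as (delta_alpha, delta_beta), from the commit message.
ANNOTATION_UPDATES = {
    "confirmed": (2.0, 0.0),
    "correction": (0.0, 3.0),
    "outdated": (0.0, 2.0),
    "context": (0.0, 0.0),
}

HALF_LIFE_DAYS = 120.0


def compute_confidence(alpha: float, beta: float) -> float:
    """Posterior mean of Beta(alpha, beta), clamped to [0.10, 0.95]."""
    mean = alpha / (alpha + beta)
    return min(0.95, max(0.10, mean))


def temporal_decay(age_days: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay that halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def composite_score(similarity: float, alpha: float, beta: float, age_days: float) -> float:
    """similarity x confidence x decay."""
    return similarity * compute_confidence(alpha, beta) * temporal_decay(age_days)


def apply_annotation(alpha: float, beta: float, kind: str) -> tuple[float, float]:
    """Hypothetical helper: return updated (alpha, beta) after an annotation."""
    d_alpha, d_beta = ANNOTATION_UPDATES[kind]
    return alpha + d_alpha, beta + d_beta


alpha, beta = PRIORS["high"]                               # (3, 1): confidence 0.75
alpha, beta = apply_annotation(alpha, beta, "confirmed")   # (5, 1): confidence 5/6
print(composite_score(0.8, alpha, beta, age_days=120.0))   # 0.8 * 5/6 * 0.5
```

In this example a memory with similarity 0.8 that was confirmed once and is 120 days old scores about 0.33, so an older or contested memory can rank below a less similar but fresher, better-supported one.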

Full frontend rebuild with TanStack Router/Query, Zustand, Tailwind v4, and shadcn-style primitives. Includes route structure for communities, threads, agents, memories, notifications, and settings. Migrates existing deliberation components with Tailwind restyling. Fixes Memory type to match Bayesian confidence model (confidence/alpha/beta fields, correct citations_used and confidence_level types). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Rename App.css -> app.css to match the import in main.tsx on case-sensitive filesystems. Remove unused legacy index.css. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Wrap UI primitives (Button, Card, Badge, Dialog, Tooltip, Skeleton) with HeroUI v3 beta components for consistent styling and accessibility
- Fix Dialog component: conditionally render Modal to prevent backdrop from blocking pointer events when closed
- Dockerfile: add alembic/ and alembic.ini to production image for migrations
- Dockerfile.dev: replace broad COPY with targeted copies, use uv pip install
- docker-compose.yml: add start_period to app healthcheck
- docker-compose.dev.yml: mount alembic directory for dev migrations
- Rewrite web/README.md to document actual tech stack (HeroUI, TanStack, TailwindCSS v4, Zustand) replacing default Vite boilerplate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The alembic package is in the db-pg optional dependency group, not db. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

SQLite does not support ALTER-based create_unique_constraint. Move the three constraints (consensus_maps, subreddit_memberships, syntheses) into their respective create_table calls as sa.UniqueConstraint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
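The underlying SQLite limitation can be reproduced with the standard-library sqlite3 module. This sketch uses raw SQL rather than the Alembic/SQLAlchemy calls named in the commit, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Attempt 1: add the unique constraint to an existing table via ALTER TABLE.
conn.execute("CREATE TABLE memberships_alter (agent_id TEXT, subreddit_id TEXT)")
try:
    conn.execute(
        "ALTER TABLE memberships_alter "
        "ADD CONSTRAINT uq_membership UNIQUE (agent_id, subreddit_id)"
    )
    alter_supported = True
except sqlite3.OperationalError:
    # SQLite's ALTER TABLE cannot add table constraints.
    alter_supported = False

# Attempt 2: declare the constraint inside CREATE TABLE, which SQLite accepts.
conn.execute(
    "CREATE TABLE memberships ("
    "agent_id TEXT, subreddit_id TEXT, "
    "CONSTRAINT uq_membership UNIQUE (agent_id, subreddit_id))"
)
conn.execute("INSERT INTO memberships VALUES ('agent-1', 'sub-1')")
try:
    conn.execute("INSERT INTO memberships VALUES ('agent-1', 'sub-1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(alter_supported, duplicate_rejected)
```

Declaring the constraint at table creation, as the commit does with sa.UniqueConstraint inside create_table, keeps the same migration usable on both SQLite and PostgreSQL.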

claude and others added 25 commits February 11, 2026 18:01

new plan

20510b0

Fix pre-commit hook: remove --timeout flag (pytest-timeout not instal…

038ef62

…led) https://claude.ai/code/session_017HcdLV3pMPN1s3iXESokNo

Fix CSS filename case for Linux Docker builds

62b75ce

Fix migration check CI: install db-pg extra for alembic

5abe720

sunitj merged commit fd37797 into main Feb 13, 2026
9 checks passed

sunitj mentioned this pull request Apr 10, 2026

Add Phase 6: mission directives, per-agent budgets, autoresearch, approval queue, dashboards #6

Open

sunitj merged 25 commits into main from claude/agent-social-platform-8YF93
