diff --git a/docs/proposals/05-semantic-plan-search-graph.md b/docs/proposals/05-semantic-plan-search-graph.md index 812ecfd1f..d0e1e8828 100644 --- a/docs/proposals/05-semantic-plan-search-graph.md +++ b/docs/proposals/05-semantic-plan-search-graph.md @@ -118,6 +118,8 @@ def index_plan(plan_id, title, prompt, summary): ### Search API +**NOTE:** This API is a proposed local feature, not part of the public MCP interface. Implementation details TBD. + ```http GET /api/plans/search Query Parameters: @@ -189,6 +191,7 @@ def generate_plan_with_examples(prompt): ### 3. Plan Recommendations ```jsx // After user completes a plan +// NOTE: Endpoint `/api/plans/{planId}/similar` is a proposed feature (TBD implementation) function RelatedPlans({ currentPlanId }) { const { data } = useSWR(`/api/plans/${currentPlanId}/similar?limit=5`); @@ -252,7 +255,7 @@ def trending_domains(days=30): ### Week 3: Search API -- Build `/api/plans/search` endpoint +- Build semantic search endpoint (TBD - local feature, not part of MCP) - Add filtering (domain, min_similarity) diff --git a/docs/proposals/07-elo-ranking.md b/docs/proposals/07-elo-ranking.md index ca2acc88a..f665f4e69 100644 --- a/docs/proposals/07-elo-ranking.md +++ b/docs/proposals/07-elo-ranking.md @@ -35,7 +35,9 @@ PlanExe ranks generated plans using a two‑phase LLM evaluation to avoid gaming - Corpus source: PlanExe‑web `_data/examples.yml` -## Endpoints +## Endpoints (Proposed Local Feature) + +**NOTE:** These endpoints are proposed for local/self-hosted PlanExe deployments. They are not part of the public MCP interface. Implementation TBD. - `POST /api/rank` → rank plan, update Elo @@ -510,7 +512,7 @@ X-API-Key: }, "budget_cents": 1500000000, "title": "Electric VTOL Development Program", - "url": "https://planexe.com/plans/abc123" + "url": "https://home.planexe.org/plans/abc123" } ``` diff --git a/docs/proposals/12-evidence-based-founder-execution-index.md b/docs/proposals/12-evidence-based-founder-execution-index.md index 4f5757c0a..a5b916a3c 100644 --- a/docs/proposals/12-evidence-based-founder-execution-index.md +++ b/docs/proposals/12-evidence-based-founder-execution-index.md @@ -52,7 +52,7 @@ In short: as AI can read and evaluate entire reports, the advantage of slide dec Example of a PlanExe report that an AI can evaluate end-to-end: -- https://planexe.org/20260114_cbc_validation_report.html +- https://docs.planexe.org/examples/20260114_cbc_validation_report.html (TBD - verify URL) This is the kind of artifact the FEI is designed to ingest and audit. If the numbers are fabricated or hallucinated, the FEI should penalize confidence and surface the missing verification. diff --git a/docs/proposals/44-investor-grade-audit-pack-generator.md b/docs/proposals/44-investor-grade-audit-pack-generator.md index a40859d6d..dfeeeceb7 100644 --- a/docs/proposals/44-investor-grade-audit-pack-generator.md +++ b/docs/proposals/44-investor-grade-audit-pack-generator.md @@ -124,7 +124,7 @@ Create a new audit pack. 
```json { "pack_id": "pack_456", - "download_url": "https://planexe.org/api/audit/download/pack_456.pdf", + "download_url": "https://home.planexe.org/api/audit/download/pack_456.pdf", "expires_at": "2026-02-18T16:00:00Z" } ``` diff --git a/docs/proposals/48-moltbook-reputation-bridge.md b/docs/proposals/48-moltbook-reputation-bridge.md index f31b50f4e..6db649e95 100644 --- a/docs/proposals/48-moltbook-reputation-bridge.md +++ b/docs/proposals/48-moltbook-reputation-bridge.md @@ -55,7 +55,7 @@ Create a verifiable claim system where PlanExe acts as a reputation oracle: { "name": "Master Architect", "description": "Won 5+ Bids in 'Construction'", - "icon_url": "https://planexe.org/badges/architect_gold.svg", + "icon_url": "https://home.planexe.org/badges/architect_gold.svg", "issued_at": "2026-02-11" } ], diff --git a/docs/proposals/62-agent-first-frontend-discoverability.md b/docs/proposals/62-agent-first-frontend-discoverability.md index bc0340cd3..d9f5c76c8 100644 --- a/docs/proposals/62-agent-first-frontend-discoverability.md +++ b/docs/proposals/62-agent-first-frontend-discoverability.md @@ -34,7 +34,7 @@ This section records what is currently implemented in the repository and what st - MCP auth uses `X-API-Key` - Tool names match current MCP tools (`prompt_examples`, `task_create`, `task_status`, `task_stop`, `task_file_info`) - Pricing/cost docs point to `https://docs.planexe.org/costs_and_models/` - - Support contact includes Discord: `https://planexe.org/discord` + - Support contact includes Discord: `https://home.planexe.org/discord` - Current positioning is documented: - `home.planexe.org` is human-facing (account/billing/docs links) - `mcp.planexe.org` is the AI-facing API surface @@ -117,7 +117,7 @@ Recommended flow: - Docs: https://docs.planexe.org - GitHub issues: https://github.com/PlanExeOrg/PlanExe/issues -- Discord: https://planexe.org/discord +- Discord: https://home.planexe.org/discord ``` **Implementation:** @@ -132,6 +132,8 @@ Recommended flow: ### 2. OpenAPI / LLM-Optimized Tool Schema +**STATUS: NOT YET IMPLEMENTED** — This recommendation is aspirational. The `/openapi.json` endpoint does not currently exist on either domain. + **What:** Expose machine-readable schemas for PlanExe's AI-facing interfaces (MCP-first), and optionally OpenAPI for any public REST surfaces. **Why:** OpenAI's function-calling guide and Responses API expect structured function schemas. LLMs perform better with: @@ -272,6 +274,8 @@ Recommended flow: ### 3. MCP Well-Known Endpoint (`.well-known/mcp.json`) +**STATUS: NOT YET IMPLEMENTED** — This endpoint does not currently exist at `mcp.planexe.org/.well-known/mcp.json`. This is a future enhancement. + **What:** Add `/.well-known/mcp.json` describing PlanExe as an MCP server, compatible with OpenAI's MCP connector and orchestrators. **Why:** OpenAI's tools-connectors-mcp guide and orchestrator patterns show that MCP servers need a discoverable manifest listing available tools, auth requirements, and rate limits. 
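
As a purely illustrative sketch (no such endpoint exists yet, and the manifest schema is not settled), a future `/.well-known/mcp.json` might look like the following. Every field name here is an assumption; a real manifest would also declare rate limits:

```json
{
  "name": "PlanExe",
  "description": "Generate detailed project plans from a short prompt.",
  "endpoint": "https://mcp.planexe.org",
  "auth": { "type": "api_key", "header": "X-API-Key" },
  "tools": ["prompt_examples", "task_create", "task_status", "task_stop", "task_file_info"],
  "docs": "https://docs.planexe.org"
}
```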
diff --git a/docs/proposals/68-domain-aware-plan-ranking-engine.md b/docs/proposals/68-domain-aware-plan-ranking-engine.md
new file mode 100644
index 000000000..7add2a8ba
--- /dev/null
+++ b/docs/proposals/68-domain-aware-plan-ranking-engine.md
@@ -0,0 +1,298 @@
+---
+title: "Domain-Aware Plan Ranking Engine with Relative Comparison"
+date: 2026-02-22
+status: proposal
+author: Larry the Laptop Lobster
+---
+
+# Domain-Aware Plan Ranking Engine with Relative Comparison
+
+**Author:** Larry (via OpenClaw)
+**Date:** 2026-02-22
+**Status:** Proposal
+**Audience:** Technical reviewers, engineers, stakeholders
+
+---
+
+## Problem Statement
+
+Current plan evaluation assumes a universal rubric (concreteness, executability, success criteria) with fixed weights. This breaks down when comparing plans from different domains:
+
+- A road construction plan has different success signals than a software project
+- Domain-specific KPIs (e.g., budget contingency in construction vs. MVP launch timing in software) matter more than generic signals
+- Absolute scoring ("this plan is a 6/10") doesn't tell us whether it's in the top 10% of *similar plans* or mediocre compared to its peers
+
+## Solution: Domain-Aware Relative Ranking
+
+Instead of absolute scores, rank plans **within their domain context** using:
+
+1. **Domain Classification** — detect plan type (construction, software, marketing, operations, etc.)
+2. **Domain-Specific Signal Extraction** — pull KPIs relevant to that domain
+3. **Corpus Bucketing** — group plans by type for fair comparison
+4. **Relative Elo Ranking** — score each plan against similar plans, not in a vacuum
+5. **Actionability** — surface top-performing plans (>90th percentile) as refinement candidates, flag major-rewrite situations
+
+## Architecture
+
+```
+Plan Input
+    ↓
+[Domain Classifier] → Detect plan type (construction, software, marketing, ops, etc.)
+    ↓
+[Domain-Specific Extractor] → Pull KPIs: timeline clarity, resource estimates, risk mitigations, owner assignment, etc.
+    ↓
+[Corpus Bucketer] → Find all similar-type plans in database
+    ↓
+[Elo Ranker] → Compare new plan against sampled corpus neighbors
+    ↓
+[Actionability Scorer] → Is this top 10%? Fixable? Rewrite candidate?
+    ↓
+Output: Rank percentile, actionability flag, refinement recommendations
+```
+
+## Implementation Details
+
+### 1. Domain Classification
+
+**Input:** Plan text + metadata (e.g., project title, goals, phases)
+
+**Output:** Domain label (one of: construction, software, marketing, operations, research, business-development, other)
+
+**Method:**
+- LLM-based classification (zero-shot, optionally few-shot with 1–2 examples per domain)
+- Fallback: keyword matching on phase names, deliverables, team roles
+- Confidence threshold: if <0.7, flag as "cross-domain" or "unclear"
+
+**Example prompt:**
+```
+"Read this plan and classify it as one of: construction, software, marketing, operations, research, business-development, other. Explain your reasoning in 1 sentence."
+
+Plan text here...
+```
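+
+As a rough illustration (not committed code), the classifier can be a thin wrapper around an LLM call with a keyword fallback. In the Python sketch below, `llm_classify` is a placeholder for whatever LLM client we adopt, and the keyword lists are invented examples that would be tuned against the real corpus:
+
+```python
+DOMAINS = ["construction", "software", "marketing", "operations",
+           "research", "business-development", "other"]
+
+# Hypothetical keyword vocabulary; would be tuned against the real corpus.
+KEYWORDS = {
+    "construction": ["contractor", "permit", "site", "inspection"],
+    "software": ["api", "deploy", "sprint", "backlog"],
+    "marketing": ["campaign", "channel", "audience", "cac"],
+    "operations": ["sla", "escalation", "workflow", "uptime"],
+}
+
+def classify_domain(plan_text, llm_classify):
+    """Classify a plan's domain; fall back to keywords below the 0.7 threshold."""
+    label, confidence = llm_classify(plan_text, DOMAINS)  # placeholder LLM client
+    if confidence >= 0.7 and label in DOMAINS:
+        return {"domain": label, "confidence": confidence, "method": "llm"}
+    # Fallback: count keyword hits per domain in the lowercased plan text.
+    text = plan_text.lower()
+    hits = {d: sum(text.count(k) for k in ks) for d, ks in KEYWORDS.items()}
+    best = max(hits, key=hits.get)
+    if hits[best] == 0:
+        return {"domain": "unclear", "confidence": confidence, "method": "fallback"}
+    return {"domain": best, "confidence": confidence, "method": "keywords"}
+```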
+
+### 2. Domain-Specific Signal Extraction
+
+Each domain extracts different KPIs:
+
+**Construction:**
+- Budget vs. estimate variance tolerance
+- Schedule float/slack (days available for delays)
+- Risk contingency % of budget
+- Owner accountability (named PM, not "TBD")
+- Inspection/approval checkpoints
+
+**Software:**
+- MVP vs. full feature clarity (what's launch, what's post-launch)
+- Tech debt acknowledgment (testing, documentation standards)
+- Team skill-market fit (do we have the right people?)
+- Dependency clarity (external APIs, third-party risks)
+- Launch/staging milestones
+
+**Marketing/Growth:**
+- Channel diversification (not all eggs in one basket)
+- CAC payback period or LTV:CAC ratio (are we thinking about unit economics?)
+- Audience targeting specificity (who exactly, not "millennials")
+- Content calendar or cadence clarity
+- Success metric definition (viral coefficient, NPS, growth rate?)
+
+**Operations:**
+- Process automation KPI (% manual vs. automated workflows)
+- SLA definition (response time, uptime targets)
+- Escalation clarity (who handles edge cases?)
+- Monitoring/alerting (do we know if something breaks?)
+
+**All domains:**
+- Concreteness: timeline specificity, named owner, measurable KPIs (0–10)
+- Executability: phase sequencing, dependencies clear (0–10)
+- Success criteria: explicit win conditions, not vibes (0–10)
+
+**Output:** JSON with domain + extracted signals (each 0–10)
+
+```json
+{
+  "domain": "software",
+  "concreteness": 8,
+  "executability": 7,
+  "success_criteria": 6,
+  "domain_specific": {
+    "mvp_clarity": 9,
+    "tech_debt_acknowledged": 7,
+    "team_fit": 6,
+    "dependency_clarity": 8,
+    "launch_milestones": 9
+  },
+  "confidence": 0.92
+}
+```
+
+### 3. Corpus Bucketing
+
+**Storage:**
+- `plan_corpus` table extended with `domain` column
+- Index on `domain` for fast filtering
+- pgvector embeddings per domain (optional, for semantic search within domain)
+
+**Query:**
+```sql
+SELECT * FROM plan_corpus
+WHERE domain = 'software'
+ORDER BY created_at DESC
+LIMIT 1000;
+```
+
+**Bucketing strategy:**
+- Exact domain match (software vs. software)
+- Fuzzy fallback: if bucket size <20, blend adjacent domains (e.g., "software" + "research" for AI projects)
+
+### 4. Relative Elo Ranking
+
+**Algorithm:**
+
+1. Extract new plan's signals → `new_plan_vector`
+2. Sample 5–10 existing plans from same domain (random + stratified by existing Elo)
+3. For each sampled plan, LLM pairwise comparison:
+   ```
+   "Plan A: [concreteness, executability, success clarity]
+   Plan B: [concreteness, executability, success clarity]
+
+   Which plan is more likely to succeed? Why?
+   (Likert: strongly A, slightly A, neutral, slightly B, strongly B)"
+   ```
+4. Map the Likert answer to an actual score (win = 1, loss = 0, neutral = 0.5), then update Elo (a Python sketch follows below):
+   ```
+   new_elo = old_elo + K * (actual_win - expected_win)
+   where K = 32 (standard), expected_win computed from the current Elo gap
+   ```
+
+**Result:** Each plan has an Elo score *within its domain*, comparable across similar plans.
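+
+A minimal Python sketch of that update rule. Only K = 32 and the standard logistic expected-score formula are specified above; the intermediate 0.75/0.25 values for "slightly" judgments are one illustrative choice:
+
+```python
+# One possible mapping from Likert judgments to A's share of the win.
+LIKERT_SCORES = {
+    "strongly A": 1.0, "slightly A": 0.75, "neutral": 0.5,
+    "slightly B": 0.25, "strongly B": 0.0,
+}
+
+def expected_win(elo_a, elo_b):
+    """Probability that A beats B under the standard logistic Elo model."""
+    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))
+
+def update_elo(elo_a, elo_b, likert, k=32.0):
+    """Return updated (elo_a, elo_b) after one pairwise comparison."""
+    actual_a = LIKERT_SCORES[likert]
+    delta = k * (actual_a - expected_win(elo_a, elo_b))
+    return elo_a + delta, elo_b - delta  # zero-sum update
+
+# Example: a 1600-rated plan judged "slightly A" against a 1700-rated peer
+# gains roughly 12 points while the peer loses the same amount.
+print(update_elo(1600, 1700, "slightly A"))
+```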
+
+### 5. Actionability Scoring
+
+**Output:**
+
+```json
+{
+  "plan_id": "...",
+  "domain": "software",
+  "elo_score": 1650,
+  "percentile": 0.87,
+  "actionability": {
+    "is_candidate_for_refinement": true,
+    "reason": "87th percentile; fixable with clarity on tech debt and team roles",
+    "needs_major_rewrite": false,
+    "top_gaps": ["tech_debt_acknowledged", "team_fit"],
+    "confidence": 0.92
+  }
+}
+```
+
+**Decision rules** (codified in the sketch after this list):
+- **>90th percentile**: "High-performing; consider as template for other plans"
+- **70–90th percentile**: "Good candidate for refinement; address top gaps"
+- **50–70th percentile**: "Mid-tier; incremental improvements or focused refinement"
+- **<50th percentile & low concreteness**: "Major rewrite recommended; start over with domain-specific template"
+- **<50th percentile & high concreteness**: "Execution challenges; may be doable despite lower score"
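+
+These rules are mechanical enough to express directly in code. A minimal sketch; the 5/10 concreteness cut-off for "low" is an assumed placeholder, not specified above:
+
+```python
+def actionability(percentile, concreteness):
+    """Map domain percentile (0-1) and concreteness (0-10) to a recommendation."""
+    if percentile > 0.90:
+        reason = "High-performing; consider as template for other plans"
+    elif percentile > 0.70:
+        reason = "Good candidate for refinement; address top gaps"
+    elif percentile > 0.50:
+        reason = "Mid-tier; incremental improvements or focused refinement"
+    elif concreteness < 5:  # assumed threshold for "low concreteness"
+        reason = "Major rewrite recommended; start over with domain-specific template"
+    else:
+        reason = "Execution challenges; may be doable despite lower score"
+    return {
+        "is_candidate_for_refinement": 0.70 < percentile <= 0.90,
+        "needs_major_rewrite": percentile <= 0.50 and concreteness < 5,
+        "reason": reason,
+    }
+```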
+
+### 6. API Endpoints
+
+```
+POST /api/rank/domain-aware
+  Input: { plan_text, plan_metadata }
+  Output: { domain, signals, elo, percentile, actionability }
+
+GET /api/leaderboard/by-domain?domain=software&limit=20
+  Output: [ { rank, plan_id, elo, percentile }, ... ]
+
+GET /api/corpus-stats?domain=software
+  Output: { domain, count, avg_elo, elo_stdev, domain_signals_info }
+```
+
+## Data Model
+
+**New columns in `plan_corpus`:**
+```sql
+ALTER TABLE plan_corpus ADD COLUMN domain VARCHAR(50);
+ALTER TABLE plan_corpus ADD COLUMN signals JSONB; -- domain-agnostic + domain-specific
+ALTER TABLE plan_corpus ADD COLUMN elo_score FLOAT DEFAULT 1600;
+ALTER TABLE plan_corpus ADD COLUMN percentile FLOAT; -- recomputed periodically
+ALTER TABLE plan_corpus ADD COLUMN actionability_notes JSONB;
+
+CREATE INDEX idx_plan_corpus_domain ON plan_corpus(domain);
+CREATE INDEX idx_plan_corpus_elo ON plan_corpus(elo_score DESC);
+```
+
+## Implementation Phases
+
+### Phase 1: Minimal Domain Classifier (2 days)
+- LLM-based domain detection (zero-shot)
+- Fallback to keyword matching
+- No Elo yet; just label & store domain
+
+### Phase 2: Domain-Specific Extractors (3 days)
+- Build 3–4 domain-specific signal extractors (software, construction, ops, marketing)
+- Normalize all to 0–10 scale
+- Store signals in `plan_corpus`
+
+### Phase 3: Elo Ranking Engine (4 days)
+- Implement pairwise LLM comparison
+- Elo update logic
+- Corpus bucketing & sampling
+- Percentile calculation
+
+### Phase 4: Actionability & API (2 days)
+- Actionability scoring rules
+- `/api/rank/domain-aware` endpoint
+- `/api/leaderboard/by-domain` endpoint
+- Test against real PlanExe corpus
+
+**Total estimate:** 11 days
+
+## Testing Strategy
+
+1. **Unit tests:** Domain classifier (accuracy on known plans)
+2. **Integration tests:** Full ranking pipeline (new plan → Elo score → percentile)
+3. **Corpus validation:** Run against existing 100+ plans, verify percentile distribution is sensible
+4. **Domain coverage:** Ensure top domains (software, construction, marketing) have >50 plans in corpus for ranking
+
+## Success Criteria
+
+- ✅ Domain classifier achieves >85% accuracy on test set
+- ✅ Elo scores converge within 10 iterations (stability)
+- ✅ Top 10% plans are consistently high-clarity and actionable
+- ✅ Cross-domain comparison is avoided (software vs. construction ranked separately)
+- ✅ API latency <2 seconds for new plan ranking
+
+## Open Questions
+
+1. **Domain list finality:** Should we start with 5 domains or leave it extensible? (Proposal: 5 initial, extensible)
+2. **Sampling strategy:** Random 5–10 neighbors or stratified by existing Elo? (Proposal: stratified)
+3. **Elo K-factor:** 16 (conservative), 32 (standard), or 64 (volatile)? (Proposal: 32, adaptive if needed)
+4. **Corpus size threshold:** When do we stop ranking due to insufficient peers? (Proposal: <20 → merge adjacent domains)
+5. **Actionability UI:** Does PlanExe web show percentile badges, heat maps, or refinement prompts? (Proposal: all three)
+
+## Dependencies
+
+- **LLM:** Gemini 2.0 Flash (for domain classification and pairwise comparison)
+- **Embeddings:** OpenAI embeddings (optional, for semantic bucketing within domain)
+- **Database:** PostgreSQL with pgvector (for corpus storage and fast domain filtering)
+- **Rate limiting:** Respect API quotas (5 req/min per key)
+
+## Risks & Mitigation
+
+| Risk | Mitigation |
+|------|-----------|
+| Domain classifier misclassifies plan | Confidence threshold; manual override option; log misclassifications |
+| Elo ranking is slow with large corpus | Cache pairwise comparisons; use stratified sampling (not random) |
+| Cross-domain contamination | Strict bucketing; log when fallback to adjacent domain |
+| Signal extraction is too generic | Domain-specific extractors with explicit KPI lists; tune per domain |
+
+## References
+
+- [Elo Ranking Proposal](07-elo-ranking.md)
+- [Semantic Plan Search](05-semantic-plan-search-graph.md)
+
+---
+
+## Changelog
+
+- **2026-02-22:** Initial proposal by Larry