Summary
Decide the primary query surface exposed to LLM-driven consumers: the backend UI-generation agent, external AI agents (MCP clients), and SDK users. The underlying data layer (typed metadata tables + expressive REST DSL) is covered in #76. This issue is specifically about the surface shape that LLMs author queries against.
Context
The roadmap target for OSA's primary UI is an AI-driven UI: a backend LLM dynamically generates interfaces per convention, because scientific domains are too heterogeneous for hand-written templates. In every access path, the thing that writes the query is an LLM — either in-backend (for UI generation) or external (agent clients).
This means the query surface has to be:
Introspectable — the LLM discovers what's queryable without prior knowledge
Typed — the LLM gets schema-level feedback, not just runtime errors
Safe — malformed queries fail at the boundary, not in the database
#76 lands typed metadata tables and extends /discovery/* REST with compound filters + pgvector + full-text. On top of that, the question is whether REST + a JSON filter DSL is the right surface, or whether GraphQL / MCP wins for the LLM-author case.
Options

Option A: Typed REST + JSON filter DSL (what #76 ships)
Keep /discovery/records and /discovery/features/{hook}. LLMs author JSON filter trees. The OpenAPI spec provides the introspection surface.

Option B: PostGraphile sidecar (original #76 proposal)
A Node.js sidecar introspects the PG schema and auto-generates a GraphQL API. FastAPI proxies /graphql with auth forwarding. Grafast plugins add pgvector, tsvector, and pg_trgm operators.
Pros: SDL introspection is the best-developed typed-schema surface for LLMs; single-round-trip nested queries; plugin ecosystem; minimal hand-written filter code
Cons: second service to operate (Node.js runtime, deploy, monitor); auth proxy + pgSettings + RLS is a new auth system; watch-mode schema rebuild is a new failure mode; likely runs in parallel with /discovery/* rather than replacing it (two query surfaces)
Cost: ~1 week build, ongoing ops overhead
Option C: MCP server on top of /discovery/*
A Model Context Protocol server wraps the typed discovery endpoints as discrete tools (search_records, search_features, similarity_search, full_text_search). Agents see a typed tool list with JSON Schema parameters.
Pros: standardized tool-calling surface; rigid constraints (LLM can't write malformed queries because there's no query string); Anthropic + OpenAI + Cursor all speak MCP natively; cheap to build on top of the REST layer
Cons: less flexible than GraphQL for compound/nested queries; tool design decisions matter (how many tools? one mega-tool with a filter DSL? several focused tools?)
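What "a typed tool list with JSON Schema parameters" means concretely: an MCP `tools/list` response declares each tool's `inputSchema`, so the agent can only call named tools with validated arguments. A sketch of two such declarations (descriptions and parameter names are illustrative, not a finalized tool design):

```python
# Hypothetical MCP tool declarations: what an agent sees when it lists tools.
TOOLS = [
    {
        "name": "search_records",
        "description": "Filter records via the typed JSON filter tree.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "filter": {"type": "object"},
                "limit": {"type": "integer", "default": 50},
            },
            "required": ["filter"],
        },
    },
    {
        "name": "similarity_search",
        "description": "pgvector nearest-neighbour search over embeddings.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query_text": {"type": "string"},
                "top_k": {"type": "integer", "default": 10},
            },
            "required": ["query_text"],
        },
    },
]
```

Note that the "how many tools?" question from the cons list is visible even here: `search_records` with a `filter` object is already halfway to "one mega-tool with a filter DSL".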
Option D: All of the above
REST is the ground truth. MCP wraps it for agent clients. GraphQL (optional) adds SDL introspection if a consumer specifically needs it.
Pros: covers every access pattern
Cons: three surfaces to document and maintain
Open questions
Is the in-backend LLM (AI-driven UI generator) best served by REST or GraphQL? Measurable — build the UI-generation prompt both ways and compare success rate + tokens.
How do external agents actually want to consume OSA? If the answer is "as an MCP server in Claude Desktop / Cursor / OpenAI Apps," that strongly favors Option C.
Does GraphQL's single-round-trip nested query actually matter for scientific data, where the typical access pattern is filter → paginate → export to Parquet?
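For context on the last question: the filter → paginate → export pattern is a flat cursor loop, one round trip per page regardless of surface. A sketch with the page fetch stubbed out (endpoint shape, `items`/`next_cursor` field names, and the filter are assumptions; in practice the fetch would POST to /discovery/records and the rows would feed a Parquet writer):

```python
from typing import Callable, Iterator, Optional

def paginate(fetch_page: Callable[[dict, Optional[str]], dict],
             filter_tree: dict) -> Iterator[dict]:
    """Drain a filtered result set page by page via an opaque cursor."""
    cursor = None
    while True:
        page = fetch_page(filter_tree, cursor)   # one round trip per page
        yield from page["items"]
        cursor = page.get("next_cursor")
        if cursor is None:
            return

# Stubbed fetch standing in for POSTs to the discovery endpoint.
def fake_fetch(filter_tree, cursor):
    data = {None: {"items": [{"id": 1}, {"id": 2}], "next_cursor": "p2"},
            "p2": {"items": [{"id": 3}], "next_cursor": None}}
    return data[cursor]

rows = list(paginate(
    fake_fetch,
    {"cond": {"field": "species", "op": "eq", "value": "mouse"}},
))
# rows now holds every matching record, ready for a Parquet export step
```

If this loop is the dominant access pattern, nested single-round-trip queries buy little, which is one way to weigh the open question above.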
Decision path
Depends on
Related