feat(retrieval): CypherFirstAggregationStrategy + cypher-path accuracy fixes#255
feat(retrieval): CypherFirstAggregationStrategy + cypher-path accuracy fixes#255galshubeli wants to merge 11 commits into
Conversation
…ions
Six localized fixes to the cypher_generation pipeline, identified from a
failure-mode investigation on a 56-person/10-org synthetic corpus where
both vector-only and cypher-enabled retrieval were silently producing
wrong answers on counting, top-N, group-by, and intersection questions.
P0 fixes (ship together):
- Smart LIMIT injection. _sanitize_cypher no longer auto-injects LIMIT on
pure aggregations (count/sum/avg without group-by) and uses the new
_DEFAULT_ROW_LIMIT=100 constant otherwise. Paired with raising the
result_assembly slice cap to 100 plus a truncation sentinel, this stops
group-by lists ("orgs with >=5 employees", "top-N city") from being
silently cut at 20 rows.
- Authoritative result framing. The cypher results section is renamed to
"Authoritative Graph Query Results (deterministic; trust over passages
on counts and aggregates)" and a matching rule 8 is added to both
_RAG_SYSTEM_PROMPT variants in api/main.py. Together they stop the LLM
from contradicting a correct numeric cypher answer when verbose passage
text mentions a different entity.
- APOC/GDS/db function blocklist. validate_cypher now rejects dotted-
namespace function calls (apoc.text.regexGroups, gds.*, db.*) which
FalkorDB silently returns 0 rows for. The error feeds the existing
retry-with-feedback loop so attempt 2 has a concrete fix-it.
P1 fixes:
- 0-row label-widen fallback. When a typed-label cypher returns 0 rows
AND a name predicate is present (label is a routing hint, not the
filter itself), execute_cypher_retrieval rewrites typed labels to
__Entity__ and re-runs once. Pure-cypher rewrite, no second LLM call.
Recovers cases where the extractor labelled an entity differently than
the schema prompt steered the LLM toward.
- Two new schema-prompt examples replacing redundant ones: a top-N
group-by ("city with most employees", explicit 2-hop) and a set-
intersection ("works at BOTH A and B") with a matching rule.
P2: cypher metadata (cypher_fallback, cypher_truncated, cypher_rows)
threaded through assemble_raw_result into RetrieverResult.metadata so
operators can monitor fallback firing rate.
Public API: execute_cypher_retrieval now returns a 3-tuple
(facts, entities, metadata) instead of (facts, entities). Internal —
only callers are multi_path.py and tests.
Verification on the 6-question matrix moved 2 questions from wrong to
correct (city group-by, observability existential via label widen) and
preserved the 4 that were already passing. Unit tests added for
_is_pure_aggregation, _split_top_level_commas, _widen_typed_labels,
_should_widen_labels, apoc rejection, and label-widen firing/gating.
49 cypher tests + full SDK suite (629) green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New retrieval strategy that routes quantitative and structural questions through a deterministic Cypher-first path while delegating non-aggregation questions to a fallback strategy (default MultiPathRetrieval). Safe to use as the top-level strategy on GraphRAG. Implements six mechanisms identified from a failure-mode investigation where aggregation answers were being silently corrupted by extraction noise, lossy result formatting, and LLM mistrust of bare cypher numbers: - Intent classifier routes per question into numeric_math / aggregation / rag. Catches "how many", "which X", "more X than Y", "BOTH A and B", "are there any", "average / total of NUMBER". - Multi-candidate cypher generation. K parallel samples per question, execute all, pick the highest-row-count result. Beats LLM stochasticity on structural interpretation without serial retries. - Column-named markdown table formatting via result.header. Eliminates the "10 | 7 | True" ambiguity that was swapping comparison answers. - Description+chunk-text fuzzy hybrid for "shared X" / "BOTH A and B" shapes when X is a free-text property (role, project) not extracted as a typed entity. Single batched cypher, sentence-restricted regex extraction, fuzzy intersect (substring or 2-token overlap). Recovers cases where graph extraction summarized away the project names. - Numeric-math sub-path. RETURN raw values, do average / sum / median in Python. Avoids LLM-arithmetic errors deterministically. - Negation-existential empty handling. For "are there any X without Y?" an empty cypher result is the definitive "No"; positive existentials fall back to vector retrieval since extraction labels are unreliable. The strategy emits its result under an "Authoritative Graph Query Results" section heading that rule 8 of the existing _RAG_SYSTEM_PROMPT (added in bb920c6) is already configured to trust on quantitative questions. Benchmark: prototype scored 7/7 stably across three runs on the seven- question failure-mode matrix that previous strategies hit 2-5/7 on. The SDK port scored 6/7 in an end-to-end check; the remaining failure was malformed extracted org names ("Glo" / "Initech System") leaking through into the answer — an extraction-quality issue orthogonal to this strategy. Public API: CypherFirstAggregationStrategy is exported at the top level and from graphrag_sdk.retrieval.strategies. Pass it as retrieval_strategy= on GraphRAG construction to opt in. Tests: 45 unit tests cover the pure-Python helpers (intent classifier, shape detectors, role/project regex extractors, fuzzy intersect, markdown table formatter, numeric coercion). Full SDK suite stays green at 674 passed, 3 skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nStrategy Addresses concerns from a pre-merge review of 7fa7a1f. Behavior preserved on the failure-mode benchmark; the changes are about surface quality, observability, and test coverage. R1 — Make the cypher-authority rule opt-in. Rule 8 ("trust the 'Authoritative Graph Query Results' section over passages on counts and aggregates") used to live inside the base _RAG_SYSTEM_PROMPT and _RAG_SYSTEM_PROMPT_DELIMITED, so it fired on every completion() call SDK-wide — including users who never enable cypher retrieval. The rule is now extracted into a separate constant _CYPHER_AUTH_RULE and appended to the system prompt only when the retriever produced a cypher_results section (detected via item metadata or the canonical heading marker as a defensive fallback). Callers on MultiPathRetrieval without enable_cypher keep the unchanged 7-rule prompt. R2 — Add cypher_first_path metadata to every strategy result. The strategy has five sub-paths plus three RAG-fallback branches; today operators can't tell which one handled a query from the result alone. Each result now carries one of seven canonical PATH_* labels: numeric_math, shared_property_hybrid, cypher_table, negation_empty_no, rag_fallback, rag_fallback_numeric_fail, rag_fallback_cypher_empty. A shared _tag_path() helper handles the bookkeeping including the three delegated-fallback wrappings so the contract is uniform. R3 — Document prose-shape and graph-topology assumptions. The shared- property hybrid was tuned on graphs produced by the SDK's default GraphExtraction pipeline; custom extractors or domain prose may not match. The class docstring now has dedicated "Assumptions and known limits" and "Accuracy ceiling" sections naming each one. Plus a runtime warning fires when the batched (Org)<-[:RELATES]-(Person) query returns zero tuples, so operators on different schemas get a fast signal rather than silent wrong answers. R4 — Mocked end-to-end routing tests. The 45 existing unit tests cover pure-Python helpers, not the strategy's branching. Seven new tests use a mock LLM (returning canned cypher), a mock graph (returning canned result_sets), and a stub _FakeFallback to assert which sub-path fires for each intent + graph-state combination. Patterns covered: rag intent → fallback, aggregation + rows → cypher_table, numeric → python math, numeric extraction empty → fallback (with numeric_fail tag), negation + empty cypher → No (no delegation), positive-existential + empty cypher → fallback (with cypher_empty tag), topology-violation warning. R10 — ruff check pass. Dropped two unused imports (RetrieverError, ChatMessage) and sorted the test_cypher_first imports. All 75 facade + 57 cypher_first + 49 cypher_generation tests pass. Full SDK suite 686 passed, 3 skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…extractor
R5 — Split CypherFirstAggregationStrategy into composable path classes.
The strategy file went from 944 lines with five sub-paths crammed into
the strategy class to a small dispatcher (~80 lines) backed by four
focused _AggregationPath subclasses:
- _RagDelegationPath — intent="rag" → fallback verbatim
- _NumericMathPath — intent="numeric_math" → Python arithmetic
- _SharedPropertyHybridPath — "BOTH A and B" / "same X as Z" via chunks
- _MultiCandidateCypherPath — K parallel cypher candidates + table
(also owns the negation-empty and cypher-empty-fallback branches)
Each path is a small class with a single `maybe_handle(query, ctx)`
method that returns either a final `RawSearchResult` or `None` to defer.
The strategy's `_execute` dispatches by intent and consults the relevant
paths in order. Existing routing tests (TestStrategyRouting, 7 cases)
keep covering the contract end-to-end; the helper classes are
implementation detail.
Pure refactor — no behaviour change. All paths share state via a single
reference to the parent strategy, so callers' constructor signature is
unchanged.
R8 — Pluggable phrase extractor for the shared-property hybrid.
The default role/project regexes target the prose patterns the SDK's
GraphExtraction pipeline produces ("works at X as a <role>", "contributes
to <project>") and a closed set of role-suffix words. Domain-specific
use cases (medical, legal, e-commerce, non-English) need different
vocabularies.
Added a `PhraseExtractor` ABC and a `DefaultPhraseExtractor` that wraps
the existing regexes. Strategies accept a `phrase_extractor=` parameter
on construction; the shared-property hybrid consults it instead of the
module-level `_extract_phrases`. Domain users can now subclass and
inject without forking.
Exports added at strategies/__init__.py and top-level graphrag_sdk
namespace.
Tests
-----
- 1 unit test for DefaultPhraseExtractor (matches the default regexes,
unknown kinds return empty set rather than raising).
- 1 mocked end-to-end test confirming that a custom extractor passed to
the strategy is consulted by the hybrid path (intersection computed
over the custom vocabulary, not the default).
All 763 unit tests pass; ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a "Recommended ingestion config" section to CypherFirstAggregationStrategy's docstring. The default chunker (FixedSizeChunking) splits on a character window with no sentence/paragraph awareness, so entity names get truncated at chunk boundaries — "Wayne Enterprises" becomes "Wayne En" in one chunk and "terprises..." in the next. The resulting stub entities never merge with their full forms during resolution, so cypher counts come back inflated and "which X" lists pick up phantoms. On the internal 7-question aggregation benchmark, switching from FixedSizeChunking(chunk_size=1000) to SentenceTokenCapChunking(max_tokens=512, overlap_sentences=2) moved the score from 6/7 (intermittent) to 7/7 (stable across three runs). The post-ingest graph state went from 11-14 organization nodes including "Glo" / "Initech System" / "Wayne En" stubs to exactly 10 clean orgs, and from 66-80 Person nodes (with Carla / Carla Okafor duplicates) to exactly 56 distinct persons. The short-form duplicate fix is a side benefit: sentence-aware chunks follow natural prose boundaries with overlap, so a person's first mention almost always lands in the same chunk as their later short-form references — per-chunk FastCoref then has the antecedent it needs. This docstring is the smallest useful change. A separate PR will propose changing the SDK default chunker; that's a larger conversation since it affects every existing caller. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughThis PR adds a Cypher-first retrieval strategy with intent detection and three aggregation paths (numeric-math, shared-property hybrid, multi-candidate table), updates Cypher generation (LIMIT injection, namespace validation, label-widening fallback), increases cypher result cap, threads cypher metadata through MultiPathRetrieval and result assembly, injects an authority prompt when cypher results are present, and adds tests. ChangesCypher-First Aggregation Strategy
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…tence-chunker docs(retrieval): recommend SentenceTokenCapChunking for CypherFirst
Naseem77
left a comment
There was a problem hiding this comment.
- [medium] graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py:222 Dotted-namespace blocklist matches inside string literals
- Problem: validate_cypher only strips comments (//, /*
*/) from cypher_norm — it doesn't strip string literals before running _DOTTED_FN_RE. A legitimate query like MATCH (n) WHERE n.text CONTAINS 'apoc.foo(' RETURN n would be rejected as using an APOC
function. - Impact: False-positive rejection forces the retry loop to regenerate (or, after exhausting retries, the strategy silently returns empty results) for queries whose string-literal payloads happen to contain
.(. The pattern is realistic for LLM-generated regex/text-search predicates. - Fix: Strip single/double-quoted string literals (with escape handling) from cypher_norm before applying _DOTTED_FN_RE, the same way comments are stripped. The existing \bCALL\b check has the same latent
issue but the new dotted-fn check is much more likely to hit it because dots inside quoted regexes are common.
- [medium] graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py:580 Shared-property hybrid runs an unbounded full-graph scan
- Problem: The batch_cypher MATCH (o:Organization)<-[:RELATES]-(p:Person) OPTIONAL MATCH (p)-[:MENTIONED_IN]->(c:Chunk) RETURN ... collect(DISTINCT c.text) AS chunks has no LIMIT and pulls every Person–Org
pair with all their chunk texts into memory per query. _sanitize_cypher is not applied (the query goes straight to graph_store.query_raw). - Impact: On any non-toy graph (thousands of persons × dozens of chunks each), this is a multi-MB result on every "both A and B" / "same role as" question. Latency and memory grow linearly with graph size,
with no cap. - Fix: Restrict the batch to the orgs named in the question (the regex already captures org_a/org_b / target before the call) and/or cap chunk text length and row count. The current code already discards
rows whose org name doesn't match — push that filter into the query.
Concatenate the two end-of-file blocks in tests/test_facade.py — the new TestCypherAuthorityRuleInjection class from this branch and the v1.1.0 test surface (path-as-id, update, delete_document, apply_changes, sync-wrapper signatures) from main. The blocks don't overlap; both are kept verbatim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dotted-namespace blocklist (_DOTTED_FN_RE), \bCALL\b check, LOAD CSV
check, and write-keyword check all ran over the comment-stripped Cypher
without first stripping quoted string literals. A query like
MATCH (n) WHERE n.text CONTAINS 'apoc.foo(' RETURN n
was rejected as using an APOC function, forcing the retry-with-feedback
loop to regenerate (or silently return empty after retries exhausted).
LLM-generated CONTAINS / regex predicates often look like this, so the
false-positive rate is non-trivial.
_strip_string_literals replaces both single- and double-quoted literals
with empty quotes (honoring backslash escapes so embedded quotes don't
end the match early). All keyword/function pattern checks now run over
that ``cypher_code`` view; structural checks (start keyword, RETURN
presence, label existence) keep using ``cypher_norm`` since their
patterns can't legitimately occur inside string literals.
Reported by Naseem on #255.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The _SharedPropertyHybridPath batch query ran an unbounded full-graph
scan of (Organization)<-[:RELATES]-(Person) with every chunk text
collected per pair. On any non-toy graph, "both A and B" and "same X as
Z" questions returned multi-MB results per call, growing linearly with
graph size. The post-query Python filter (_gather) discarded most of it,
so almost all of the data was wasted.
Three scoping changes:
- For "both A and B": push toLower(o.name) CONTAINS toLower($org_a/b)
into the WHERE clause so the database doesn't even materialize rows
for the other orgs. The original regex (_BOTH_AB_RE) already
captures the two org names — this just propagates them into the
query rather than re-deriving them downstream.
- For "same X as Z": no org filter is applicable (we need every other
org to compare against the target), so the LIMIT and substring cap
below are the only guardrails.
- For both shapes: collect substring(c.text, 0, $max_chunk_len)
instead of full chunks (cap = 2000 chars; phrase extraction runs on
sentence fragments so the head of each chunk carries the signal),
and add LIMIT 500 as a safety net.
Reported by Naseem on #255.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py`:
- Around line 428-451: The widening logic in _should_widen_labels is too
permissive; change it to only allow widening when the name CONTAINS predicate
explicitly targets the same labeled variable and to disable widening for
multi-label, aggregate, group-by, or intersection-shaped queries. Concretely:
require a regex that captures a variable name from a pattern with a typed label
(use _TYPED_NODE_LABEL_RE to find the labeled variable) and then confirm a
corresponding "\bWHERE\b.*\b<var>\.[\w]+\b.*\bCONTAINS\b" match before returning
True; return False if the labeled node has multiple labels (e.g.,
":Label1:Label2" or other intersection syntax), if _is_pure_aggregation(cypher)
is true, or if the query contains GROUP BY or other aggregation keywords; keep
the fallback disabled unless the name predicate is pinned to the same variable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7d6e6ffb-bf7f-42d6-a0fa-871f35f05ccb
📒 Files selected for processing (7)
graphrag_sdk/src/graphrag_sdk/__init__.pygraphrag_sdk/src/graphrag_sdk/api/main.pygraphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.pygraphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.pygraphrag_sdk/tests/test_cypher_first.pygraphrag_sdk/tests/test_cypher_generation.pygraphrag_sdk/tests/test_facade.py
🚧 Files skipped from review as they are similar to previous changes (5)
- graphrag_sdk/src/graphrag_sdk/api/main.py
- graphrag_sdk/src/graphrag_sdk/init.py
- graphrag_sdk/tests/test_facade.py
- graphrag_sdk/tests/test_cypher_generation.py
- graphrag_sdk/tests/test_cypher_first.py
| def _should_widen_labels(cypher: str) -> bool: | ||
| """Gate for the 0-row label-widen fallback. | ||
|
|
||
| Skip widening when the typed label IS the filter — i.e. the RETURN | ||
| aggregates over a labeled variable AND the query has no ``WHERE … CONTAINS`` | ||
| name predicate. In that case the user is asking "how many Persons?" and | ||
| widening would change the semantics. Otherwise (typical case: typed label | ||
| present alongside a name predicate or non-aggregate RETURN), widen. | ||
|
|
||
| Conservative: when in doubt we skip the fallback rather than risk a | ||
| semantically-different query. | ||
| """ | ||
| if not _TYPED_NODE_LABEL_RE.search(cypher): | ||
| return False # nothing to widen | ||
| has_contains_filter = bool( | ||
| re.search(r"\bWHERE\b.*\bCONTAINS\b", cypher, re.IGNORECASE | re.DOTALL) | ||
| ) | ||
| if has_contains_filter: | ||
| return True | ||
| # No name filter — check whether the RETURN is an aggregate over a labeled | ||
| # variable. If so, widening turns "count Persons" into "count Entities". | ||
| if _is_pure_aggregation(cypher): | ||
| return False | ||
| return True |
There was a problem hiding this comment.
Narrow the label-widen fallback so it can't answer a different question.
This gate currently treats almost any non-trivial query as safe to widen, then rewrites every typed entity label to __Entity__. For the new top-N and BOTH examples, a 0-row retry can turn Person/Organization/Location constraints into a generic entity traversal and return plausible-but-wrong rows. Since cypher results are later framed as authoritative, that's a major correctness risk. Please only widen variables that are directly pinned by their own name CONTAINS predicate, or disable widening for multi-label / aggregate / group-by / intersection shapes.
Possible direction
def _should_widen_labels(cypher: str) -> bool:
- if not _TYPED_NODE_LABEL_RE.search(cypher):
- return False # nothing to widen
- has_contains_filter = bool(
- re.search(r"\bWHERE\b.*\bCONTAINS\b", cypher, re.IGNORECASE | re.DOTALL)
- )
- if has_contains_filter:
- return True
- # No name filter — check whether the RETURN is an aggregate over a labeled
- # variable. If so, widening turns "count Persons" into "count Entities".
- if _is_pure_aggregation(cypher):
- return False
- return True
+ typed_vars = {
+ match.group(1)
+ for match in re.finditer(
+ r"\(\s*(\w+)\s*:(" + "|".join(re.escape(l) for l in _ENTITY_LABELS) + r")\b",
+ cypher,
+ )
+ }
+ if not typed_vars:
+ return False
+
+ if _is_pure_aggregation(cypher):
+ return False
+
+ name_filtered_vars = set(
+ re.findall(r"\b(\w+)\.name\s+CONTAINS\b", cypher, re.IGNORECASE)
+ )
+ return typed_vars <= name_filtered_varsAlso applies to: 522-537
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py`
around lines 428 - 451, The widening logic in _should_widen_labels is too
permissive; change it to only allow widening when the name CONTAINS predicate
explicitly targets the same labeled variable and to disable widening for
multi-label, aggregate, group-by, or intersection-shaped queries. Concretely:
require a regex that captures a variable name from a pattern with a typed label
(use _TYPED_NODE_LABEL_RE to find the labeled variable) and then confirm a
corresponding "\bWHERE\b.*\b<var>\.[\w]+\b.*\bCONTAINS\b" match before returning
True; return False if the labeled node has multiple labels (e.g.,
":Label1:Label2" or other intersection syntax), if _is_pure_aggregation(cypher)
is true, or if the query contains GROUP BY or other aggregation keywords; keep
the fallback disabled unless the name predicate is pinned to the same variable.
| # Reject dotted-namespace function calls (apoc.*, gds.*, db.*). | ||
| # FalkorDB doesn't implement these plugins; the call silently returns 0 | ||
| # rows at execution. Surfacing it here lets the retry loop regenerate. | ||
| for ns, _ in _DOTTED_FN_RE.findall(cypher_code): |
There was a problem hiding this comment.
@Naseem77 — good catch on the literal-stripping gap. Fixed in 4c4f54c.
The validator now strips both single- and double-quoted Cypher string literals (with \-escape handling so 'apoc\'s foo(' doesn't terminate the literal early) before running the _DOTTED_FN_RE, \bCALL\b, LOAD CSV, and write-keyword checks. Structural checks that can't legitimately occur inside a string predicate (start-keyword allowlist, RETURN presence, label existence) still run on cypher_norm since they don't have the false-positive risk.
Regression test in test_cypher_generation.py::test_does_not_reject_blocklisted_tokens_inside_string_literals covers:
n.text CONTAINS 'apoc.foo('n.text CONTAINS "gds.shortest.path("n.title = 'CALL me maybe'- escape-honoring:
n.text = 'apoc\'s db.foo(' n.text CONTAINS 'LOAD CSV here'
| params["org_a"] = org_a | ||
| params["org_b"] = org_b | ||
|
|
||
| batch_cypher = ( |
There was a problem hiding this comment.
@Naseem77 — agreed, the unbounded scan needed bounds. Fixed in 652f2a4.
Three changes to the batch:
- For "both A and B": push
WHERE toLower(o.name) CONTAINS toLower($org_a) OR toLower(o.name) CONTAINS toLower($org_b)into the query._BOTH_AB_REalready captures both org names, so this just propagates them into Cypher rather than re-deriving them downstream in_gather. - For "same X as Z": no org filter is applicable (we need every other org to compare against the target), so the LIMIT + substring cap below are the only guardrails for this shape.
- For both shapes:
collect(DISTINCT substring(c.text, 0, $max_chunk_len))(cap = 2000 chars — phrase extraction runs on sentence fragments anyway, so the head of each chunk carries the signal) and aLIMIT 500row cap as a safety net.
The two caps live as module-level _HYBRID_BATCH_ROW_CAP / _HYBRID_CHUNK_TEXT_CAP constants with rationale in their docstring. Regression test test_hybrid_batch_query_is_scoped_and_capped asserts the WHERE / LIMIT / substring / params behavior for both shapes.
I didn't run _sanitize_cypher on the batch — it's an internal template, not LLM-generated, so it doesn't need the LLM-pattern cleanup. Happy to add it as belt-and-suspenders if you'd prefer.
ruff E741 on cypher_generation.py:417 was failing CI on the PR branch (``re.escape(l) for l in _ENTITY_LABELS``). The variable doesn't exist on main, so the prior lint-fix sweep (0597de9) couldn't catch it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI runs ``ruff format --check src/`` (added on main in 569d886, before this branch existed) and three files have been failing it: cypher_first.py, cypher_generation.py, result_assembly.py. Most of the diff is pre-existing wrapping (single-line-list-on-multiple-lines collapsed, ternary wrapping, blank-line spacing) — no semantic changes. ``ruff format src/`` clean; ``ruff check src/`` clean; full SDK suite unchanged (846 passed, 23 skipped). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds a new opt-in retrieval strategy,
CypherFirstAggregationStrategy, that routes quantitative/structural questions ("how many", "which X has the most", "BOTH A and B", "are there any X without Y", "what is the average …") through a deterministic Cypher-first path. Non-aggregation questions delegate toMultiPathRetrievalunchanged.Alongside the strategy, this PR also lands six targeted SDK-level fixes to the pre-existing cypher path (used by
MultiPathRetrieval(enable_cypher=True)too): smartLIMITinjection,APOC/GDS/db.*function-call blocklist in the validator, 0-row label-widen fallback, two new schema-prompt examples, and metadata threading for observability.Background
A focused failure-mode investigation on a 56-person / 10-org synthetic benchmark identified seven distinct failure modes on aggregation questions — none of them random one-offs:
\" | \"-joined strings lose column names, so the answer-LLM can swap row valuesLIMIT 25truncates group-by aggregations silentlyapoc.text.regexGroups(...)-style function calls slipped past theCALLblocklistThe new strategy directly addresses each one. The 7-question failure-mode benchmark went from 2-5/7 (depending on which pre-fix variant) to 7/7 stable across three runs.
What's in this PR (4 commits)
Each commit has its own detailed body.
Compatibility (cypher-off callers)
The PR is additive for users who don't opt in:
CypherFirstAggregationStrategyonly runs if passed explicitly asretrieval_strategy=enable_cypher=TrueTest coverage (`tests/test_facade.py::TestCypherAuthorityRuleInjection`) pins the contract.
Usage
```python
from graphrag_sdk import (
CypherFirstAggregationStrategy,
FastCorefResolver,
GraphExtraction,
LLMVerifiedResolution,
SentenceTokenCapChunking,
)
chunker = SentenceTokenCapChunking(max_tokens=512, overlap_sentences=2)
extractor = GraphExtraction(llm=llm, coref_resolver=FastCorefResolver())
resolver = LLMVerifiedResolution(llm=llm, embedder=embedder)
async with GraphRAG(
connection=conn, llm=llm, embedder=embedder, embedding_dimension=256,
) as rag:
await rag.ingest(text=doc, chunker=chunker,
extractor=extractor, resolver=resolver)
await rag.finalize()
rag._retrieval_strategy = CypherFirstAggregationStrategy(
graph_store=rag._graph_store,
vector_store=rag._vector_store,
embedder=embedder,
llm=llm,
)
answer = await rag.completion("Which city has the most employees?")
```
Related PRs
SentenceTokenCapChunkingthe default chunker (recommended pairing; the benchmark required it to hit 7/7)Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Tests