feat(retrieval): CypherFirstAggregationStrategy + cypher-path accuracy fixes by galshubeli · Pull Request #255 · FalkorDB/GraphRAG-SDK

galshubeli · 2026-05-14T14:16:32Z

Summary

Adds a new opt-in retrieval strategy, CypherFirstAggregationStrategy, that routes quantitative/structural questions ("how many", "which X has the most", "BOTH A and B", "are there any X without Y", "what is the average …") through a deterministic Cypher-first path. Non-aggregation questions delegate to MultiPathRetrieval unchanged.

Alongside the strategy, this PR also lands six targeted SDK-level fixes to the pre-existing cypher path (used by MultiPathRetrieval(enable_cypher=True) too): smart LIMIT injection, APOC/GDS/db.* function-call blocklist in the validator, 0-row label-widen fallback, two new schema-prompt examples, and metadata threading for observability.

Background

A focused failure-mode investigation on a 56-person / 10-org synthetic benchmark identified seven distinct failure modes on aggregation questions — none of them random one-offs:

Cypher results encoded as \" | \"-joined strings lose column names, so the answer-LLM can swap row values
Auto LIMIT 25 truncates group-by aggregations silently
apoc.text.regexGroups(...)-style function calls slipped past the CALL blocklist
Typed-label cypher queries fail when extraction labeled the entity differently
Schema prompt had no examples for "BOTH X AND Y" or top-N group-by
Empty cypher results were treated identically for positive and negation existentials
Roles/projects extracted as free text in chunks weren't queryable as typed nodes

The new strategy directly addresses each one. The 7-question failure-mode benchmark went from 2-5/7 (depending on which pre-fix variant) to 7/7 stable across three runs.

What's in this PR (4 commits)

Commit	Title
`bb920c6`	improve text-to-Cypher accuracy on aggregation questions
`7fa7a1f`	add CypherFirstAggregationStrategy
`deaa64d`	pre-merge review fixes for CypherFirstAggregationStrategy
`d818eb1`	split CypherFirst into path classes + pluggable extractor

Each commit has its own detailed body.

Compatibility (cypher-off callers)

The PR is additive for users who don't opt in:

CypherFirstAggregationStrategy only runs if passed explicitly as retrieval_strategy=
The cypher-path improvements (smart LIMIT, APOC blocklist, etc.) live inside the cypher generation/execution code and only run when enable_cypher=True
The "Authoritative Graph Query Results" system-prompt rule is added conditionally — only when the retriever produced a cypher_results section. With cypher off, the base system prompt is byte-identical to main.

Test coverage (`tests/test_facade.py::TestCypherAuthorityRuleInjection`) pins the contract.

Usage

```python
from graphrag_sdk import (
CypherFirstAggregationStrategy,
FastCorefResolver,
GraphExtraction,
LLMVerifiedResolution,
SentenceTokenCapChunking,
)

chunker = SentenceTokenCapChunking(max_tokens=512, overlap_sentences=2)
extractor = GraphExtraction(llm=llm, coref_resolver=FastCorefResolver())
resolver = LLMVerifiedResolution(llm=llm, embedder=embedder)

async with GraphRAG(
connection=conn, llm=llm, embedder=embedder, embedding_dimension=256,
) as rag:
await rag.ingest(text=doc, chunker=chunker,
extractor=extractor, resolver=resolver)
await rag.finalize()
rag._retrieval_strategy = CypherFirstAggregationStrategy(
graph_store=rag._graph_store,
vector_store=rag._vector_store,
embedder=embedder,
llm=llm,
)
answer = await rag.completion("Which city has the most employees?")
```

Related PRs

feat(ingestion): default to SentenceTokenCapChunking in ingest()/upda… #254 — independent change to make SentenceTokenCapChunking the default chunker (recommended pairing; the benchmark required it to hit 7/7)
docs(retrieval): recommend SentenceTokenCapChunking for CypherFirst #253 — currently mis-targeted, will be retargeted to stack on top of this PR so its diff shrinks to the 33-line docstring follow-up

Test plan

Unit tests: `tests/test_cypher_first.py` (66 tests), `tests/test_cypher_generation.py` (49 tests), `tests/test_facade.py` (75 tests including R1 contract)
End-to-end aggregation benchmark: 7/7 stable across 3 runs (`CypherFirstAggregationStrategy`) and 7/7 single run (`MultiPathRetrieval(enable_cypher=True)`)
Full SDK suite: 748 passed, 24 skipped on the cypher branch tip
`ruff check` clean on all touched files
Reviewer to confirm CI green on this PR

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Cypher-first retrieval for aggregation/numeric queries
- Pluggable phrase-extraction interface with a default extractor
- Exposed new aggregation and phrase-extractor classes in the public API
- Markdown-table formatting and numeric-coercion utilities
Improvements
- System prompt addendum when authoritative graph-query results are present
- Stricter query validation, default LIMIT injection, and label‑widening retry on empty results
- Increased graph-result row cap and richer cypher-related metadata
Tests
- Extensive tests covering routing, generation, formatting, coercion, and authority-rule behavior

…ions Six localized fixes to the cypher_generation pipeline, identified from a failure-mode investigation on a 56-person/10-org synthetic corpus where both vector-only and cypher-enabled retrieval were silently producing wrong answers on counting, top-N, group-by, and intersection questions. P0 fixes (ship together): - Smart LIMIT injection. _sanitize_cypher no longer auto-injects LIMIT on pure aggregations (count/sum/avg without group-by) and uses the new _DEFAULT_ROW_LIMIT=100 constant otherwise. Paired with raising the result_assembly slice cap to 100 plus a truncation sentinel, this stops group-by lists ("orgs with >=5 employees", "top-N city") from being silently cut at 20 rows. - Authoritative result framing. The cypher results section is renamed to "Authoritative Graph Query Results (deterministic; trust over passages on counts and aggregates)" and a matching rule 8 is added to both _RAG_SYSTEM_PROMPT variants in api/main.py. Together they stop the LLM from contradicting a correct numeric cypher answer when verbose passage text mentions a different entity. - APOC/GDS/db function blocklist. validate_cypher now rejects dotted- namespace function calls (apoc.text.regexGroups, gds.*, db.*) which FalkorDB silently returns 0 rows for. The error feeds the existing retry-with-feedback loop so attempt 2 has a concrete fix-it. P1 fixes: - 0-row label-widen fallback. When a typed-label cypher returns 0 rows AND a name predicate is present (label is a routing hint, not the filter itself), execute_cypher_retrieval rewrites typed labels to __Entity__ and re-runs once. Pure-cypher rewrite, no second LLM call. Recovers cases where the extractor labelled an entity differently than the schema prompt steered the LLM toward. - Two new schema-prompt examples replacing redundant ones: a top-N group-by ("city with most employees", explicit 2-hop) and a set- intersection ("works at BOTH A and B") with a matching rule. P2: cypher metadata (cypher_fallback, cypher_truncated, cypher_rows) threaded through assemble_raw_result into RetrieverResult.metadata so operators can monitor fallback firing rate. Public API: execute_cypher_retrieval now returns a 3-tuple (facts, entities, metadata) instead of (facts, entities). Internal — only callers are multi_path.py and tests. Verification on the 6-question matrix moved 2 questions from wrong to correct (city group-by, observability existential via label widen) and preserved the 4 that were already passing. Unit tests added for _is_pure_aggregation, _split_top_level_commas, _widen_typed_labels, _should_widen_labels, apoc rejection, and label-widen firing/gating. 49 cypher tests + full SDK suite (629) green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New retrieval strategy that routes quantitative and structural questions through a deterministic Cypher-first path while delegating non-aggregation questions to a fallback strategy (default MultiPathRetrieval). Safe to use as the top-level strategy on GraphRAG. Implements six mechanisms identified from a failure-mode investigation where aggregation answers were being silently corrupted by extraction noise, lossy result formatting, and LLM mistrust of bare cypher numbers: - Intent classifier routes per question into numeric_math / aggregation / rag. Catches "how many", "which X", "more X than Y", "BOTH A and B", "are there any", "average / total of NUMBER". - Multi-candidate cypher generation. K parallel samples per question, execute all, pick the highest-row-count result. Beats LLM stochasticity on structural interpretation without serial retries. - Column-named markdown table formatting via result.header. Eliminates the "10 | 7 | True" ambiguity that was swapping comparison answers. - Description+chunk-text fuzzy hybrid for "shared X" / "BOTH A and B" shapes when X is a free-text property (role, project) not extracted as a typed entity. Single batched cypher, sentence-restricted regex extraction, fuzzy intersect (substring or 2-token overlap). Recovers cases where graph extraction summarized away the project names. - Numeric-math sub-path. RETURN raw values, do average / sum / median in Python. Avoids LLM-arithmetic errors deterministically. - Negation-existential empty handling. For "are there any X without Y?" an empty cypher result is the definitive "No"; positive existentials fall back to vector retrieval since extraction labels are unreliable. The strategy emits its result under an "Authoritative Graph Query Results" section heading that rule 8 of the existing _RAG_SYSTEM_PROMPT (added in bb920c6) is already configured to trust on quantitative questions. Benchmark: prototype scored 7/7 stably across three runs on the seven- question failure-mode matrix that previous strategies hit 2-5/7 on. The SDK port scored 6/7 in an end-to-end check; the remaining failure was malformed extracted org names ("Glo" / "Initech System") leaking through into the answer — an extraction-quality issue orthogonal to this strategy. Public API: CypherFirstAggregationStrategy is exported at the top level and from graphrag_sdk.retrieval.strategies. Pass it as retrieval_strategy= on GraphRAG construction to opt in. Tests: 45 unit tests cover the pure-Python helpers (intent classifier, shape detectors, role/project regex extractors, fuzzy intersect, markdown table formatter, numeric coercion). Full SDK suite stays green at 674 passed, 3 skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nStrategy Addresses concerns from a pre-merge review of 7fa7a1f. Behavior preserved on the failure-mode benchmark; the changes are about surface quality, observability, and test coverage. R1 — Make the cypher-authority rule opt-in. Rule 8 ("trust the 'Authoritative Graph Query Results' section over passages on counts and aggregates") used to live inside the base _RAG_SYSTEM_PROMPT and _RAG_SYSTEM_PROMPT_DELIMITED, so it fired on every completion() call SDK-wide — including users who never enable cypher retrieval. The rule is now extracted into a separate constant _CYPHER_AUTH_RULE and appended to the system prompt only when the retriever produced a cypher_results section (detected via item metadata or the canonical heading marker as a defensive fallback). Callers on MultiPathRetrieval without enable_cypher keep the unchanged 7-rule prompt. R2 — Add cypher_first_path metadata to every strategy result. The strategy has five sub-paths plus three RAG-fallback branches; today operators can't tell which one handled a query from the result alone. Each result now carries one of seven canonical PATH_* labels: numeric_math, shared_property_hybrid, cypher_table, negation_empty_no, rag_fallback, rag_fallback_numeric_fail, rag_fallback_cypher_empty. A shared _tag_path() helper handles the bookkeeping including the three delegated-fallback wrappings so the contract is uniform. R3 — Document prose-shape and graph-topology assumptions. The shared- property hybrid was tuned on graphs produced by the SDK's default GraphExtraction pipeline; custom extractors or domain prose may not match. The class docstring now has dedicated "Assumptions and known limits" and "Accuracy ceiling" sections naming each one. Plus a runtime warning fires when the batched (Org)<-[:RELATES]-(Person) query returns zero tuples, so operators on different schemas get a fast signal rather than silent wrong answers. R4 — Mocked end-to-end routing tests. The 45 existing unit tests cover pure-Python helpers, not the strategy's branching. Seven new tests use a mock LLM (returning canned cypher), a mock graph (returning canned result_sets), and a stub _FakeFallback to assert which sub-path fires for each intent + graph-state combination. Patterns covered: rag intent → fallback, aggregation + rows → cypher_table, numeric → python math, numeric extraction empty → fallback (with numeric_fail tag), negation + empty cypher → No (no delegation), positive-existential + empty cypher → fallback (with cypher_empty tag), topology-violation warning. R10 — ruff check pass. Dropped two unused imports (RetrieverError, ChatMessage) and sorted the test_cypher_first imports. All 75 facade + 57 cypher_first + 49 cypher_generation tests pass. Full SDK suite 686 passed, 3 skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…extractor R5 — Split CypherFirstAggregationStrategy into composable path classes. The strategy file went from 944 lines with five sub-paths crammed into the strategy class to a small dispatcher (~80 lines) backed by four focused _AggregationPath subclasses: - _RagDelegationPath — intent="rag" → fallback verbatim - _NumericMathPath — intent="numeric_math" → Python arithmetic - _SharedPropertyHybridPath — "BOTH A and B" / "same X as Z" via chunks - _MultiCandidateCypherPath — K parallel cypher candidates + table (also owns the negation-empty and cypher-empty-fallback branches) Each path is a small class with a single `maybe_handle(query, ctx)` method that returns either a final `RawSearchResult` or `None` to defer. The strategy's `_execute` dispatches by intent and consults the relevant paths in order. Existing routing tests (TestStrategyRouting, 7 cases) keep covering the contract end-to-end; the helper classes are implementation detail. Pure refactor — no behaviour change. All paths share state via a single reference to the parent strategy, so callers' constructor signature is unchanged. R8 — Pluggable phrase extractor for the shared-property hybrid. The default role/project regexes target the prose patterns the SDK's GraphExtraction pipeline produces ("works at X as a <role>", "contributes to <project>") and a closed set of role-suffix words. Domain-specific use cases (medical, legal, e-commerce, non-English) need different vocabularies. Added a `PhraseExtractor` ABC and a `DefaultPhraseExtractor` that wraps the existing regexes. Strategies accept a `phrase_extractor=` parameter on construction; the shared-property hybrid consults it instead of the module-level `_extract_phrases`. Domain users can now subclass and inject without forking. Exports added at strategies/__init__.py and top-level graphrag_sdk namespace. Tests ----- - 1 unit test for DefaultPhraseExtractor (matches the default regexes, unknown kinds return empty set rather than raising). - 1 mocked end-to-end test confirming that a custom extractor passed to the strategy is consulted by the hybrid path (intersection computed over the custom vocabulary, not the default). All 763 unit tests pass; ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a "Recommended ingestion config" section to CypherFirstAggregationStrategy's docstring. The default chunker (FixedSizeChunking) splits on a character window with no sentence/paragraph awareness, so entity names get truncated at chunk boundaries — "Wayne Enterprises" becomes "Wayne En" in one chunk and "terprises..." in the next. The resulting stub entities never merge with their full forms during resolution, so cypher counts come back inflated and "which X" lists pick up phantoms. On the internal 7-question aggregation benchmark, switching from FixedSizeChunking(chunk_size=1000) to SentenceTokenCapChunking(max_tokens=512, overlap_sentences=2) moved the score from 6/7 (intermittent) to 7/7 (stable across three runs). The post-ingest graph state went from 11-14 organization nodes including "Glo" / "Initech System" / "Wayne En" stubs to exactly 10 clean orgs, and from 66-80 Person nodes (with Carla / Carla Okafor duplicates) to exactly 56 distinct persons. The short-form duplicate fix is a side benefit: sentence-aware chunks follow natural prose boundaries with overlap, so a person's first mention almost always lands in the same chunk as their later short-form references — per-chunk FastCoref then has the antecedent it needs. This docstring is the smallest useful change. A separate PR will propose changing the SDK default chunker; that's a larger conversation since it affects every existing caller. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-14T14:20:36Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bb232e35-c5b4-4ab4-94d3-1171e61d69b5

📥 Commits

Reviewing files that changed from the base of the PR and between 7ba4848 and 98aa604.

📒 Files selected for processing (3)

graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/result_assembly.py

🚧 Files skipped from review as they are similar to previous changes (3)

graphrag_sdk/src/graphrag_sdk/retrieval/strategies/result_assembly.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py

📝 Walkthrough

Walkthrough

This PR adds a Cypher-first retrieval strategy with intent detection and three aggregation paths (numeric-math, shared-property hybrid, multi-candidate table), updates Cypher generation (LIMIT injection, namespace validation, label-widening fallback), increases cypher result cap, threads cypher metadata through MultiPathRetrieval and result assembly, injects an authority prompt when cypher results are present, and adds tests.

Changes

Cypher-First Aggregation Strategy

Layer / File(s)	Summary
Public API Exports and Module Wiring `graphrag_sdk/src/graphrag_sdk/__init__.py`, `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/__init__.py`	Expose `CypherFirstAggregationStrategy`, `DefaultPhraseExtractor`, and `PhraseExtractor` via package imports and `__all__`.
Cypher Authority Rule and Prompt Integration `graphrag_sdk/src/graphrag_sdk/api/main.py`	Add `_CYPHER_AUTH_RULE`, markers, `_has_authoritative_cypher_results`, and conditional prompt augmentation in `completion()`.
Cypher Generation Validation, LIMITs, and Label Widening `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py`	Add default row limit, detect pure-aggregation to skip LIMIT injection, reject dotted-namespace function calls, implement typed-label widening fallback, and return enriched metadata.
MultiPath Retrieval Metadata Threading `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/multi_path.py`	Accept 3-tuple cypher returns (facts, entities, metadata) and forward `cypher_metadata` into result assembly.
Result Assembly Cap Increase and Metadata Merging `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/result_assembly.py`	Introduce `_CYPHER_RESULT_CAP = 100`, produce "Authoritative Graph Query Results" section with truncation notice, and merge prefixed `cypher_` metadata into final result metadata.
Cypher-First Core: Intent, Extraction, Utilities `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py`	Add intent/shape detection, yes/no/which-list classifiers, property-kind inference, `PhraseExtractor` and `DefaultPhraseExtractor`, markdown-table formatting, tagging utilities, and numeric-coercion helper.
Numeric-Math Execution Path `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (Numeric path)	Generate/validate Cypher returning numeric cells, coerce numbers, compute aggregates in Python, and fallback to RAG when no numeric extraction.
Shared-Property Hybrid Execution Path `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (Hybrid path)	Run batched Cypher to fetch descriptions, extract role/project phrases via pluggable extractor, and compute fuzzy intersections/overlaps for "both/same" questions.
Multi-Candidate Cypher-Table Path `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py` (Multi-candidate path)	Generate K candidate Cypher queries, execute in parallel, rank by row count, format best result as markdown table, and handle negation-existential vs fallback.
CypherFirstAggregationStrategy Class `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py`	Wire embedder/LLM/fallback/phrase-extractor, build sub-paths, and dispatch in `_execute` by detected intent.
Formatting, Tagging, and Helpers `graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py`	Table rendering with header synthesis, null/list handling, truncation flag; `_tag_path` for `cypher_first_path`; `_coerce_number` helper.
Comprehensive Test Coverage `graphrag_sdk/tests/*`	Add tests for intent detection, shape classification, phrase extraction, fuzzy intersection, table formatting, numeric coercion, path tagging, end-to-end routing, cypher generation helpers, label-widening fallback, and authority-rule injection.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

FalkorDB/GraphRAG-SDK#253: Overlapping work introducing and exposing the cypher_first strategy and cypher-authority prompt gating.

Suggested reviewers

Naseem77

Poem

🐰 A rabbit hops through cyphered trees tonight,
Detecting intent by moonbeam's light,
Numbers to Python, phrases to match,
Tables and fallbacks all neatly dispatched,
Authority whispered, results set right. 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: introducing CypherFirstAggregationStrategy and including cypher-path accuracy improvements, which directly reflects the primary features in the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/cypher-aggregation-accuracy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…tence-chunker docs(retrieval): recommend SentenceTokenCapChunking for CypherFirst

Naseem77

[medium] graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py:222 Dotted-namespace blocklist matches inside string literals

Problem: validate_cypher only strips comments (//, /*
*/) from cypher_norm — it doesn't strip string literals before running _DOTTED_FN_RE. A legitimate query like MATCH (n) WHERE n.text CONTAINS 'apoc.foo(' RETURN n would be rejected as using an APOC
function.
Impact: False-positive rejection forces the retry loop to regenerate (or, after exhausting retries, the strategy silently returns empty results) for queries whose string-literal payloads happen to contain
.(. The pattern is realistic for LLM-generated regex/text-search predicates.
Fix: Strip single/double-quoted string literals (with escape handling) from cypher_norm before applying _DOTTED_FN_RE, the same way comments are stripped. The existing \bCALL\b check has the same latent
issue but the new dotted-fn check is much more likely to hit it because dots inside quoted regexes are common.

[medium] graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py:580 Shared-property hybrid runs an unbounded full-graph scan

Problem: The batch_cypher MATCH (o:Organization)<-[:RELATES]-(p:Person) OPTIONAL MATCH (p)-[:MENTIONED_IN]->(c:Chunk) RETURN ... collect(DISTINCT c.text) AS chunks has no LIMIT and pulls every Person–Org
pair with all their chunk texts into memory per query. _sanitize_cypher is not applied (the query goes straight to graph_store.query_raw).
Impact: On any non-toy graph (thousands of persons × dozens of chunks each), this is a multi-MB result on every "both A and B" / "same role as" question. Latency and memory grow linearly with graph size,
with no cap.
Fix: Restrict the batch to the orgs named in the question (the regex already captures org_a/org_b / target before the call) and/or cap chunk text length and row count. The current code already discards
rows whose org name doesn't match — push that filter into the query.

Concatenate the two end-of-file blocks in tests/test_facade.py — the new TestCypherAuthorityRuleInjection class from this branch and the v1.1.0 test surface (path-as-id, update, delete_document, apply_changes, sync-wrapper signatures) from main. The blocks don't overlap; both are kept verbatim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The dotted-namespace blocklist (_DOTTED_FN_RE), \bCALL\b check, LOAD CSV check, and write-keyword check all ran over the comment-stripped Cypher without first stripping quoted string literals. A query like MATCH (n) WHERE n.text CONTAINS 'apoc.foo(' RETURN n was rejected as using an APOC function, forcing the retry-with-feedback loop to regenerate (or silently return empty after retries exhausted). LLM-generated CONTAINS / regex predicates often look like this, so the false-positive rate is non-trivial. _strip_string_literals replaces both single- and double-quoted literals with empty quotes (honoring backslash escapes so embedded quotes don't end the match early). All keyword/function pattern checks now run over that ``cypher_code`` view; structural checks (start keyword, RETURN presence, label existence) keep using ``cypher_norm`` since their patterns can't legitimately occur inside string literals. Reported by Naseem on #255. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The _SharedPropertyHybridPath batch query ran an unbounded full-graph scan of (Organization)<-[:RELATES]-(Person) with every chunk text collected per pair. On any non-toy graph, "both A and B" and "same X as Z" questions returned multi-MB results per call, growing linearly with graph size. The post-query Python filter (_gather) discarded most of it, so almost all of the data was wasted. Three scoping changes: - For "both A and B": push toLower(o.name) CONTAINS toLower($org_a/b) into the WHERE clause so the database doesn't even materialize rows for the other orgs. The original regex (_BOTH_AB_RE) already captures the two org names — this just propagates them into the query rather than re-deriving them downstream. - For "same X as Z": no org filter is applicable (we need every other org to compare against the target), so the LIMIT and substring cap below are the only guardrails. - For both shapes: collect substring(c.text, 0, $max_chunk_len) instead of full chunks (cap = 2000 chars; phrase extraction runs on sentence fragments so the head of each chunk carries the signal), and add LIMIT 500 as a safety net. Reported by Naseem on #255. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py`:
- Around line 428-451: The widening logic in _should_widen_labels is too
permissive; change it to only allow widening when the name CONTAINS predicate
explicitly targets the same labeled variable and to disable widening for
multi-label, aggregate, group-by, or intersection-shaped queries. Concretely:
require a regex that captures a variable name from a pattern with a typed label
(use _TYPED_NODE_LABEL_RE to find the labeled variable) and then confirm a
corresponding "\bWHERE\b.*\b<var>\.[\w]+\b.*\bCONTAINS\b" match before returning
True; return False if the labeled node has multiple labels (e.g.,
":Label1:Label2" or other intersection syntax), if _is_pure_aggregation(cypher)
is true, or if the query contains GROUP BY or other aggregation keywords; keep
the fallback disabled unless the name predicate is pinned to the same variable.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d6e6ffb-bf7f-42d6-a0fa-871f35f05ccb

📥 Commits

Reviewing files that changed from the base of the PR and between ef42a09 and 652f2a4.

📒 Files selected for processing (7)

graphrag_sdk/src/graphrag_sdk/__init__.py
graphrag_sdk/src/graphrag_sdk/api/main.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py
graphrag_sdk/tests/test_cypher_first.py
graphrag_sdk/tests/test_cypher_generation.py
graphrag_sdk/tests/test_facade.py

🚧 Files skipped from review as they are similar to previous changes (5)

graphrag_sdk/src/graphrag_sdk/api/main.py
graphrag_sdk/src/graphrag_sdk/init.py
graphrag_sdk/tests/test_facade.py
graphrag_sdk/tests/test_cypher_generation.py
graphrag_sdk/tests/test_cypher_first.py

coderabbitai · 2026-05-18T13:31:25Z

+def _should_widen_labels(cypher: str) -> bool:
+    """Gate for the 0-row label-widen fallback.
+
+    Skip widening when the typed label IS the filter — i.e. the RETURN
+    aggregates over a labeled variable AND the query has no ``WHERE … CONTAINS``
+    name predicate. In that case the user is asking "how many Persons?" and
+    widening would change the semantics. Otherwise (typical case: typed label
+    present alongside a name predicate or non-aggregate RETURN), widen.
+
+    Conservative: when in doubt we skip the fallback rather than risk a
+    semantically-different query.
+    """
+    if not _TYPED_NODE_LABEL_RE.search(cypher):
+        return False  # nothing to widen
+    has_contains_filter = bool(
+        re.search(r"\bWHERE\b.*\bCONTAINS\b", cypher, re.IGNORECASE | re.DOTALL)
+    )
+    if has_contains_filter:
+        return True
+    # No name filter — check whether the RETURN is an aggregate over a labeled
+    # variable. If so, widening turns "count Persons" into "count Entities".
+    if _is_pure_aggregation(cypher):
+        return False
+    return True


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Narrow the label-widen fallback so it can't answer a different question.

This gate currently treats almost any non-trivial query as safe to widen, then rewrites every typed entity label to __Entity__. For the new top-N and BOTH examples, a 0-row retry can turn Person/Organization/Location constraints into a generic entity traversal and return plausible-but-wrong rows. Since cypher results are later framed as authoritative, that's a major correctness risk. Please only widen variables that are directly pinned by their own name CONTAINS predicate, or disable widening for multi-label / aggregate / group-by / intersection shapes.

Possible direction

def _should_widen_labels(cypher: str) -> bool: - if not _TYPED_NODE_LABEL_RE.search(cypher): - return False # nothing to widen - has_contains_filter = bool( - re.search(r"\bWHERE\b.*\bCONTAINS\b", cypher, re.IGNORECASE | re.DOTALL) - ) - if has_contains_filter: - return True - # No name filter — check whether the RETURN is an aggregate over a labeled - # variable. If so, widening turns "count Persons" into "count Entities". - if _is_pure_aggregation(cypher): - return False - return True + typed_vars = { + match.group(1) + for match in re.finditer( + r"\(\s*(\w+)\s*:(" + "|".join(re.escape(l) for l in _ENTITY_LABELS) + r")\b", + cypher, + ) + } + if not typed_vars: + return False + + if _is_pure_aggregation(cypher): + return False + + name_filtered_vars = set( + re.findall(r"\b(\w+)\.name\s+CONTAINS\b", cypher, re.IGNORECASE) + ) + return typed_vars <= name_filtered_vars

Also applies to: 522-537

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py` around lines 428 - 451, The widening logic in _should_widen_labels is too permissive; change it to only allow widening when the name CONTAINS predicate explicitly targets the same labeled variable and to disable widening for multi-label, aggregate, group-by, or intersection-shaped queries. Concretely: require a regex that captures a variable name from a pattern with a typed label (use _TYPED_NODE_LABEL_RE to find the labeled variable) and then confirm a corresponding "\bWHERE\b.*\b<var>\.[\w]+\b.*\bCONTAINS\b" match before returning True; return False if the labeled node has multiple labels (e.g., ":Label1:Label2" or other intersection syntax), if _is_pure_aggregation(cypher) is true, or if the query contains GROUP BY or other aggregation keywords; keep the fallback disabled unless the name predicate is pinned to the same variable.

galshubeli · 2026-05-18T13:37:40Z

+    # Reject dotted-namespace function calls (apoc.*, gds.*, db.*).
+    # FalkorDB doesn't implement these plugins; the call silently returns 0
+    # rows at execution. Surfacing it here lets the retry loop regenerate.
+    for ns, _ in _DOTTED_FN_RE.findall(cypher_code):


@Naseem77 — good catch on the literal-stripping gap. Fixed in 4c4f54c.

The validator now strips both single- and double-quoted Cypher string literals (with \-escape handling so 'apoc\'s foo(' doesn't terminate the literal early) before running the _DOTTED_FN_RE, \bCALL\b, LOAD CSV, and write-keyword checks. Structural checks that can't legitimately occur inside a string predicate (start-keyword allowlist, RETURN presence, label existence) still run on cypher_norm since they don't have the false-positive risk.

Regression test in test_cypher_generation.py::test_does_not_reject_blocklisted_tokens_inside_string_literals covers:

n.text CONTAINS 'apoc.foo('

n.text CONTAINS "gds.shortest.path("

n.title = 'CALL me maybe'

escape-honoring: n.text = 'apoc\'s db.foo('

n.text CONTAINS 'LOAD CSV here'

galshubeli · 2026-05-18T13:38:25Z

+            params["org_a"] = org_a
+            params["org_b"] = org_b
+
+        batch_cypher = (


@Naseem77 — agreed, the unbounded scan needed bounds. Fixed in 652f2a4.

Three changes to the batch:

For "both A and B": push WHERE toLower(o.name) CONTAINS toLower($org_a) OR toLower(o.name) CONTAINS toLower($org_b) into the query. _BOTH_AB_RE already captures both org names, so this just propagates them into Cypher rather than re-deriving them downstream in _gather.

For "same X as Z": no org filter is applicable (we need every other org to compare against the target), so the LIMIT + substring cap below are the only guardrails for this shape.

For both shapes: collect(DISTINCT substring(c.text, 0, $max_chunk_len)) (cap = 2000 chars — phrase extraction runs on sentence fragments anyway, so the head of each chunk carries the signal) and a LIMIT 500 row cap as a safety net.

The two caps live as module-level _HYBRID_BATCH_ROW_CAP / _HYBRID_CHUNK_TEXT_CAP constants with rationale in their docstring. Regression test test_hybrid_batch_query_is_scoped_and_capped asserts the WHERE / LIMIT / substring / params behavior for both shapes.

I didn't run _sanitize_cypher on the batch — it's an internal template, not LLM-generated, so it doesn't need the LLM-pattern cleanup. Happy to add it as belt-and-suspenders if you'd prefer.

ruff E741 on cypher_generation.py:417 was failing CI on the PR branch (``re.escape(l) for l in _ENTITY_LABELS``). The variable doesn't exist on main, so the prior lint-fix sweep (0597de9) couldn't catch it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI runs ``ruff format --check src/`` (added on main in 569d886, before this branch existed) and three files have been failing it: cypher_first.py, cypher_generation.py, result_assembly.py. Most of the diff is pre-existing wrapping (single-line-list-on-multiple-lines collapsed, ternary wrapping, blank-line spacing) — no semantic changes. ``ruff format src/`` clean; ``ruff check src/`` clean; full SDK suite unchanged (846 passed, 23 skipped). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

galshubeli and others added 5 commits May 10, 2026 17:44

galshubeli mentioned this pull request May 14, 2026

docs(retrieval): recommend SentenceTokenCapChunking for CypherFirst #253

Merged

2 tasks

Merge pull request #253 from FalkorDB/docs/cypher-first-recommend-sen…

ef42a09

…tence-chunker docs(retrieval): recommend SentenceTokenCapChunking for CypherFirst

Naseem77 mentioned this pull request May 17, 2026

Production Hardening & Release Readiness #257

Open

6 tasks

Naseem77 requested changes May 18, 2026

View reviewed changes

galshubeli and others added 3 commits May 18, 2026 16:22

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

galshubeli commented May 18, 2026

View reviewed changes

galshubeli and others added 2 commits May 18, 2026 17:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(retrieval): CypherFirstAggregationStrategy + cypher-path accuracy fixes#255

feat(retrieval): CypherFirstAggregationStrategy + cypher-path accuracy fixes#255
galshubeli wants to merge 11 commits into
mainfrom
feat/cypher-aggregation-accuracy

galshubeli commented May 14, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

Naseem77 left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 18, 2026

Uh oh!

galshubeli May 18, 2026

Uh oh!

galshubeli May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

galshubeli commented May 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Background

What's in this PR (4 commits)

Compatibility (cypher-off callers)

Usage

Related PRs

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

Naseem77 left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

galshubeli May 18, 2026

Choose a reason for hiding this comment

Uh oh!

galshubeli May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

galshubeli commented May 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading