Skip to content

feat(retrieval): CypherFirstAggregationStrategy + cypher-path accuracy fixes#255

Open
galshubeli wants to merge 11 commits into
mainfrom
feat/cypher-aggregation-accuracy
Open

feat(retrieval): CypherFirstAggregationStrategy + cypher-path accuracy fixes#255
galshubeli wants to merge 11 commits into
mainfrom
feat/cypher-aggregation-accuracy

Conversation

@galshubeli
Copy link
Copy Markdown
Collaborator

@galshubeli galshubeli commented May 14, 2026

Summary

Adds a new opt-in retrieval strategy, CypherFirstAggregationStrategy, that routes quantitative/structural questions ("how many", "which X has the most", "BOTH A and B", "are there any X without Y", "what is the average …") through a deterministic Cypher-first path. Non-aggregation questions delegate to MultiPathRetrieval unchanged.

Alongside the strategy, this PR also lands six targeted SDK-level fixes to the pre-existing cypher path (used by MultiPathRetrieval(enable_cypher=True) too): smart LIMIT injection, APOC/GDS/db.* function-call blocklist in the validator, 0-row label-widen fallback, two new schema-prompt examples, and metadata threading for observability.

Background

A focused failure-mode investigation on a 56-person / 10-org synthetic benchmark identified seven distinct failure modes on aggregation questions — none of them random one-offs:

  • Cypher results encoded as \" | \"-joined strings lose column names, so the answer-LLM can swap row values
  • Auto LIMIT 25 truncates group-by aggregations silently
  • apoc.text.regexGroups(...)-style function calls slipped past the CALL blocklist
  • Typed-label cypher queries fail when extraction labeled the entity differently
  • Schema prompt had no examples for "BOTH X AND Y" or top-N group-by
  • Empty cypher results were treated identically for positive and negation existentials
  • Roles/projects extracted as free text in chunks weren't queryable as typed nodes

The new strategy directly addresses each one. The 7-question failure-mode benchmark went from 2-5/7 (depending on which pre-fix variant) to 7/7 stable across three runs.

What's in this PR (4 commits)

Commit Title
`bb920c6` improve text-to-Cypher accuracy on aggregation questions
`7fa7a1f` add CypherFirstAggregationStrategy
`deaa64d` pre-merge review fixes for CypherFirstAggregationStrategy
`d818eb1` split CypherFirst into path classes + pluggable extractor

Each commit has its own detailed body.

Compatibility (cypher-off callers)

The PR is additive for users who don't opt in:

  • CypherFirstAggregationStrategy only runs if passed explicitly as retrieval_strategy=
  • The cypher-path improvements (smart LIMIT, APOC blocklist, etc.) live inside the cypher generation/execution code and only run when enable_cypher=True
  • The "Authoritative Graph Query Results" system-prompt rule is added conditionally — only when the retriever produced a cypher_results section. With cypher off, the base system prompt is byte-identical to main.

Test coverage (`tests/test_facade.py::TestCypherAuthorityRuleInjection`) pins the contract.

Usage

```python
from graphrag_sdk import (
CypherFirstAggregationStrategy,
FastCorefResolver,
GraphExtraction,
LLMVerifiedResolution,
SentenceTokenCapChunking,
)

chunker = SentenceTokenCapChunking(max_tokens=512, overlap_sentences=2)
extractor = GraphExtraction(llm=llm, coref_resolver=FastCorefResolver())
resolver = LLMVerifiedResolution(llm=llm, embedder=embedder)

async with GraphRAG(
connection=conn, llm=llm, embedder=embedder, embedding_dimension=256,
) as rag:
await rag.ingest(text=doc, chunker=chunker,
extractor=extractor, resolver=resolver)
await rag.finalize()
rag._retrieval_strategy = CypherFirstAggregationStrategy(
graph_store=rag._graph_store,
vector_store=rag._vector_store,
embedder=embedder,
llm=llm,
)
answer = await rag.completion("Which city has the most employees?")
```

Related PRs

Test plan

  • Unit tests: `tests/test_cypher_first.py` (66 tests), `tests/test_cypher_generation.py` (49 tests), `tests/test_facade.py` (75 tests including R1 contract)
  • End-to-end aggregation benchmark: 7/7 stable across 3 runs (`CypherFirstAggregationStrategy`) and 7/7 single run (`MultiPathRetrieval(enable_cypher=True)`)
  • Full SDK suite: 748 passed, 24 skipped on the cypher branch tip
  • `ruff check` clean on all touched files
  • Reviewer to confirm CI green on this PR

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Cypher-first retrieval for aggregation/numeric queries
    • Pluggable phrase-extraction interface with a default extractor
    • Exposed new aggregation and phrase-extractor classes in the public API
    • Markdown-table formatting and numeric-coercion utilities
  • Improvements

    • System prompt addendum when authoritative graph-query results are present
    • Stricter query validation, default LIMIT injection, and label‑widening retry on empty results
    • Increased graph-result row cap and richer cypher-related metadata
  • Tests

    • Extensive tests covering routing, generation, formatting, coercion, and authority-rule behavior

Review Change Stack

galshubeli and others added 5 commits May 10, 2026 17:44
…ions

Six localized fixes to the cypher_generation pipeline, identified from a
failure-mode investigation on a 56-person/10-org synthetic corpus where
both vector-only and cypher-enabled retrieval were silently producing
wrong answers on counting, top-N, group-by, and intersection questions.

P0 fixes (ship together):

- Smart LIMIT injection. _sanitize_cypher no longer auto-injects LIMIT on
  pure aggregations (count/sum/avg without group-by) and uses the new
  _DEFAULT_ROW_LIMIT=100 constant otherwise. Paired with raising the
  result_assembly slice cap to 100 plus a truncation sentinel, this stops
  group-by lists ("orgs with >=5 employees", "top-N city") from being
  silently cut at 20 rows.

- Authoritative result framing. The cypher results section is renamed to
  "Authoritative Graph Query Results (deterministic; trust over passages
  on counts and aggregates)" and a matching rule 8 is added to both
  _RAG_SYSTEM_PROMPT variants in api/main.py. Together they stop the LLM
  from contradicting a correct numeric cypher answer when verbose passage
  text mentions a different entity.

- APOC/GDS/db function blocklist. validate_cypher now rejects dotted-
  namespace function calls (apoc.text.regexGroups, gds.*, db.*) which
  FalkorDB silently returns 0 rows for. The error feeds the existing
  retry-with-feedback loop so attempt 2 has a concrete fix-it.

P1 fixes:

- 0-row label-widen fallback. When a typed-label cypher returns 0 rows
  AND a name predicate is present (label is a routing hint, not the
  filter itself), execute_cypher_retrieval rewrites typed labels to
  __Entity__ and re-runs once. Pure-cypher rewrite, no second LLM call.
  Recovers cases where the extractor labelled an entity differently than
  the schema prompt steered the LLM toward.

- Two new schema-prompt examples replacing redundant ones: a top-N
  group-by ("city with most employees", explicit 2-hop) and a set-
  intersection ("works at BOTH A and B") with a matching rule.

P2: cypher metadata (cypher_fallback, cypher_truncated, cypher_rows)
threaded through assemble_raw_result into RetrieverResult.metadata so
operators can monitor fallback firing rate.

Public API: execute_cypher_retrieval now returns a 3-tuple
(facts, entities, metadata) instead of (facts, entities). Internal —
only callers are multi_path.py and tests.

Verification on the 6-question matrix moved 2 questions from wrong to
correct (city group-by, observability existential via label widen) and
preserved the 4 that were already passing. Unit tests added for
_is_pure_aggregation, _split_top_level_commas, _widen_typed_labels,
_should_widen_labels, apoc rejection, and label-widen firing/gating.
49 cypher tests + full SDK suite (629) green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New retrieval strategy that routes quantitative and structural questions
through a deterministic Cypher-first path while delegating non-aggregation
questions to a fallback strategy (default MultiPathRetrieval). Safe to use
as the top-level strategy on GraphRAG.

Implements six mechanisms identified from a failure-mode investigation
where aggregation answers were being silently corrupted by extraction
noise, lossy result formatting, and LLM mistrust of bare cypher numbers:

- Intent classifier routes per question into numeric_math / aggregation /
  rag. Catches "how many", "which X", "more X than Y", "BOTH A and B",
  "are there any", "average / total of NUMBER".

- Multi-candidate cypher generation. K parallel samples per question,
  execute all, pick the highest-row-count result. Beats LLM stochasticity
  on structural interpretation without serial retries.

- Column-named markdown table formatting via result.header. Eliminates
  the "10 | 7 | True" ambiguity that was swapping comparison answers.

- Description+chunk-text fuzzy hybrid for "shared X" / "BOTH A and B"
  shapes when X is a free-text property (role, project) not extracted as
  a typed entity. Single batched cypher, sentence-restricted regex
  extraction, fuzzy intersect (substring or 2-token overlap). Recovers
  cases where graph extraction summarized away the project names.

- Numeric-math sub-path. RETURN raw values, do average / sum / median in
  Python. Avoids LLM-arithmetic errors deterministically.

- Negation-existential empty handling. For "are there any X without Y?"
  an empty cypher result is the definitive "No"; positive existentials
  fall back to vector retrieval since extraction labels are unreliable.

The strategy emits its result under an "Authoritative Graph Query Results"
section heading that rule 8 of the existing _RAG_SYSTEM_PROMPT (added in
bb920c6) is already configured to trust on quantitative questions.

Benchmark: prototype scored 7/7 stably across three runs on the seven-
question failure-mode matrix that previous strategies hit 2-5/7 on. The
SDK port scored 6/7 in an end-to-end check; the remaining failure was
malformed extracted org names ("Glo" / "Initech System") leaking through
into the answer — an extraction-quality issue orthogonal to this strategy.

Public API: CypherFirstAggregationStrategy is exported at the top level
and from graphrag_sdk.retrieval.strategies. Pass it as
retrieval_strategy= on GraphRAG construction to opt in.

Tests: 45 unit tests cover the pure-Python helpers (intent classifier,
shape detectors, role/project regex extractors, fuzzy intersect, markdown
table formatter, numeric coercion). Full SDK suite stays green at
674 passed, 3 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nStrategy

Addresses concerns from a pre-merge review of 7fa7a1f. Behavior preserved
on the failure-mode benchmark; the changes are about surface quality,
observability, and test coverage.

R1 — Make the cypher-authority rule opt-in. Rule 8 ("trust the
'Authoritative Graph Query Results' section over passages on counts and
aggregates") used to live inside the base _RAG_SYSTEM_PROMPT and
_RAG_SYSTEM_PROMPT_DELIMITED, so it fired on every completion() call
SDK-wide — including users who never enable cypher retrieval. The rule is
now extracted into a separate constant _CYPHER_AUTH_RULE and appended to
the system prompt only when the retriever produced a cypher_results
section (detected via item metadata or the canonical heading marker as a
defensive fallback). Callers on MultiPathRetrieval without enable_cypher
keep the unchanged 7-rule prompt.

R2 — Add cypher_first_path metadata to every strategy result. The
strategy has five sub-paths plus three RAG-fallback branches; today
operators can't tell which one handled a query from the result alone.
Each result now carries one of seven canonical PATH_* labels:
numeric_math, shared_property_hybrid, cypher_table, negation_empty_no,
rag_fallback, rag_fallback_numeric_fail, rag_fallback_cypher_empty. A
shared _tag_path() helper handles the bookkeeping including the three
delegated-fallback wrappings so the contract is uniform.

R3 — Document prose-shape and graph-topology assumptions. The shared-
property hybrid was tuned on graphs produced by the SDK's default
GraphExtraction pipeline; custom extractors or domain prose may not
match. The class docstring now has dedicated "Assumptions and known
limits" and "Accuracy ceiling" sections naming each one. Plus a runtime
warning fires when the batched (Org)<-[:RELATES]-(Person) query returns
zero tuples, so operators on different schemas get a fast signal rather
than silent wrong answers.

R4 — Mocked end-to-end routing tests. The 45 existing unit tests cover
pure-Python helpers, not the strategy's branching. Seven new tests use a
mock LLM (returning canned cypher), a mock graph (returning canned
result_sets), and a stub _FakeFallback to assert which sub-path fires
for each intent + graph-state combination. Patterns covered: rag intent
→ fallback, aggregation + rows → cypher_table, numeric → python math,
numeric extraction empty → fallback (with numeric_fail tag), negation
+ empty cypher → No (no delegation), positive-existential + empty cypher
→ fallback (with cypher_empty tag), topology-violation warning.

R10 — ruff check pass. Dropped two unused imports (RetrieverError,
ChatMessage) and sorted the test_cypher_first imports.

All 75 facade + 57 cypher_first + 49 cypher_generation tests pass. Full
SDK suite 686 passed, 3 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…extractor

R5 — Split CypherFirstAggregationStrategy into composable path classes.
The strategy file went from 944 lines with five sub-paths crammed into
the strategy class to a small dispatcher (~80 lines) backed by four
focused _AggregationPath subclasses:

  - _RagDelegationPath        — intent="rag" → fallback verbatim
  - _NumericMathPath          — intent="numeric_math" → Python arithmetic
  - _SharedPropertyHybridPath — "BOTH A and B" / "same X as Z" via chunks
  - _MultiCandidateCypherPath — K parallel cypher candidates + table
    (also owns the negation-empty and cypher-empty-fallback branches)

Each path is a small class with a single `maybe_handle(query, ctx)`
method that returns either a final `RawSearchResult` or `None` to defer.
The strategy's `_execute` dispatches by intent and consults the relevant
paths in order. Existing routing tests (TestStrategyRouting, 7 cases)
keep covering the contract end-to-end; the helper classes are
implementation detail.

Pure refactor — no behaviour change. All paths share state via a single
reference to the parent strategy, so callers' constructor signature is
unchanged.

R8 — Pluggable phrase extractor for the shared-property hybrid.

The default role/project regexes target the prose patterns the SDK's
GraphExtraction pipeline produces ("works at X as a <role>", "contributes
to <project>") and a closed set of role-suffix words. Domain-specific
use cases (medical, legal, e-commerce, non-English) need different
vocabularies.

Added a `PhraseExtractor` ABC and a `DefaultPhraseExtractor` that wraps
the existing regexes. Strategies accept a `phrase_extractor=` parameter
on construction; the shared-property hybrid consults it instead of the
module-level `_extract_phrases`. Domain users can now subclass and
inject without forking.

Exports added at strategies/__init__.py and top-level graphrag_sdk
namespace.

Tests
-----
- 1 unit test for DefaultPhraseExtractor (matches the default regexes,
  unknown kinds return empty set rather than raising).
- 1 mocked end-to-end test confirming that a custom extractor passed to
  the strategy is consulted by the hybrid path (intersection computed
  over the custom vocabulary, not the default).

All 763 unit tests pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a "Recommended ingestion config" section to
CypherFirstAggregationStrategy's docstring.

The default chunker (FixedSizeChunking) splits on a character window
with no sentence/paragraph awareness, so entity names get truncated at
chunk boundaries — "Wayne Enterprises" becomes "Wayne En" in one chunk
and "terprises..." in the next. The resulting stub entities never merge
with their full forms during resolution, so cypher counts come back
inflated and "which X" lists pick up phantoms.

On the internal 7-question aggregation benchmark, switching from
FixedSizeChunking(chunk_size=1000) to
SentenceTokenCapChunking(max_tokens=512, overlap_sentences=2) moved the
score from 6/7 (intermittent) to 7/7 (stable across three runs). The
post-ingest graph state went from 11-14 organization nodes including
"Glo" / "Initech System" / "Wayne En" stubs to exactly 10 clean orgs,
and from 66-80 Person nodes (with Carla / Carla Okafor duplicates) to
exactly 56 distinct persons.

The short-form duplicate fix is a side benefit: sentence-aware chunks
follow natural prose boundaries with overlap, so a person's first
mention almost always lands in the same chunk as their later short-form
references — per-chunk FastCoref then has the antecedent it needs.

This docstring is the smallest useful change. A separate PR will propose
changing the SDK default chunker; that's a larger conversation since it
affects every existing caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bb232e35-c5b4-4ab4-94d3-1171e61d69b5

📥 Commits

Reviewing files that changed from the base of the PR and between 7ba4848 and 98aa604.

📒 Files selected for processing (3)
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/result_assembly.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/result_assembly.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py

📝 Walkthrough

Walkthrough

This PR adds a Cypher-first retrieval strategy with intent detection and three aggregation paths (numeric-math, shared-property hybrid, multi-candidate table), updates Cypher generation (LIMIT injection, namespace validation, label-widening fallback), increases cypher result cap, threads cypher metadata through MultiPathRetrieval and result assembly, injects an authority prompt when cypher results are present, and adds tests.

Changes

Cypher-First Aggregation Strategy

Layer / File(s) Summary
Public API Exports and Module Wiring
graphrag_sdk/src/graphrag_sdk/__init__.py, graphrag_sdk/src/graphrag_sdk/retrieval/strategies/__init__.py
Expose CypherFirstAggregationStrategy, DefaultPhraseExtractor, and PhraseExtractor via package imports and __all__.
Cypher Authority Rule and Prompt Integration
graphrag_sdk/src/graphrag_sdk/api/main.py
Add _CYPHER_AUTH_RULE, markers, _has_authoritative_cypher_results, and conditional prompt augmentation in completion().
Cypher Generation Validation, LIMITs, and Label Widening
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py
Add default row limit, detect pure-aggregation to skip LIMIT injection, reject dotted-namespace function calls, implement typed-label widening fallback, and return enriched metadata.
MultiPath Retrieval Metadata Threading
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/multi_path.py
Accept 3-tuple cypher returns (facts, entities, metadata) and forward cypher_metadata into result assembly.
Result Assembly Cap Increase and Metadata Merging
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/result_assembly.py
Introduce _CYPHER_RESULT_CAP = 100, produce "Authoritative Graph Query Results" section with truncation notice, and merge prefixed cypher_ metadata into final result metadata.
Cypher-First Core: Intent, Extraction, Utilities
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py
Add intent/shape detection, yes/no/which-list classifiers, property-kind inference, PhraseExtractor and DefaultPhraseExtractor, markdown-table formatting, tagging utilities, and numeric-coercion helper.
Numeric-Math Execution Path
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py (Numeric path)
Generate/validate Cypher returning numeric cells, coerce numbers, compute aggregates in Python, and fallback to RAG when no numeric extraction.
Shared-Property Hybrid Execution Path
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py (Hybrid path)
Run batched Cypher to fetch descriptions, extract role/project phrases via pluggable extractor, and compute fuzzy intersections/overlaps for "both/same" questions.
Multi-Candidate Cypher-Table Path
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py (Multi-candidate path)
Generate K candidate Cypher queries, execute in parallel, rank by row count, format best result as markdown table, and handle negation-existential vs fallback.
CypherFirstAggregationStrategy Class
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py
Wire embedder/LLM/fallback/phrase-extractor, build sub-paths, and dispatch in _execute by detected intent.
Formatting, Tagging, and Helpers
graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py
Table rendering with header synthesis, null/list handling, truncation flag; _tag_path for cypher_first_path; _coerce_number helper.
Comprehensive Test Coverage
graphrag_sdk/tests/*
Add tests for intent detection, shape classification, phrase extraction, fuzzy intersection, table formatting, numeric coercion, path tagging, end-to-end routing, cypher generation helpers, label-widening fallback, and authority-rule injection.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • FalkorDB/GraphRAG-SDK#253: Overlapping work introducing and exposing the cypher_first strategy and cypher-authority prompt gating.

Suggested reviewers

  • Naseem77

Poem

🐰 A rabbit hops through cyphered trees tonight,
Detecting intent by moonbeam's light,
Numbers to Python, phrases to match,
Tables and fallbacks all neatly dispatched,
Authority whispered, results set right. 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: introducing CypherFirstAggregationStrategy and including cypher-path accuracy improvements, which directly reflects the primary features in the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/cypher-aggregation-accuracy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…tence-chunker

docs(retrieval): recommend SentenceTokenCapChunking for CypherFirst
Copy link
Copy Markdown
Contributor

@Naseem77 Naseem77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. [medium] graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py:222 Dotted-namespace blocklist matches inside string literals
  • Problem: validate_cypher only strips comments (//, /*
    */) from cypher_norm — it doesn't strip string literals before running _DOTTED_FN_RE. A legitimate query like MATCH (n) WHERE n.text CONTAINS 'apoc.foo(' RETURN n would be rejected as using an APOC
    function.
  • Impact: False-positive rejection forces the retry loop to regenerate (or, after exhausting retries, the strategy silently returns empty results) for queries whose string-literal payloads happen to contain
    .(. The pattern is realistic for LLM-generated regex/text-search predicates.
  • Fix: Strip single/double-quoted string literals (with escape handling) from cypher_norm before applying _DOTTED_FN_RE, the same way comments are stripped. The existing \bCALL\b check has the same latent
    issue but the new dotted-fn check is much more likely to hit it because dots inside quoted regexes are common.
  1. [medium] graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py:580 Shared-property hybrid runs an unbounded full-graph scan
  • Problem: The batch_cypher MATCH (o:Organization)<-[:RELATES]-(p:Person) OPTIONAL MATCH (p)-[:MENTIONED_IN]->(c:Chunk) RETURN ... collect(DISTINCT c.text) AS chunks has no LIMIT and pulls every Person–Org
    pair with all their chunk texts into memory per query. _sanitize_cypher is not applied (the query goes straight to graph_store.query_raw).
  • Impact: On any non-toy graph (thousands of persons × dozens of chunks each), this is a multi-MB result on every "both A and B" / "same role as" question. Latency and memory grow linearly with graph size,
    with no cap.
  • Fix: Restrict the batch to the orgs named in the question (the regex already captures org_a/org_b / target before the call) and/or cap chunk text length and row count. The current code already discards
    rows whose org name doesn't match — push that filter into the query.

galshubeli and others added 3 commits May 18, 2026 16:22
Concatenate the two end-of-file blocks in tests/test_facade.py — the new
TestCypherAuthorityRuleInjection class from this branch and the v1.1.0
test surface (path-as-id, update, delete_document, apply_changes,
sync-wrapper signatures) from main. The blocks don't overlap; both are
kept verbatim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dotted-namespace blocklist (_DOTTED_FN_RE), \bCALL\b check, LOAD CSV
check, and write-keyword check all ran over the comment-stripped Cypher
without first stripping quoted string literals. A query like

    MATCH (n) WHERE n.text CONTAINS 'apoc.foo(' RETURN n

was rejected as using an APOC function, forcing the retry-with-feedback
loop to regenerate (or silently return empty after retries exhausted).
LLM-generated CONTAINS / regex predicates often look like this, so the
false-positive rate is non-trivial.

_strip_string_literals replaces both single- and double-quoted literals
with empty quotes (honoring backslash escapes so embedded quotes don't
end the match early). All keyword/function pattern checks now run over
that ``cypher_code`` view; structural checks (start keyword, RETURN
presence, label existence) keep using ``cypher_norm`` since their
patterns can't legitimately occur inside string literals.

Reported by Naseem on #255.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The _SharedPropertyHybridPath batch query ran an unbounded full-graph
scan of (Organization)<-[:RELATES]-(Person) with every chunk text
collected per pair. On any non-toy graph, "both A and B" and "same X as
Z" questions returned multi-MB results per call, growing linearly with
graph size. The post-query Python filter (_gather) discarded most of it,
so almost all of the data was wasted.

Three scoping changes:

  - For "both A and B": push toLower(o.name) CONTAINS toLower($org_a/b)
    into the WHERE clause so the database doesn't even materialize rows
    for the other orgs. The original regex (_BOTH_AB_RE) already
    captures the two org names — this just propagates them into the
    query rather than re-deriving them downstream.
  - For "same X as Z": no org filter is applicable (we need every other
    org to compare against the target), so the LIMIT and substring cap
    below are the only guardrails.
  - For both shapes: collect substring(c.text, 0, $max_chunk_len)
    instead of full chunks (cap = 2000 chars; phrase extraction runs on
    sentence fragments so the head of each chunk carries the signal),
    and add LIMIT 500 as a safety net.

Reported by Naseem on #255.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py`:
- Around line 428-451: The widening logic in _should_widen_labels is too
permissive; change it to only allow widening when the name CONTAINS predicate
explicitly targets the same labeled variable and to disable widening for
multi-label, aggregate, group-by, or intersection-shaped queries. Concretely:
require a regex that captures a variable name from a pattern with a typed label
(use _TYPED_NODE_LABEL_RE to find the labeled variable) and then confirm a
corresponding "\bWHERE\b.*\b<var>\.[\w]+\b.*\bCONTAINS\b" match before returning
True; return False if the labeled node has multiple labels (e.g.,
":Label1:Label2" or other intersection syntax), if _is_pure_aggregation(cypher)
is true, or if the query contains GROUP BY or other aggregation keywords; keep
the fallback disabled unless the name predicate is pinned to the same variable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d6e6ffb-bf7f-42d6-a0fa-871f35f05ccb

📥 Commits

Reviewing files that changed from the base of the PR and between ef42a09 and 652f2a4.

📒 Files selected for processing (7)
  • graphrag_sdk/src/graphrag_sdk/__init__.py
  • graphrag_sdk/src/graphrag_sdk/api/main.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_first.py
  • graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py
  • graphrag_sdk/tests/test_cypher_first.py
  • graphrag_sdk/tests/test_cypher_generation.py
  • graphrag_sdk/tests/test_facade.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • graphrag_sdk/src/graphrag_sdk/api/main.py
  • graphrag_sdk/src/graphrag_sdk/init.py
  • graphrag_sdk/tests/test_facade.py
  • graphrag_sdk/tests/test_cypher_generation.py
  • graphrag_sdk/tests/test_cypher_first.py

Comment on lines +428 to +451
def _should_widen_labels(cypher: str) -> bool:
"""Gate for the 0-row label-widen fallback.

Skip widening when the typed label IS the filter — i.e. the RETURN
aggregates over a labeled variable AND the query has no ``WHERE … CONTAINS``
name predicate. In that case the user is asking "how many Persons?" and
widening would change the semantics. Otherwise (typical case: typed label
present alongside a name predicate or non-aggregate RETURN), widen.

Conservative: when in doubt we skip the fallback rather than risk a
semantically-different query.
"""
if not _TYPED_NODE_LABEL_RE.search(cypher):
return False # nothing to widen
has_contains_filter = bool(
re.search(r"\bWHERE\b.*\bCONTAINS\b", cypher, re.IGNORECASE | re.DOTALL)
)
if has_contains_filter:
return True
# No name filter — check whether the RETURN is an aggregate over a labeled
# variable. If so, widening turns "count Persons" into "count Entities".
if _is_pure_aggregation(cypher):
return False
return True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Narrow the label-widen fallback so it can't answer a different question.

This gate currently treats almost any non-trivial query as safe to widen, then rewrites every typed entity label to __Entity__. For the new top-N and BOTH examples, a 0-row retry can turn Person/Organization/Location constraints into a generic entity traversal and return plausible-but-wrong rows. Since cypher results are later framed as authoritative, that's a major correctness risk. Please only widen variables that are directly pinned by their own name CONTAINS predicate, or disable widening for multi-label / aggregate / group-by / intersection shapes.

Possible direction
 def _should_widen_labels(cypher: str) -> bool:
-    if not _TYPED_NODE_LABEL_RE.search(cypher):
-        return False  # nothing to widen
-    has_contains_filter = bool(
-        re.search(r"\bWHERE\b.*\bCONTAINS\b", cypher, re.IGNORECASE | re.DOTALL)
-    )
-    if has_contains_filter:
-        return True
-    # No name filter — check whether the RETURN is an aggregate over a labeled
-    # variable. If so, widening turns "count Persons" into "count Entities".
-    if _is_pure_aggregation(cypher):
-        return False
-    return True
+    typed_vars = {
+        match.group(1)
+        for match in re.finditer(
+            r"\(\s*(\w+)\s*:(" + "|".join(re.escape(l) for l in _ENTITY_LABELS) + r")\b",
+            cypher,
+        )
+    }
+    if not typed_vars:
+        return False
+
+    if _is_pure_aggregation(cypher):
+        return False
+
+    name_filtered_vars = set(
+        re.findall(r"\b(\w+)\.name\s+CONTAINS\b", cypher, re.IGNORECASE)
+    )
+    return typed_vars <= name_filtered_vars

Also applies to: 522-537

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@graphrag_sdk/src/graphrag_sdk/retrieval/strategies/cypher_generation.py`
around lines 428 - 451, The widening logic in _should_widen_labels is too
permissive; change it to only allow widening when the name CONTAINS predicate
explicitly targets the same labeled variable and to disable widening for
multi-label, aggregate, group-by, or intersection-shaped queries. Concretely:
require a regex that captures a variable name from a pattern with a typed label
(use _TYPED_NODE_LABEL_RE to find the labeled variable) and then confirm a
corresponding "\bWHERE\b.*\b<var>\.[\w]+\b.*\bCONTAINS\b" match before returning
True; return False if the labeled node has multiple labels (e.g.,
":Label1:Label2" or other intersection syntax), if _is_pure_aggregation(cypher)
is true, or if the query contains GROUP BY or other aggregation keywords; keep
the fallback disabled unless the name predicate is pinned to the same variable.

# Reject dotted-namespace function calls (apoc.*, gds.*, db.*).
# FalkorDB doesn't implement these plugins; the call silently returns 0
# rows at execution. Surfacing it here lets the retry loop regenerate.
for ns, _ in _DOTTED_FN_RE.findall(cypher_code):
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Naseem77 — good catch on the literal-stripping gap. Fixed in 4c4f54c.

The validator now strips both single- and double-quoted Cypher string literals (with \-escape handling so 'apoc\'s foo(' doesn't terminate the literal early) before running the _DOTTED_FN_RE, \bCALL\b, LOAD CSV, and write-keyword checks. Structural checks that can't legitimately occur inside a string predicate (start-keyword allowlist, RETURN presence, label existence) still run on cypher_norm since they don't have the false-positive risk.

Regression test in test_cypher_generation.py::test_does_not_reject_blocklisted_tokens_inside_string_literals covers:

  • n.text CONTAINS 'apoc.foo('
  • n.text CONTAINS "gds.shortest.path("
  • n.title = 'CALL me maybe'
  • escape-honoring: n.text = 'apoc\'s db.foo('
  • n.text CONTAINS 'LOAD CSV here'

params["org_a"] = org_a
params["org_b"] = org_b

batch_cypher = (
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Naseem77 — agreed, the unbounded scan needed bounds. Fixed in 652f2a4.

Three changes to the batch:

  1. For "both A and B": push WHERE toLower(o.name) CONTAINS toLower($org_a) OR toLower(o.name) CONTAINS toLower($org_b) into the query. _BOTH_AB_RE already captures both org names, so this just propagates them into Cypher rather than re-deriving them downstream in _gather.
  2. For "same X as Z": no org filter is applicable (we need every other org to compare against the target), so the LIMIT + substring cap below are the only guardrails for this shape.
  3. For both shapes: collect(DISTINCT substring(c.text, 0, $max_chunk_len)) (cap = 2000 chars — phrase extraction runs on sentence fragments anyway, so the head of each chunk carries the signal) and a LIMIT 500 row cap as a safety net.

The two caps live as module-level _HYBRID_BATCH_ROW_CAP / _HYBRID_CHUNK_TEXT_CAP constants with rationale in their docstring. Regression test test_hybrid_batch_query_is_scoped_and_capped asserts the WHERE / LIMIT / substring / params behavior for both shapes.

I didn't run _sanitize_cypher on the batch — it's an internal template, not LLM-generated, so it doesn't need the LLM-pattern cleanup. Happy to add it as belt-and-suspenders if you'd prefer.

galshubeli and others added 2 commits May 18, 2026 17:01
ruff E741 on cypher_generation.py:417 was failing CI on the PR branch
(``re.escape(l) for l in _ENTITY_LABELS``). The variable doesn't exist
on main, so the prior lint-fix sweep (0597de9) couldn't catch it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI runs ``ruff format --check src/`` (added on main in 569d886, before this
branch existed) and three files have been failing it:
cypher_first.py, cypher_generation.py, result_assembly.py. Most of the
diff is pre-existing wrapping (single-line-list-on-multiple-lines
collapsed, ternary wrapping, blank-line spacing) — no semantic changes.

``ruff format src/`` clean; ``ruff check src/`` clean; full SDK suite
unchanged (846 passed, 23 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants