Skip to content

fix(data-access): support case-insensitive key filters for PostgREST queries#1388

Closed
solaris007 wants to merge 3 commits intomainfrom
fix/case-insensitive-key-filters
Closed

fix(data-access): support case-insensitive key filters for PostgREST queries#1388
solaris007 wants to merge 3 commits intomainfrom
fix/case-insensitive-key-filters

Conversation

@solaris007
Copy link
Copy Markdown
Member

Summary

  • After switching production to PostgREST, URL lookups became case-sensitive because #applyKeyFilters in BaseCollection always used .eq(), which maps to PostgreSQL's case-sensitive equality operator
  • Adds a caseInsensitive schema attribute property that makes #applyKeyFilters use .ilike() instead of .eq() for marked fields
  • Marks baseURL in the Site schema with caseInsensitive: true, restoring case-insensitive behavior for all auto-generated accessors (findByBaseURL, allByBaseURL)

Test plan

  • New unit test verifying .ilike() is used for caseInsensitive attributes and .eq() for regular ones
  • Full data-access test suite passes (1516 tests)
  • IT suite against PostgREST backend with mixed-case URL lookups

…queries

After switching to PostgREST, URL lookups became case-sensitive because
#applyKeyFilters always used .eq() which maps to PostgreSQL's
case-sensitive equality. This adds a caseInsensitive schema attribute
property that makes #applyKeyFilters use .ilike() instead, restoring
the case-insensitive behavior for baseURL lookups.
@solaris007 solaris007 requested a review from ekremney as a code owner March 1, 2026 13:20
@solaris007 solaris007 self-assigned this Mar 1, 2026
@solaris007 solaris007 added the bug Something isn't working label Mar 1, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 1, 2026

This PR will trigger a patch release when merged.

ILIKE treats % and _ as wildcards, so without escaping, a value like
'my_site.com' would match 'myXsite.com'. Escape these characters before
passing to ilike so it behaves as case-insensitive equality.
const dbField = this.#toDbField(key);
const attr = this.schema.getAttribute(key);
if (attr?.caseInsensitive && typeof value === 'string') {
const escaped = value.replace(/[%_]/g, '\\$&');

Check failure

Code scanning / CodeQL

Incomplete string escaping or encoding High

This does not escape backslash characters in the input.

Copilot Autofix

AI 2 months ago

In general, when escaping characters for a pattern language (like SQL LIKE), you must escape not only the special pattern characters (here % and _) but also the escape character itself (commonly \). Otherwise, an attacker can prepend a backslash to a wildcard so that your escaping turns \% into \\%, which may be interpreted as an escaped backslash followed by a wildcard, reintroducing special meaning for %. The safest approach is to first escape all backslashes, then escape % and _, using a global regular expression for each.

The best targeted fix here is to adjust the escaping logic on line 384 to escape backslashes as well. To preserve existing behavior while making it correct, we can transform the value in two steps: replace each single backslash with a double backslash, then escape % and _ by prefixing them with backslashes. Doing backslashes first avoids re-escaping the ones we add when escaping % and _. Concretely, in #applyKeyFilters, change the single value.replace(/[%_]/g, '\\$&') call to a small sequence:

const escaped = value
  .replace(/\\/g, '\\\\')
  .replace(/[%_]/g, '\\$&');

This keeps everything local to this function, introduces no behavior changes beyond correctly handling backslashes, and requires no new imports or additional helpers. Only the body of the if (attr?.caseInsensitive && typeof value === 'string') block needs to be updated.

Suggested changeset 1
packages/spacecat-shared-data-access/src/models/base/base.collection.js

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/packages/spacecat-shared-data-access/src/models/base/base.collection.js b/packages/spacecat-shared-data-access/src/models/base/base.collection.js
--- a/packages/spacecat-shared-data-access/src/models/base/base.collection.js
+++ b/packages/spacecat-shared-data-access/src/models/base/base.collection.js
@@ -381,7 +381,9 @@
       const dbField = this.#toDbField(key);
       const attr = this.schema.getAttribute(key);
       if (attr?.caseInsensitive && typeof value === 'string') {
-        const escaped = value.replace(/[%_]/g, '\\$&');
+        const escaped = value
+          .replace(/\\/g, '\\\\')
+          .replace(/[%_]/g, '\\$&');
         filtered = filtered.ilike(dbField, escaped);
       } else {
         filtered = filtered.eq(dbField, value);
EOF
@@ -381,7 +381,9 @@
const dbField = this.#toDbField(key);
const attr = this.schema.getAttribute(key);
if (attr?.caseInsensitive && typeof value === 'string') {
const escaped = value.replace(/[%_]/g, '\\$&');
const escaped = value
.replace(/\\/g, '\\\\')
.replace(/[%_]/g, '\\$&');
filtered = filtered.ilike(dbField, escaped);
} else {
filtered = filtered.eq(dbField, value);
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
@ekremney
Copy link
Copy Markdown
Member

ekremney commented Mar 1, 2026

@solaris007 Nice catch on the case-sensitivity regression. The fix works, but I'm concerned about the ILIKE approach at scale — it can't use a regular btree index, so it's a sequential scan. With ~5,500 sites that's fine today, but if caseInsensitive gets added to a larger table (e.g. scrape_urls with 3M+ rows), it becomes a problem.

A simpler alternative is to handle this at the database level with CITEXT:

CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE sites ALTER COLUMN base_url TYPE citext;

This makes the = operator case-insensitive natively, so:

  • Regular btree indexes still work (no seq scan)
  • PostgREST .eq() queries become case-insensitive automatically
  • No application code changes needed — no caseInsensitive flag, no ILIKE, no wildcard escaping
  • Works for all queries on that column, not just the ones going through #applyKeyFilters

I can create the migration on the data-service side. What do you think?

@solaris007
Copy link
Copy Markdown
Member Author

closing as per #1388 (comment)

@solaris007 solaris007 closed this Mar 1, 2026
@solaris007 solaris007 deleted the fix/case-insensitive-key-filters branch March 1, 2026 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants