
Add MV_FILTER_REGEX and MV_FILTER_PREFIX SQL functions#18281

Merged
jtuglu1 merged 11 commits into apache:master from jtuglu1:add-mv-regex-sql-functions
Jul 24, 2025

Conversation

@jtuglu1
Contributor

@jtuglu1 jtuglu1 commented Jul 18, 2025

Fixes #12911.

Description

  • Adds MV_FILTER_REGEX and MV_FILTER_PREFIX SQL functions.
  • Creates 2 new VirtualColumn implementations: RegexFilteredVirtualColumn and PrefixFilteredVirtualColumn.
  • Updates RegexpLikeExprMacro to support dynamic pattern literals (similar to how RegexpReplaceExprMacro does).
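The literal versus dynamic pattern distinction in the last bullet can be illustrated with a minimal sketch. This is not Druid's code: the class and method names below are hypothetical, null handling is simplified, and the use of `find()` (match anywhere in the string) is an assumption made for illustration. The point is only that a literal pattern can be compiled once up front, while a pattern that arrives per row has to be compiled at evaluation time.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual RegexpLikeExprMacro implementation.
public class RegexpLikeSketch {

    // Literal pattern: the regex is known at planning time, so compile it once
    // and reuse the compiled Pattern for every row.
    static Predicate<String> literalPattern(String regex) {
        final Pattern compiled = Pattern.compile(regex);
        return value -> value != null && compiled.matcher(value).find();
    }

    // Dynamic pattern: the regex is itself a per-row value, so it can only be
    // compiled when the row is evaluated.
    static boolean dynamicPattern(String value, String regex) {
        if (value == null || regex == null) {
            return false; // simplified null handling, an assumption of this sketch
        }
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        Predicate<String> startsWithBan = literalPattern("^ban");
        System.out.println(startsWithBan.test("banana"));       // true
        System.out.println(dynamicPattern("apple", "^ban"));    // false
    }
}
```

A real implementation would likely cache compiled patterns in the dynamic case; the sketch leaves that out to keep the contrast visible.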

Examples

For the following datasource:

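[Screenshot of the "mvs" datasource. Judging by the query results below, its "items" column appears to hold four rows: null, ["apple", "apple2", "apricot", "banana"], ["blueberry", "chocolate"], "grape".]
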
-- returns rows: null, ["apple", "apple2", "apricot"], null, null
SELECT MV_FILTER_PREFIX("items", 'a')
from "mvs"

-- returns rows: null, "banana", "blueberry", null
SELECT MV_FILTER_PREFIX("items", 'b')
from "mvs"

-- returns rows: null, ["apple", "apple2", "apricot"], null, null
SELECT MV_FILTER_REGEX("items", 'a.*')
from "mvs"

-- returns rows: null, "banana", null, null
SELECT MV_FILTER_REGEX("items", '.*anana')
from "mvs"

-- returns rows: null, ["apple", "apple2"], null, null
SELECT MV_FILTER_PREFIX("items", 'apple')
from "mvs"

-- returns rows: null, ["apple", "apple2", "apricot", "banana"], ["blueberry", "chocolate"], "grape"
SELECT MV_FILTER_PREFIX("items", '')
from "mvs"

-- returns rows: null, ["apple", "apple2", "apricot", "banana"], ["blueberry", "chocolate"], "grape"
SELECT MV_FILTER_REGEX("items", '.*')
from "mvs"

-- returns rows: null, null, null, null
SELECT MV_FILTER_PREFIX("items", 'test')
from "mvs"

-- returns rows: null, null, null, null
SELECT MV_FILTER_REGEX("items", '')
from "mvs"
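
The semantics implied by the examples above can be reproduced with a small, self-contained sketch. It is not the PR's virtual column code; `MvFilterSketch`, `filterPrefix`, and `filterRegex` are hypothetical names. Two behaviors are inferred from the examples rather than confirmed from the implementation: the regex must match the entire value (which is why 'a.*' does not return "banana" and '' returns nothing), and a row with no surviving values becomes null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the filtering behavior shown in the SQL examples.
public class MvFilterSketch {

    // Keep only the values that start with the prefix; null if nothing survives.
    static List<String> filterPrefix(List<String> row, String prefix) {
        if (row == null) {
            return null;
        }
        List<String> kept = new ArrayList<>();
        for (String value : row) {
            if (value != null && value.startsWith(prefix)) {
                kept.add(value);
            }
        }
        return kept.isEmpty() ? null : kept;
    }

    // Keep only the values the regex matches in full (assumed full-match semantics).
    static List<String> filterRegex(List<String> row, String regex) {
        if (row == null) {
            return null;
        }
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String value : row) {
            if (value != null && pattern.matcher(value).matches()) {
                kept.add(value);
            }
        }
        return kept.isEmpty() ? null : kept;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("apple", "apple2", "apricot", "banana");
        System.out.println(filterPrefix(row, "a"));       // [apple, apple2, apricot]
        System.out.println(filterRegex(row, ".*anana"));  // [banana]
        System.out.println(filterRegex(row, ""));         // null
    }
}
```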

Release note

Add MV_FILTER_REGEX and MV_FILTER_PREFIX SQL functions


Key changed/added classes in this PR
  • RegexFilteredVirtualColumn
  • PrefixFilteredVirtualColumn
  • RegexpLikeExprMacro

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jtuglu1 jtuglu1 changed the title from "Add MV_FILTER_REGEX SQL function" to "Add MV_FILTER_REGEX and MV_FILTER_PREFIX SQL functions" Jul 18, 2025
/**
* Expr when pattern is a literal.
*/
class RegexpLikeExpr extends BaseRegexpLikeExpr

Check notice

Code scanning / CodeQL

Inner class could be static Note

RegexpLikeExpr could be made static, since the enclosing instance is used only in its constructor.
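
For readers unfamiliar with this CodeQL note, here is a minimal sketch of the general Java pattern it describes. The classes are hypothetical stand-ins, not the PR's code: a non-static inner class keeps a hidden reference to its enclosing instance, whereas a static nested class receives whatever it needs through its constructor.

```java
// Hypothetical illustration of "inner class could be static".
public class StaticNestedSketch {
    private final String name = "regexp_like";

    // Non-static inner class: every instance implicitly holds a reference to
    // the enclosing StaticNestedSketch, even though it only reads `name` once.
    class InnerExpr {
        final String label;

        InnerExpr() {
            this.label = name + ":inner";
        }
    }

    // Static nested class: no hidden outer reference; the one value it needs
    // is passed in explicitly.
    static class NestedExpr {
        final String label;

        NestedExpr(String macroName) {
            this.label = macroName + ":nested";
        }
    }

    public static void main(String[] args) {
        StaticNestedSketch outer = new StaticNestedSketch();
        InnerExpr inner = outer.new InnerExpr();        // needs an outer instance
        NestedExpr nested = new NestedExpr(outer.name); // does not
        System.out.println(inner.label);   // regexp_like:inner
        System.out.println(nested.label);  // regexp_like:nested
    }
}
```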
@jtuglu1 jtuglu1 marked this pull request as ready for review July 18, 2025 16:57
@jtuglu1 jtuglu1 requested a review from clintropolis July 18, 2025 20:56
@jtuglu1
Contributor Author

jtuglu1 commented Jul 18, 2025

@clintropolis I may still clean up the RegexpLikeExprMacro logic a bit to share more validations/checks, but LMK what you think otherwise.

@jtuglu1 jtuglu1 requested a review from gianm July 18, 2025 21:25
@jtuglu1 jtuglu1 force-pushed the add-mv-regex-sql-functions branch from cd4b87d to 2571eb2 Compare July 21, 2025 18:18
Member

@clintropolis clintropolis left a comment

overall lgtm, nice to have these other DimensionSpecs migrated over to virtual columns. Someday I would like to deprecate every DimensionSpec other than DefaultDimensionSpec and do all of the others as virtual columns, so this aligns well with that.

The virtual columns are missing an implementation of getIndexSupplier, so using them for filtering directly is probably sub-optimal. That is probably ok, though, because I think the primary use of these functions is to tidy up an MVD so you don't see seemingly unrelated values when filtering on the column itself. That comes from how filters behave on MVDs: if any value in the row matches, the whole row matches, and when grouping, the implicit unnest makes the other non-matching row values appear in the results. So as long as people use the MV_ functions in the SELECT clause, and the column itself with a regular filter in the WHERE clause, it shouldn't be a big deal.
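
The filter behavior described in this comment can be simulated in a few lines. This is a toy model, not Druid's query engine: the class and method names are hypothetical, and the sample values are borrowed from the PR description's examples. It contrasts a row-level filter followed by an unnest (non-matching values leak through) with the same filter plus value-level trimming, which is what the MV_ functions add.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of multi-value dimension (MVD) filtering; not Druid code.
public class MvdFilterBehaviorSketch {

    static final List<List<String>> ROWS = Arrays.asList(
        Arrays.asList("apple", "apple2", "apricot", "banana"),
        Arrays.asList("blueberry", "chocolate"),
        Arrays.asList("grape"));

    // Row-level filter: if ANY value matches, the WHOLE row is kept, and the
    // unnest performed for grouping then emits every value in that row.
    static List<String> filterRowsThenUnnest(List<List<String>> rows, String prefix) {
        List<String> out = new ArrayList<>();
        for (List<String> row : rows) {
            if (row.stream().anyMatch(v -> v.startsWith(prefix))) {
                out.addAll(row);
            }
        }
        return out;
    }

    // Value-level trimming before the unnest: only matching values are emitted.
    static List<String> filterValuesThenUnnest(List<List<String>> rows, String prefix) {
        List<String> out = new ArrayList<>();
        for (List<String> row : rows) {
            for (String v : row) {
                if (v.startsWith(prefix)) {
                    out.add(v);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // [apple, apple2, apricot, banana, blueberry, chocolate]
        System.out.println(filterRowsThenUnnest(ROWS, "b"));
        // [banana, blueberry]
        System.out.println(filterValuesThenUnnest(ROWS, "b"));
    }
}
```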

Comment thread processing/src/main/java/org/apache/druid/segment/VirtualColumn.java Outdated
@jtuglu1 jtuglu1 requested review from kfaraz and maytasm July 24, 2025 05:09
@jtuglu1 jtuglu1 merged commit e7cba46 into apache:master Jul 24, 2025
77 checks passed
@jtuglu1 jtuglu1 deleted the add-mv-regex-sql-functions branch August 22, 2025 05:07
@cecemei cecemei added this to the 35.0.0 milestone Oct 21, 2025

Development

Successfully merging this pull request may close these issues.

Why druid-sql not support Filtered DimensionSpecs

5 participants