Skip to content

Allow filters to use extraction functions#2690

Merged
fjy merged 1 commit intoapache:masterfrom
jon-wei:filter_support
Apr 5, 2016
Merged

Allow filters to use extraction functions#2690
fjy merged 1 commit intoapache:masterfrom
jon-wei:filter_support

Conversation

@jon-wei
Copy link
Copy Markdown
Contributor

@jon-wei jon-wei commented Mar 19, 2016

This PR allows the user to define an extraction function on each of the filters (except SpatialFilter); the extraction function will be applied to the input before the filter criteria is applied.

This merges the SelectorFilter and ExtractionFilter into a single class (ExtractionDimFilter is deprecated in this patch, ExtractionFilter has been deleted).


Fixes #2643

Also fixes #2772, #2775, #2776

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Mar 21, 2016

updated docs

@jon-wei jon-wei added this to the 0.9.1 milestone Mar 21, 2016
@jon-wei jon-wei changed the title Allow all filters to use extraction functions, allow FilteredAggregators to use all filter types Allow filters to use extraction functions Mar 23, 2016
@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Mar 23, 2016

rebased, removed conflicts with #2711

@navis
Copy link
Copy Markdown
Contributor

navis commented Mar 24, 2016

I think JavaScriptFilter and SelectorFilter also can extend DimensionPredicateFilter, removing some duplicated codes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cause extraction itself is handled in predicate, I think we don't need this here (can be confusing).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navis thanks, extractionFn was unnecessary here, removed it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but now it's necessary again, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's being used now

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Mar 24, 2016

sorry @jon-wei, more conflicts!

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Mar 24, 2016

@navis

re: changing SelectorFilter and JavascriptFilter to implement DimensionPredicateFilter

There are some performance considerations if that change were made:

  • getBitmapIndex() for DimensionPredicateFilter requires a scan through the column's dictionary values to determine which values matched the predicate. For SelectorFilter without an extraction function, the current getBitmapIndex() is much simpler/faster since it can just call selector.getBitmapIndex(dimension, value).
  • For JavascriptFilter, the current implementation allows getBitmapIndex() to reuse the Context object across the scan of the dictionary. Using the JavascriptPredicate with DimensionPredicateFilter's getBitmapIndex() function would result in a new Context object being created on every apply() call. There's a note regarding this performance consideration at:
    https://github.com/druid-io/druid/blob/master/processing/src/main/java/io/druid/segment/filter/JavaScriptFilter.java#L92

I think, if that change is desirable, it's better handled in a separate PR.

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Mar 24, 2016

@gianm np, rebasing now

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Mar 24, 2016

rebased

@navis
Copy link
Copy Markdown
Contributor

navis commented Mar 24, 2016

@jon-wei Pair enough. 👍

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this constructor needed? If it's only used for tests then usually we modify the tests rather than add another constructor.

(similar comment for all the other filters)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO one nice thing about only having one constructor is that it makes it harder to accidentally create an object with a critical parameter missing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm removed extra constructors

@jon-wei jon-wei force-pushed the filter_support branch 2 times, most recently from 60a896a to a34c2f1 Compare April 1, 2016 23:11
@jon-wei jon-wei added the Bug label Apr 1, 2016
@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Apr 1, 2016

Addressed comments, rebasing now

This PR also fixes a few issues related to filtering, I've opened issues for them:
#2772
#2775
#2776

@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Apr 2, 2016

rebased

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not return new SelectorDimFilter(dimension, value, extractionFn).optimize();?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm good idea, I'll change that

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm changed to use SelectorDimFilter optimize()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be optimized in a similar way to the SelectorDimFilter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm Can we handle the InDimFilter optimization in a separate PR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, you can file an issue for anything you don't think is appropriate for this PR.

@jon-wei jon-wei mentioned this pull request Apr 5, 2016
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keys isn't guaranteed to be mutable, so it'd be better to copy it and add convertedValue to the copy.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me other than that

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Apr 5, 2016

lgtm 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DimensionPredicateFilter and JavaScriptFilter don't handle null columns properly Ability to use extractionFns in any filter

5 participants