
SelectorDimFilter optimize() does not work correctly with LookupExtractionFn missing value handling #2775

@jon-wei

Description

If optimize() is called on an ExtractionDimFilter or SelectorDimFilter that uses a LookupExtractionFn, it can return a filter that does not match the same rows as the original filter.

Suppose we have a single dimension, dimA, with rows:

{dimA = "a"}
{dimA = "b"}
{dimA = "c"}
{dimA = "d"}.

Suppose we define a LookupExtractionFn with the following underlying map, with retainMissingValues set to true:

{"a" -> "d"}.
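To make the missing-value semantics concrete, here is a minimal Java sketch of the forward transform. The class and method names (`LookupApplyModel`, `apply`, `transformAll`) are hypothetical and are not Druid's `LookupExtractionFn` API. The sketch only assumes that a value missing from the map is either kept as-is (retainMissingValues) or replaced with replaceMissingValuesWith, as described in this issue.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a lookup extraction's forward transform.
// NOT Druid's LookupExtractionFn API; it only mirrors the
// retainMissingValues / replaceMissingValuesWith semantics from the issue.
class LookupApplyModel {

    static String apply(Map<String, String> lookup, boolean retainMissingValues,
                        String replaceMissingValuesWith, String value) {
        String mapped = lookup.get(value);
        if (mapped != null) {
            return mapped; // key found: use the mapped value
        }
        // Key missing: keep the original value, or fall back to the replacement.
        return retainMissingValues ? value : replaceMissingValuesWith;
    }

    // Transforms every row value; returns row -> transformed value, in row order.
    static Map<String, String> transformAll(List<String> rows, Map<String, String> lookup,
                                            boolean retainMissingValues,
                                            String replaceMissingValuesWith) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String row : rows) {
            out.put(row, apply(lookup, retainMissingValues, replaceMissingValuesWith, row));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d");
        // prints {a=d, b=b, c=c, d=d}
        System.out.println(transformAll(rows, Map.of("a", "d"), true, null));
    }
}
```

With the map {"a" -> "d"} and retainMissingValues set to true, both the row "a" and the row "d" transform to "d".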

If we define a selector/extraction filter that matches on "d" using the LookupExtractionFn above and call optimize() on the filter, the unapply() reverse lookup only returns "a", the one map key whose value is "d". Because "d" is not a key in the map and retainMissingValues is true, the row {dimA = "d"} also transforms to "d" and should match, but the optimize() step has no knowledge of such untransformed values. The resulting InFilter on ["a"] therefore misses the row {dimA = "d"}.
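The sketch below, again a hypothetical model rather than Druid code, compares the rows that the selector on "d" should match with the rows matched by an InFilter built only from map keys whose value is "d". The method `naiveUnapply` is an assumed stand-in for the reverse lookup, not Druid's actual unapply() implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model (not Druid's API): expected selector matches vs. matches of
// an InFilter built only from a reverse lookup over the map's entries.
class ReverseLookupGapDemo {

    static String apply(Map<String, String> lookup, boolean retain, String replacement, String value) {
        String mapped = lookup.get(value);
        if (mapped != null) {
            return mapped;
        }
        return retain ? value : replacement;
    }

    // Correct semantics: transform each row, then compare with the selector value.
    static Set<String> expectedMatches(List<String> rows, Map<String, String> lookup,
                                       boolean retain, String replacement, String selectorValue) {
        Set<String> matched = new TreeSet<>();
        for (String row : rows) {
            if (selectorValue.equals(apply(lookup, retain, replacement, row))) {
                matched.add(row);
            }
        }
        return matched;
    }

    // Naive reverse lookup: only the map keys whose value equals the selector value.
    static List<String> naiveUnapply(Map<String, String> lookup, String selectorValue) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : lookup.entrySet()) {
            if (selectorValue.equals(e.getValue())) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Rows matched by an InFilter over the untransformed values.
    static Set<String> inFilterMatches(List<String> rows, List<String> inValues) {
        Set<String> matched = new TreeSet<>(rows);
        matched.retainAll(inValues);
        return matched;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d");
        Map<String, String> lookup = Map.of("a", "d");
        // prints "expected:  [a, d]"
        System.out.println("expected:  " + expectedMatches(rows, lookup, true, null, "d"));
        // prints "optimized: [a]"
        System.out.println("optimized: " + inFilterMatches(rows, naiveUnapply(lookup, "d")));
    }
}
```

The gap between the two sets is exactly the row whose value passes through the lookup unchanged.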


Similarly, if retainMissingValues is false and replaceMissingValuesWith equals the selector value, optimize() does not account for the row values that reach the selector value only through that replacement.

Using the same example rows, suppose we have the following selector filter and lookup extraction:

selector: 
- value: "b"
lookup:
- map: "a" -> "b"
- replaceMissingValuesWith = "b"

This should match all rows: "a" is mapped to "b", and "b", "c", and "d" are not keys in the lookup map, so they are replaced with "b".

If we call optimize() on this filter, the resulting filter built from the reverse lookup will only match value "a". It is not aware of the other values ("b", "c", and "d") that are transformed to "b" via the replaceMissingValuesWith property.
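One possible direction, sketched here under the same hypothetical model and not necessarily how the Druid project resolved this issue: the reverse lookup adds the selector value itself when retainMissingValues is true and that value is not a map key, and it reports that no complete reverse lookup exists when replaceMissingValuesWith equals the selector value, so that optimize() can keep the original filter. The class `SafeReverseLookup` and its `Optional`-returning `unapply` are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch (not Druid's API): a reverse lookup aware of missing-value
// handling. Optional.empty() means the full set of matching raw values cannot be
// derived from the map alone, so optimize() should keep the original filter.
class SafeReverseLookup {

    static Optional<List<String>> unapply(Map<String, String> lookup, boolean retain,
                                          String replacement, String selectorValue) {
        if (!retain && selectorValue.equals(replacement)) {
            // Every value absent from the map becomes selectorValue: unbounded set.
            return Optional.empty();
        }
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> e : lookup.entrySet()) {
            if (selectorValue.equals(e.getValue())) {
                values.add(e.getKey());
            }
        }
        if (retain && !lookup.containsKey(selectorValue)) {
            // A raw value equal to selectorValue is not a key, so it passes through unchanged.
            values.add(selectorValue);
        }
        return Optional.of(values);
    }

    public static void main(String[] args) {
        // Case 1 from the issue: retainMissingValues = true, map {"a" -> "d"}.
        // prints Optional[[a, d]]
        System.out.println(unapply(Map.of("a", "d"), true, null, "d"));

        // Case 2: map {"a" -> "b"}, replaceMissingValuesWith = "b".
        Optional<List<String>> case2 = unapply(Map.of("a", "b"), false, "b", "b");
        // prints "cannot optimize; keep original filter"
        System.out.println(case2.isPresent() ? case2.get() : "cannot optimize; keep original filter");
    }
}
```

Returning `Optional.empty()` in the replacement case avoids enumerating every possible dimension value; the trade-off is that such filters run without the reverse-lookup optimization.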
