
The "regex" extractionFn needs a "retainMissingValue" to be useful #2064

@vogievetsky


Ref: http://druid.io/docs/latest/querying/dimensionspecs.html

The current Regular Expression Extraction Function is almost useful, but it needs the extra options that the lookup extraction function already has. Specifically, its documented behavior, "If there is no match, it returns the dimension value as is.", is not useful for my case. Ideally, I want anything that does not match the regex to be mapped to null.

I believe this functionality can be added without breaking backwards compatibility by adding the "retainMissingValue", "injective", and "replaceMissingValueWith" properties found on the lookup extraction function. "retainMissingValue" should default to true to preserve backwards compatibility.
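A minimal Java sketch of how the proposed properties could behave, assuming the function returns the first capture group on a match (the whole match if the pattern has no group). The class `RegexExtractionSketch` and its constructors are hypothetical illustrations, not Druid's actual code. "injective" is left out because it is a hint about the mapping rather than something that changes the returned value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed regex extraction semantics (not Druid's real implementation).
final class RegexExtractionSketch {
    private final Pattern pattern;
    // Proposed flag: true keeps today's behavior (non-matching values pass through unchanged).
    private final boolean retainMissingValue;
    // Proposed replacement for non-matching values; only meaningful when retainMissingValue is false.
    private final String replaceMissingValueWith;

    RegexExtractionSketch(String expr, boolean retainMissingValue, String replaceMissingValueWith) {
        if (retainMissingValue && replaceMissingValueWith != null) {
            // The replacement would never be used, so treat the combination as a configuration error.
            throw new IllegalArgumentException("replaceMissingValueWith requires retainMissingValue=false");
        }
        this.pattern = Pattern.compile(expr);
        this.retainMissingValue = retainMissingValue;
        this.replaceMissingValueWith = replaceMissingValueWith;
    }

    // Default mirrors the current behavior for backwards compatibility.
    RegexExtractionSketch(String expr) {
        this(expr, true, null);
    }

    String apply(String value) {
        if (value != null) {
            Matcher m = pattern.matcher(value);
            if (m.find()) {
                // Return the first capture group if there is one, otherwise the whole match.
                return m.groupCount() >= 1 ? m.group(1) : m.group();
            }
        }
        // No match: keep the original value, or substitute the replacement (null if none given).
        return retainMissingValue ? value : replaceMissingValueWith;
    }
}
```

With the default constructor, "index.html" comes back unchanged; with `new RegexExtractionSketch("(\\d+\\.\\d+\\.\\d+)", false, null)` it becomes null, which is the behavior requested here.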

This is my use case:

Say I have a dimension of files that were downloaded from my web server:

Files:

  • index.html
  • the_end_is_near_2.html
  • kafka-0.6.2.tar.gz
  • kafka-0.6.1.tar.gz
  • kafka-0.5.9.tar.gz

I would like to extract the version number at query time (as a derived dimension).
I want to apply the regex (\d+\.\d+\.\d+), and I want index.html and the_end_is_near_2.html to be transformed to null rather than kept as is.
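The desired result for the file list above can be sketched in plain Java with `java.util.regex`, assuming "version" means the first capture group of the regex. The class `VersionExtractionExample` is a hypothetical illustration of the expected output, not a Druid query.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the desired derived dimension for the example files.
final class VersionExtractionExample {
    private static final Pattern VERSION = Pattern.compile("(\\d+\\.\\d+\\.\\d+)");

    // Desired behavior: the captured version on a match, null when nothing matches.
    static String extractVersion(String file) {
        Matcher m = VERSION.matcher(file);
        return m.find() ? m.group(1) : null;
    }

    // Maps each file to its derived value, preserving input order (LinkedHashMap allows null values).
    static Map<String, String> extractAll(List<String> files) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String f : files) {
            out.put(f, extractVersion(f));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "index.html",
                "the_end_is_near_2.html",
                "kafka-0.6.2.tar.gz",
                "kafka-0.6.1.tar.gz",
                "kafka-0.5.9.tar.gz");
        extractAll(files).forEach((f, v) -> System.out.println(f + " -> " + v));
    }
}
```

Running `main` prints `index.html -> null` and `the_end_is_near_2.html -> null`, followed by `0.6.2`, `0.6.1`, and `0.5.9` for the three kafka archives.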
