Ref: http://druid.io/docs/latest/querying/dimensionspecs.html
The current Regular Expression Extraction Function is almost useful, but it needs the extra options found on the lookup extraction function. Specifically, the documented behavior "If there is no match, it returns the dimension value as is." is not useful. Ideally, I want it to map anything that does not match the regexp to null.
I believe this functionality can be achieved (without breaking backwards compatibility) by adding the "retainMissingValue", "injective", and "replaceMissingValueWith" properties that can be found on the lookup (retainMissingValue should be true by default to preserve backwards compatibility).
This is my use case:
Say I have a dimension of files that were downloaded from my web server:
Files:
index.html
the_end_is_near_2.html
kafka-0.6.2.tar.gz
kafka-0.6.1.tar.gz
kafka-0.5.9.tar.gz
I would like to extract the version number (make a derived dimension) at query time.
I want to run this regexp: (\d+\.\d+\.\d+) and I want index.html and the_end_is_near_2.html to be transformed to null (not kept as is).
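To make the requested behavior concrete, here is a minimal Python sketch of the semantics I have in mind. It is not Druid code: the function `regex_extract` and its snake_case parameters are hypothetical stand-ins for the proposed "retainMissingValue" and "replaceMissingValueWith" properties, and I am assuming the extraction returns the first capture group of the first match.

```python
import re
from typing import Optional


def regex_extract(
    value: str,
    pattern: str,
    retain_missing_value: bool = True,
    replace_missing_value_with: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical model of the proposed regex extraction semantics."""
    match = re.search(pattern, value)
    if match:
        # A match was found: return the first capture group.
        return match.group(1)
    if retain_missing_value:
        # Current behavior: no match returns the dimension value as is.
        return value
    # Proposed behavior: no match returns the replacement (null by default).
    return replace_missing_value_with


files = [
    "index.html",
    "the_end_is_near_2.html",
    "kafka-0.6.2.tar.gz",
    "kafka-0.6.1.tar.gz",
    "kafka-0.5.9.tar.gz",
]
pattern = r"(\d+\.\d+\.\d+)"

# With the default (retain_missing_value=True) the non-matching names pass through unchanged.
current = [regex_extract(f, pattern) for f in files]

# With retain_missing_value=False the non-matching names become None (null).
proposed = [regex_extract(f, pattern, retain_missing_value=False) for f in files]
```

With the default settings the two .html names come back unchanged, which is the existing behavior; with `retain_missing_value=False` they become null while the kafka files yield 0.6.2, 0.6.1, and 0.5.9.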