Currently, Druid assumes that all dimensions have string values and are associated with a bitmap index.
There has been interest in loosening these constraints to support use cases that blur the existing separation between dimensions and metrics, such as filtering on numeric columns or aggregating dimensions at query time.
A recent discussion on these topics can be found here:
https://groups.google.com/d/msg/druid-user/Mk6omlC6Vbk/jtIFGFrACwAJ
This proposal was initially sent out on the druid-dev list, and initial comments can be found there:
https://groups.google.com/d/topic/druid-development/obtfNJnXPDg/discussion
This proposal calls for two major changes/features:
1.) Remove the assumption that dimensions always have string values.
This change is a path towards reducing the distinction between dimensions and metrics.
This would involve changes to:
- IncrementalIndex, IndexMerger, etc. (ingestion)
- StorageAdapters, query engines, etc. (querying)
- Ingestion specs, to allow users to specify dimension types (e.g., String, Long, Float)
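
To make the idea of typed dimensions concrete, here is a minimal sketch of how a per-dimension type declaration might drive value parsing at ingestion time. It is written in Python for brevity and is not Druid code: `DimensionSpec`, `parse_row`, and the lowercase type names are hypothetical and chosen only to mirror the String/Long/Float examples above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Hypothetical type names mirroring the String/Long/Float examples in the proposal.
PARSERS: Dict[str, Callable[[Any], Any]] = {
    "string": str,
    "long": int,
    "float": float,
}


@dataclass(frozen=True)
class DimensionSpec:
    """Hypothetical per-dimension declaration: a name plus a value type."""
    name: str
    type: str = "string"  # defaulting to string matches today's behavior


def parse_row(raw: Dict[str, Any], dims: List[DimensionSpec]) -> Dict[str, Optional[Any]]:
    """Coerce each declared dimension to its type; missing values become None."""
    parsed: Dict[str, Optional[Any]] = {}
    for dim in dims:
        value = raw.get(dim.name)
        parsed[dim.name] = None if value is None else PARSERS[dim.type](value)
    return parsed


dims = [
    DimensionSpec("country"),
    DimensionSpec("userId", "long"),
    DimensionSpec("latency", "float"),
]
row = parse_row({"country": "US", "userId": "42", "latency": "1.5"}, dims)
```

With the types declared up front, `userId` and `latency` are stored as numbers rather than strings, which is what would allow numeric filters and query-time aggregation on them.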
2.) Allow users to choose per-column index strategies.
Druid could support a wider range of index types beyond bitmaps. Giving users control over what indexes are used on a per-column basis could make Druid more powerful and efficient.
For example, if a dimension is expected to have high cardinality and range filters applied to it, the user may want to choose a tree-based index instead of bitmaps.
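
The trade-off can be sketched as follows. This is an illustration only, not Druid code: Python sets stand in for bitmaps, and a sorted array with binary search stands in for a tree-based index. All names (`bitmap_index`, `range_filter_bitmap`, `range_filter_sorted`) are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# One numeric dimension value per row; the list position is the row id.
values = [17, 3, 42, 8, 23, 15, 4, 16]

# Bitmap-style index: one row-id set per distinct value (a stand-in for a real bitmap).
bitmap_index = defaultdict(set)
for row_id, v in enumerate(values):
    bitmap_index[v].add(row_id)


def range_filter_bitmap(lo, hi):
    """Range filter over bitmaps: visit every distinct value and union the matches.

    The work grows with the number of distinct values, which is costly
    when the dimension has high cardinality.
    """
    rows = set()
    for v, bitmap in bitmap_index.items():
        if lo <= v <= hi:
            rows |= bitmap
    return rows


# Ordered index: (value, row_id) pairs sorted by value, standing in for a tree.
sorted_pairs = sorted((v, row_id) for row_id, v in enumerate(values))
sorted_keys = [v for v, _ in sorted_pairs]


def range_filter_sorted(lo, hi):
    """Range filter over the ordered index: two binary searches, then one slice."""
    start = bisect_left(sorted_keys, lo)
    end = bisect_right(sorted_keys, hi)
    return {row_id for _, row_id in sorted_pairs[start:end]}
```

Both functions return the same rows for an inclusive range; the ordered index locates the range boundaries in logarithmic time instead of scanning every distinct value.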
As another example, trie indexes could be used to better support text search on dimension values.
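
As a rough illustration of the trie idea, the sketch below indexes dimension values by character and answers prefix queries, which is one simple form of text search. It is not Druid code; `TrieNode`, `TrieIndex`, and `rows_with_prefix` are hypothetical names, and storing row ids at every node is a simplification that trades memory for brevity.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        # Ids of all rows whose value starts with the prefix this node represents.
        self.row_ids = set()


class TrieIndex:
    """Hypothetical prefix index over the string values of one dimension."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, value, row_id):
        node = self.root
        node.row_ids.add(row_id)  # every value matches the empty prefix
        for ch in value:
            node = node.children.setdefault(ch, TrieNode())
            node.row_ids.add(row_id)

    def rows_with_prefix(self, prefix):
        """Return the ids of rows whose value starts with `prefix`."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.row_ids)


index = TrieIndex()
for row_id, value in enumerate(["druid", "drill", "spark", "druidry"]):
    index.add(value, row_id)
```

A prefix lookup walks one node per character of the query, so its cost depends on the length of the prefix rather than on the number of distinct values in the column.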
The existing ColumnCapabilities class could be used to describe what indexes are supported for a column.
This would involve changes to:
- query-related components, so that they can handle columns that do not use bitmap indexes
- on-disk storage format, to store new index types with the columns
- ingestion specs
EDIT, Mar. 15 2016:
The following patch has been merged; it prepares IncrementalIndex for later typing-related changes:
#2263
The following PRs are currently open, and should be reviewed/merged in order:
druid-io/druid-api#75 - Adds DimensionSchema class for specifying dimension type, properties
#2607 - Updates druid main to use DimensionSchema
#2621 - Larger PR that adds support for Long/Float typed dimensions