[Proposal] More flexible dimension types #2292

@jon-wei


Currently, Druid assumes that all dimensions have string values and are associated with a bitmap index.

There has been interest in loosening these constraints to support use cases that blur the existing separation between dimensions and metrics, such as filtering on numeric columns and aggregating dimensions at query time.

A recent discussion on these topics can be found here:
https://groups.google.com/d/msg/druid-user/Mk6omlC6Vbk/jtIFGFrACwAJ

This proposal was initially sent out on the druid-dev list, and initial comments can be found there:
https://groups.google.com/d/topic/druid-development/obtfNJnXPDg/discussion

This proposal calls for two major changes/features:


1.) Remove the assumption that dimensions always have string values.

This change is a path towards reducing the distinction between dimensions and metrics.

This would involve changes to:

  • IncrementalIndex, IndexMerger, etc. (ingestion)
  • StorageAdapters, query engines, etc. (querying)
  • Ingestion specs, to allow users to specify dimension types (e.g., String, Long, Float)
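
To make the idea of typed dimensions concrete, here is a minimal sketch of how declared dimension types might be applied to an input row at ingestion time. The names `TypedDimensionSketch`, `ValueType`, `coerce`, and `parseRow` are hypothetical and chosen for illustration only; they are not existing Druid classes, and the real changes to IncrementalIndex and the ingestion specs may look quite different.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names are Druid classes.
final class TypedDimensionSketch {
    // The dimension types mentioned in the proposal.
    enum ValueType { STRING, LONG, FLOAT }

    // Convert one raw ingested value to the declared dimension type.
    static Object coerce(ValueType type, Object raw) {
        if (raw == null) {
            return null; // missing values stay null in this sketch
        }
        switch (type) {
            case LONG:
                if (raw instanceof Number) {
                    return ((Number) raw).longValue();
                }
                return Long.parseLong(raw.toString().trim());
            case FLOAT:
                if (raw instanceof Number) {
                    return ((Number) raw).floatValue();
                }
                return Float.parseFloat(raw.toString().trim());
            default:
                return raw.toString();
        }
    }

    // Apply the declared types to one input row. Columns that are not
    // declared in the schema are ignored.
    static Map<String, Object> parseRow(Map<String, ValueType> schema,
                                        Map<String, Object> rawRow) {
        Map<String, Object> typed = new LinkedHashMap<>();
        for (Map.Entry<String, ValueType> dim : schema.entrySet()) {
            typed.put(dim.getKey(), coerce(dim.getValue(), rawRow.get(dim.getKey())));
        }
        return typed;
    }
}
```

With a schema that declares `added` as `LONG`, a raw value of `"42"` would be stored as a `Long` rather than as the string `"42"`, which is what would let numeric filters compare values numerically instead of lexicographically.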

2.) Allow users to choose per-column index strategies.

Druid could support a wider range of index types beyond bitmaps. Giving users control over what indexes are used on a per-column basis could make Druid more powerful and efficient.

For example, if a dimension is expected to have high cardinality and range filters applied to it, the user may want to choose a tree-based index instead of bitmaps.
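
As a rough illustration of why a sorted structure suits range filters, the sketch below uses `java.util.TreeMap` as an in-memory stand-in for a tree-based index. The class `RangeIndexSketch` and its methods are hypothetical; an actual on-disk index for Druid segments would need a different layout. The point is only that a range query visits the entries inside the range instead of every distinct value.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Hypothetical illustration only; not a Druid class or storage format.
final class RangeIndexSketch {
    // Sorted map from dimension value to the ids of rows holding that value.
    private final TreeMap<Long, BitSet> rowsByValue = new TreeMap<>();

    // Record that the given row holds the given value.
    void add(int rowId, long value) {
        rowsByValue.computeIfAbsent(value, v -> new BitSet()).set(rowId);
    }

    // Rows whose value lies in [low, high]. Only entries inside the range
    // are visited, thanks to the sorted order of the map.
    BitSet rowsInRange(long low, long high) {
        BitSet result = new BitSet();
        if (low > high) {
            return result; // empty range; subMap would throw here
        }
        for (BitSet rows : rowsByValue.subMap(low, true, high, true).values()) {
            result.or(rows);
        }
        return result;
    }
}
```

For a high-cardinality column, the cost of `rowsInRange` in this sketch depends on the number of distinct values inside the range rather than on the total number of distinct values in the column.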

As another example, trie indexes could be used to better support text search on dimension values.

The existing ColumnCapabilities class could be used to describe what indexes are supported for a column.
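
One way to picture this is a per-column description of the available index types that query components consult before choosing how to evaluate a filter. The sketch below is an assumption about how that could look; `IndexCapabilitiesSketch`, `IndexType`, `FilterKind`, and `choose` are hypothetical names, and this is not the actual ColumnCapabilities interface.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration only; not the real ColumnCapabilities API.
final class IndexCapabilitiesSketch {
    enum IndexType { BITMAP, TREE, TRIE }
    enum FilterKind { EQUALITY, RANGE, TEXT_SEARCH }

    private final Set<IndexType> supported;

    IndexCapabilitiesSketch(Set<IndexType> supported) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so guard it.
        this.supported = supported.isEmpty()
                ? EnumSet.noneOf(IndexType.class)
                : EnumSet.copyOf(supported);
    }

    boolean supports(IndexType type) {
        return supported.contains(type);
    }

    // Pick an index for a filter: prefer the specialized index, fall back to
    // bitmaps if present, and return null to signal a scan of the column.
    IndexType choose(FilterKind kind) {
        IndexType preferred;
        switch (kind) {
            case RANGE:
                preferred = IndexType.TREE;
                break;
            case TEXT_SEARCH:
                preferred = IndexType.TRIE;
                break;
            default:
                preferred = IndexType.BITMAP;
        }
        if (supported.contains(preferred)) {
            return preferred;
        }
        return supported.contains(IndexType.BITMAP) ? IndexType.BITMAP : null;
    }
}
```

The `null` return is the case the query-side changes would need to handle: a column with no usable index, where the filter has to be evaluated by reading the column values directly.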

This would involve changes to:

  • Query-related components, so that they can handle columns that do not use bitmap indexes
  • On-disk storage format, to store new index types with the columns
  • Ingestion specs, to let users choose index types per column

EDIT, Mar. 15 2016:

The following patch has been merged; it prepares IncrementalIndex for later typing-related changes:
#2263

The following PRs are currently open and should be reviewed and merged in this order:
druid-io/druid-api#75 - Adds a DimensionSchema class for specifying dimension type and properties
#2607 - Updates druid main to use DimensionSchema
#2621 - Larger PR that adds support for Long/Float typed dimensions
