[Proposal] The 2nd version of implementation for materialised view #5304

The motivation for 'materialised view' is described in issue #5218.
The implementation of 'Materialised View' is divided into two parts: **the management of DerivedDatasource** and **optimizing queries with DerivedDatasource**.

# Management of DerivedDatasource
Each derived-datasource is managed by a derived-datasource supervisor (like kafkaSupervisor).

### CREATE
A DerivedDatasource is created when a user submits a derived-datasource supervisor. The JSON spec of the derived-datasource supervisor should include at least the base datasource name, dimensions and metrics. The DerivedDatasource name can be set by the user or generated from the base datasource name, dimensions and metrics. Segment granularity and query granularity should be kept the same as those of the base datasource. This information is stored as supervisor metadata.
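
As an illustration of generating a name from the base datasource name, dimensions and metrics, here is a minimal sketch. The proposal does not specify a naming scheme, so the class `DerivedDatasourceNames`, the `generate` method and the "base name plus short SHA-256 prefix" format are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of the proposal: derives a stable name
// from the base datasource name plus the selected dimensions and metrics.
final class DerivedDatasourceNames
{
  static String generate(String baseDataSource, List<String> dimensions, List<String> metrics)
  {
    // Sort and de-duplicate so that the same column set always yields the same name,
    // regardless of the order in which the user listed the columns.
    String key = baseDataSource
                 + "|" + String.join(",", new TreeSet<>(dimensions))
                 + "|" + String.join(",", new TreeSet<>(metrics));
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
                                   .digest(key.getBytes(StandardCharsets.UTF_8));
      // Assumed format: keep only the first 4 bytes (8 hex characters) as a short suffix.
      StringBuilder hex = new StringBuilder();
      for (int i = 0; i < 4; i++) {
        hex.append(String.format("%02x", digest[i] & 0xff));
      }
      return baseDataSource + "-" + hex;
    }
    catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }
}
```

With this sketch, submitting the same dimensions and metrics in a different order maps to the same derived-datasource name, which would let a resubmitted supervisor be matched to its existing data.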

### MAINTAIN
A derived-datasource supervisor should make sure that the timeline and segment versions of its derived-datasource are the same as those of its base datasource (this can solve the segment version management problem @himanshug pointed out):

1. When the timeline of the derived-datasource covers less than that of the base datasource, the supervisor will submit a derived-datasource-index-task. The derived-datasource-index-task is a hadoop-index-task. The only difference is that the version of the segments it generates is not the time when the task was submitted, but the version of the related base datasource segments.
2. If the timeline of the derived-datasource covers more than that of the base datasource, the supervisor will set used=false and submit kill tasks for those segments.
3. Once the supervisor finds that the segment version of the derived-datasource differs from that of its related base datasource in the same interval, the supervisor will set used=false and submit a derived-datasource-index-task for that interval (this idea comes from @jihoonson's suggestions).
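
The three rules above amount to comparing two interval-to-version maps. The following is a minimal sketch of that comparison only; `DerivedTimelineReconciler`, the `Action` enum and the use of plain strings for intervals and versions are assumptions made for illustration, and the actual task submission and `used=false` updates are left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the supervisor's comparison step.
final class DerivedTimelineReconciler
{
  enum Action
  {
    BUILD,    // rule 1: interval exists only in the base datasource
    DROP,     // rule 2: interval exists only in the derived-datasource
    REBUILD   // rule 3: interval exists in both, but versions differ
  }

  /**
   * Both maps go from an interval (for example an ISO-8601 interval string)
   * to the segment version for that interval.
   */
  static Map<String, Action> reconcile(Map<String, String> baseVersions, Map<String, String> derivedVersions)
  {
    Map<String, Action> plan = new TreeMap<>();
    for (Map.Entry<String, String> entry : baseVersions.entrySet()) {
      String derivedVersion = derivedVersions.get(entry.getKey());
      if (derivedVersion == null) {
        plan.put(entry.getKey(), Action.BUILD);
      } else if (!derivedVersion.equals(entry.getValue())) {
        plan.put(entry.getKey(), Action.REBUILD);
      }
      // Equal versions: the interval is already in sync, nothing to do.
    }
    for (String interval : derivedVersions.keySet()) {
      if (!baseVersions.containsKey(interval)) {
        plan.put(interval, Action.DROP);
      }
    }
    return plan;
  }
}
```

In this sketch a `BUILD` or `REBUILD` entry would translate into a derived-datasource-index-task that reuses the base segment version for that interval, and a `DROP` entry into setting used=false plus a kill task.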

### DELETE
When the supervisor is shut down or resubmitted, the previous data of the derived-datasource will be deleted.
When the base datasource is disabled, all of its derived-datasource supervisors will be shut down.


# Optimize query with DerivedDatasource
A MaterialisedViewQueryRunner is added in the applyPreMergeDecoration() method of FluentQueryRunnerBuilder, for example:
```
    public FluentQueryRunner applyPreMergeDecoration()
    {
      return from(
          new UnionQueryRunner<T>(
              new MaterialisedViewQueryRunner(
                  toolChest.preMergeQueryDecoration(
                      baseRunner
                  )
              )
          )
      );
    }
```
In MaterialisedViewQueryRunner, the query is rewritten into a union of queries against the derived-datasources and the base datasource, and the results of all these queries are merged before returning. The optimization process is as follows.

1. Check whether the datasource is a table datasource. If not, do not optimize.
2. Check whether the datasource has derived-datasources, and find the derived-datasources that contain the dimensions and metrics the query needs. If no derived-datasource meets this condition, do not optimize.
3. Split the query interval by segment granularity into sub-intervals, and for each sub-interval, find the derived-datasource that has the minimum amount of data. Then replace the query datasource with that derived-datasource for the sub-interval.

In this way, a query can be covered partially by derived-datasources and partially by the original datasource, which solves the problem @nishantmonu51 pointed out.
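
Steps 2 and 3 of the optimization can be sketched as below. This is only an illustration under stated assumptions: `DerivedDatasourceSelector`, `Candidate` and the per-bucket size map are hypothetical names, intervals are modelled as epoch-millisecond buckets of a fixed segment granularity, and the table-datasource check from step 1 is assumed to have already passed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of choosing a datasource per sub-interval.
final class DerivedDatasourceSelector
{
  /** Hypothetical summary of one derived-datasource. */
  static final class Candidate
  {
    final String name;
    final Set<String> columns;                 // dimensions and metrics it contains
    final Map<Long, Long> bytesByBucketStart;  // data size per segment-granularity bucket

    Candidate(String name, Set<String> columns, Map<Long, Long> bytesByBucketStart)
    {
      this.name = name;
      this.columns = columns;
      this.bytesByBucketStart = bytesByBucketStart;
    }
  }

  /** Returns bucket start -> datasource to query for that sub-interval. */
  static Map<Long, String> plan(
      String baseDataSource,
      Set<String> requiredColumns,
      long startMillis,
      long endMillis,
      long segmentGranularityMillis,
      List<Candidate> candidates
  )
  {
    Map<Long, String> plan = new LinkedHashMap<>();
    // Align the first bucket to a segment-granularity boundary.
    long first = Math.floorDiv(startMillis, segmentGranularityMillis) * segmentGranularityMillis;
    for (long bucket = first; bucket < endMillis; bucket += segmentGranularityMillis) {
      String chosen = baseDataSource; // fall back to the base datasource
      long smallest = Long.MAX_VALUE;
      for (Candidate candidate : candidates) {
        Long size = candidate.bytesByBucketStart.get(bucket);
        // Step 2: must contain every required column. Step 3: must have data
        // for this sub-interval and be the smallest such candidate.
        if (size != null && size < smallest && candidate.columns.containsAll(requiredColumns)) {
          smallest = size;
          chosen = candidate.name;
        }
      }
      plan.put(bucket, chosen);
    }
    return plan;
  }
}
```

Sub-intervals for which no suitable derived-datasource has data keep the base datasource, which is how a query ends up partially covered by derived-datasources and partially by the original one.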
