Mayank Shrivastava edited this page Jul 28, 2015 · 6 revisions

High Level Overview

Query processing phases

![](image2015-5-19 1-47-44.png)

  • Query Parsing: Pinot supports a slightly modified version of SQL, which we refer to as PQL. PQL supports only a subset of SQL; for example, Pinot does not support joins or nested subqueries. We use ANTLR to parse the query into a parse tree. In this phase, all syntax validations are performed and default values are set for missing elements.
  • Logical Plan Phase: This phase takes in the query parse tree and outputs a logical plan tree. It is single-threaded and simple: it constructs the appropriate logical plan operator tree based on the query type (selection, aggregation, group by, etc.) and the metadata provided by the data source.
  • Physical Plan Phase: This phase further optimizes the plan for each individual segment. The optimizations applied in this phase can differ across segments.
  • Executor Service: Once we have the per-segment physical operator trees, the executor service takes responsibility for scheduling the query processing tasks on each segment.
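The four phases above can be sketched end to end as a small pipeline. This is an illustrative sketch only; the class and method names are hypothetical, not Pinot's actual API:

```java
// Hypothetical sketch of the query-processing pipeline described above.
// None of these names are Pinot's real API; they only illustrate the flow:
// parse -> logical plan -> per-segment physical plan -> execute.
import java.util.List;
import java.util.stream.Collectors;

public class QueryPipelineSketch {
    static String parse(String pql) {
        // Phase 1: syntax validation + defaults (stands in for the ANTLR parse tree).
        return "ParseTree(" + pql + ")";
    }

    static String logicalPlan(String parseTree) {
        // Phase 2: single-threaded construction of the logical operator tree.
        return "LogicalPlan(" + parseTree + ")";
    }

    static String physicalPlan(String logicalPlan, String segment) {
        // Phase 3: per-segment optimization; may differ across segments.
        return "PhysicalPlan(" + logicalPlan + ", " + segment + ")";
    }

    static List<String> execute(String pql, List<String> segments) {
        // Phase 4: the executor service schedules one task per segment.
        String logical = logicalPlan(parse(pql));
        return segments.stream()
                .map(seg -> physicalPlan(logical, seg))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(execute("select count(*) from myTable", List.of("seg_0", "seg_1")));
    }
}
```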

![](image2015-5-19 1-59-51.png)

Query Plan

This section describes the query plan maker and shows the workflows of three different query plans.

The query plan maker builds the query plan based on the query and the data segment.

Based on the query type, we create three main query plans: one each for aggregation queries, aggregation group-by queries, and selection queries.
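The dispatch on query type can be sketched as follows (the enum and method names are hypothetical, not Pinot's actual code):

```java
// Hypothetical sketch of choosing one of the three plan types from the
// shape of the query; names are illustrative, not Pinot's API.
public class PlanMakerSketch {
    enum PlanType { AGGREGATION, AGGREGATION_GROUP_BY, SELECTION }

    // An aggregation query has aggregation functions; adding a GROUP BY
    // clause makes it an aggregation group-by query; otherwise it is a
    // selection query.
    static PlanType choosePlan(boolean hasAggregations, boolean hasGroupBy) {
        if (hasAggregations) {
            return hasGroupBy ? PlanType.AGGREGATION_GROUP_BY : PlanType.AGGREGATION;
        }
        return PlanType.SELECTION;
    }

    public static void main(String[] args) {
        System.out.println(choosePlan(true, false));   // AGGREGATION
        System.out.println(choosePlan(true, true));    // AGGREGATION_GROUP_BY
        System.out.println(choosePlan(false, false));  // SELECTION
    }
}
```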

Aggregation Query

The AggregationPlanNode takes a list of AggregationFunctionPlanNodes.

Aggregation Group By Query

The AggregationGroupByPlanNode takes a list of AggregationFunctionGroupByPlanNodes.

Selection Query

Low Level Details

Segment Injection

The InstanceRequest contains a list of segments (taken from the routing table on the broker) to query.

The query executor fetches those segments from the InstanceDataManager.

Segment Pruner

Different segment pruners are applied based on segment metadata.

For example,

TableSegmentPruner prunes segments whose table name does not match the query.

TimeRangeSegmentPruner prunes segments based on the time range recorded in the segment metadata.
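The time-range check can be sketched as a simple interval-overlap test (method names hypothetical; the real pruner reads the min/max times from segment metadata):

```java
// Hedged sketch of time-range pruning: a segment can be skipped when its
// [segMin, segMax] time interval (from segment metadata) does not overlap
// the query's [queryStart, queryEnd] range. Names are illustrative only.
public class TimeRangePrunerSketch {
    // Returns true when the segment can be pruned (no overlap with the query range).
    static boolean prune(long segMin, long segMax, long queryStart, long queryEnd) {
        return segMax < queryStart || segMin > queryEnd;
    }

    public static void main(String[] args) {
        System.out.println(prune(0, 10, 20, 30));  // true: segment ends before the range
        System.out.println(prune(15, 25, 20, 30)); // false: intervals overlap
    }
}
```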

Query Plan Maker

The plan maker takes the request and applies it to every segment that passes the segment pruners.

Plan Executor

The plan executor takes the query plan, runs it, and returns an InstanceResponse.

Inter-Segment Query Plan

An inter-segment query plan represents how to process a query that spans multiple segments.

Below is an example of an Inter-Segment query plan.

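A minimal sketch of the shape of such a plan: the root UResultOperator wraps an MCombineOperator, which fans out to one inner-segment plan per segment. The rendering below is simplified and hypothetical, not Pinot's actual plan classes:

```java
// Hypothetical, simplified rendering of an inter-segment plan tree.
// The operator names match the ones described in this page, but the
// rendering code itself is an illustration, not Pinot's implementation.
import java.util.List;

public class InterSegmentPlanSketch {
    // Renders: UResultOperator -> MCombineOperator -> one inner-segment
    // plan per segment in the InstanceRequest.
    static String describePlan(List<String> segmentNames) {
        StringBuilder sb = new StringBuilder();
        sb.append("UResultOperator\n");
        sb.append("  MCombineOperator\n");
        for (String seg : segmentNames) {
            sb.append("    InnerSegmentPlan(").append(seg).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describePlan(List.of("segment_0", "segment_1")));
        // prints:
        // UResultOperator
        //   MCombineOperator
        //     InnerSegmentPlan(segment_0)
        //     InnerSegmentPlan(segment_1)
    }
}
```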

Inner-Segment Query Plan

An inner-segment query plan describes how to run the query inside a single segment.

Based on the query type, we have three different operator types:

MAggregationOperator, MSelectionOperator and MAggregationGroupByOperator.

Each takes care of running the given query on the data sources inside the segment.

Operators

For operators, we use a prefix to indicate the number of parameters an operator takes: 'U' for exactly one parameter, 'B' for two parameters, and 'M' for three or more (or a list of) parameters.

Inter-Segments Operators

UResultOperator

UResultOperator is always the entry point of an instance request. It takes an MCombineOperator as input and returns an InstanceResponse to the broker.

Call nextBlock() to get the InstanceResponse.

MCombineOperator

MCombineOperator takes multiple IntermediateResults as input and merges them together.

Call nextBlock() to get an already merged IntermediateResultBlock.
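For an aggregation such as SUM, the merge step can be sketched like this (a simplified stand-in; the real operator merges typed IntermediateResultBlocks, not plain longs):

```java
// Hedged sketch of what the combine step does for an aggregation query:
// each segment produces an intermediate result (here, a partial sum),
// and the combine operator merges them into one final value.
import java.util.List;

public class CombineSketch {
    static long combinePartialSums(List<Long> perSegmentSums) {
        long merged = 0;
        for (long partial : perSegmentSums) {
            merged += partial; // merge step for SUM; MIN/MAX/COUNT merge analogously
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three segments returned partial sums 10, 20 and 12.
        System.out.println(combinePartialSums(List.of(10L, 20L, 12L))); // 42
    }
}
```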

Inner-Segment Operators:

FilterOperator
FilterOperator takes an IndexSegment and a BrokerRequest as input and applies the filter query to the segment.
Call nextBlock() to get a filtered docIdSet block, which provides an API to get a BlockDocIdSet.
The DocIdSet block is backed by either a bitmap or an array.
Based on the filter query, a tree structure is constructed: all non-leaf nodes are AND/OR operators, and all leaf nodes are predicate operators.
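The tree evaluation can be sketched with java.util.BitSet standing in for the bitmap-backed docIdSets (class names here are illustrative, not Pinot's operator classes):

```java
// Hedged sketch of the filter operator tree: leaves produce the docIds
// matching one predicate; AND nodes intersect and OR nodes union the
// docId sets of their children. BitSet stands in for Pinot's bitmaps.
import java.util.BitSet;
import java.util.List;

public class FilterTreeSketch {
    interface FilterNode { BitSet evaluate(); }

    // Leaf: the docIds matching a single predicate.
    static class Leaf implements FilterNode {
        private final BitSet docIds;
        Leaf(BitSet docIds) { this.docIds = docIds; }
        public BitSet evaluate() { return (BitSet) docIds.clone(); }
    }

    // Non-leaf AND node: intersects the docId sets of its children.
    static class AndNode implements FilterNode {
        private final List<FilterNode> children;
        AndNode(List<FilterNode> children) { this.children = children; }
        public BitSet evaluate() {
            BitSet result = children.get(0).evaluate();
            for (int i = 1; i < children.size(); i++) result.and(children.get(i).evaluate());
            return result;
        }
    }

    // Non-leaf OR node: unions the docId sets of its children.
    static class OrNode implements FilterNode {
        private final List<FilterNode> children;
        OrNode(List<FilterNode> children) { this.children = children; }
        public BitSet evaluate() {
            BitSet result = children.get(0).evaluate();
            for (int i = 1; i < children.size(); i++) result.or(children.get(i).evaluate());
            return result;
        }
    }

    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int d : docIds) b.set(d);
        return b;
    }

    public static void main(String[] args) {
        // e.g. WHERE country = 'US' AND device = 'mobile'
        FilterNode tree = new AndNode(List.of(
                new Leaf(bits(0, 1, 2, 3)),
                new Leaf(bits(2, 3, 4, 5))));
        System.out.println(tree.evaluate()); // {2, 3}
    }
}
```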
BDocIdSetOperator

BDocIdSetOperator takes an IndexSegment, a FilterOperator, and a docId buffer size as input.

Call nextBlock() to get a new DocIdSetBlock with at most buffer-size docIds.

A DocIdSetBlock contains a list of docIds; all the rows with those docIds in the given segment match the filter criteria.
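The buffering behaviour can be sketched as follows (a simplified stand-in; the real operator produces DocIdSetBlocks, and the names here are hypothetical):

```java
// Hedged sketch of the BDocIdSetOperator behaviour: each call to
// nextBlock() hands out at most `bufferSize` filtered docIds; an empty
// batch signals that the segment is exhausted. Names are illustrative.
import java.util.ArrayList;
import java.util.List;

public class DocIdBatcherSketch {
    private final int[] filteredDocIds; // output of the filter operator
    private final int bufferSize;
    private int cursor = 0;

    DocIdBatcherSketch(int[] filteredDocIds, int bufferSize) {
        this.filteredDocIds = filteredDocIds;
        this.bufferSize = bufferSize;
    }

    // Stand-in for nextBlock(): returns the next batch of docIds.
    List<Integer> nextBlock() {
        List<Integer> block = new ArrayList<>();
        while (cursor < filteredDocIds.length && block.size() < bufferSize) {
            block.add(filteredDocIds[cursor++]);
        }
        return block;
    }

    public static void main(String[] args) {
        DocIdBatcherSketch op = new DocIdBatcherSketch(new int[]{1, 4, 7, 9, 12}, 2);
        System.out.println(op.nextBlock()); // [1, 4]
        System.out.println(op.nextBlock()); // [7, 9]
        System.out.println(op.nextBlock()); // [12]
        System.out.println(op.nextBlock()); // []
    }
}
```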

MProjectionOperator

MProjectionOperator takes one DocIdSetOperator and multiple DataSources as input.

Call nextBlock() to get ProjectionBlock.

Because the MProjectionOperator is reused across multiple computation operators, the ProjectionBlock provides APIs for the aggregation and selection operators to get the data blocks they need:

    Block getBlock(String column);
    Block getDocIdSetBlock();
MAggregationOperator

Takes a ProjectionOperator and a list of aggregation function operators. Call nextBlock() to kick off each AggregationFunctionOperator's nextBlock() and then merge the results into the existing aggregation results.

BAggregationFunctionOperator

Takes an MProjectionOperator and an AggregationFunctionInfo as input. Call nextBlock() to iterate over the block and return an aggregated result.

MSelectionOnlyOperator

Takes a selection query and an MProjectionOperator as input. Call nextBlock() to take the matched docIds and collect the corresponding rows into a collection of Serializable[]. Based on the query, we scan only up to the limit number of documents, then return.

MSelectionOrderByOperator

Takes a selection query and an MProjectionOperator as input. Call nextBlock() to iterate over each data block, collect the matched docIds, and put them into a priority queue of Serializable[] that maintains the top X rows according to the ordering.
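The top-X bookkeeping can be sketched with a bounded min-heap (simplified to plain long values; Pinot keeps Serializable[] rows and a query-specific comparator):

```java
// Hedged sketch of order-by selection: keep the current top X values in a
// bounded priority queue whose head is the weakest kept value, so each
// incoming value either replaces the head or is discarded.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopXSketch {
    // Returns the top `x` values in descending order.
    static List<Long> topX(long[] values, int x) {
        // Min-heap: the smallest of the current top-x sits at the head.
        PriorityQueue<Long> heap = new PriorityQueue<>(x);
        for (long v : values) {
            if (heap.size() < x) {
                heap.offer(v);
            } else if (heap.peek() < v) {
                heap.poll();   // drop the weakest kept value
                heap.offer(v); // keep the stronger newcomer
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topX(new long[]{5, 1, 9, 3, 7}, 3)); // [9, 7, 5]
    }
}
```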

MAggregationGroupByOperator
Takes a ProjectionOperator, the group-by information, and a list of aggregation function operators.
Call nextBlock() to kick off each AggregationFunctionGroupByOperator's nextBlock().
MAggregationFunctionGroupByOperator

Take aggregation function info, group by info, and MProjectionOperator as input.

Call nextBlock() to get a block of data sources and apply aggregation group by query on those blocks.

Each nextBlock() call collects all the matched docIds and performs the aggregation per document. The group key is also created while aggregating.

For a dictionary-based operator, the group key is packed into a long value using a fixed number of bits per group-by column; the total number of bits used across all group-by columns must be smaller than 64.

If the group key does not fit into a long value, or a group-by column has no dictionary, we construct a string group key instead.
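The bit-packing of a dictionary-based group key can be sketched like this (hypothetical method names; the real code derives each column's bit width from its dictionary cardinality):

```java
// Hedged sketch of the dictionary-based group key: each group-by column
// contributes its dictionary id, packed into one long using as many bits
// as that column needs; the bit widths must sum to fewer than 64.
public class GroupKeySketch {
    // Packs dictionary ids into a single long; bitsPerColumn[i] is the
    // number of bits reserved for column i (roughly ceil(log2(cardinality))).
    static long packGroupKey(int[] dictIds, int[] bitsPerColumn) {
        long key = 0;
        for (int i = 0; i < dictIds.length; i++) {
            key = (key << bitsPerColumn[i]) | dictIds[i];
        }
        return key;
    }

    public static void main(String[] args) {
        // Two columns: 8 bits (up to 256 values) and 4 bits (up to 16 values).
        long key = packGroupKey(new int[]{3, 5}, new int[]{8, 4});
        System.out.println(key); // 3 * 16 + 5 = 53
    }
}
```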
