-
Notifications
You must be signed in to change notification settings - Fork 0
Query Execution

- Query Parsing: Pinot supports a slightly modified version of SQL which we refer to as PQL. PQL only supports a subset of SQL for example Pinot does not support Joins, nested sub queries etc. We use Antlr to parse the query into a parse tree. In this phase, all syntax validations are performed and default values are set for missing elements.
- Logical Plan Phase: This phase takes in query parse tree and outputs a Logical Plan Tree. This phase is single threaded and is simple and constructs the appropriate logical plan operator tree based on the query type (selection, aggregation, group by etc) and metadata provided by the data source.
- Physical Plan Phase: This phase further optimizes the plan based on individual segment. The optimization applied in this phase can be different across various segments.
- Executor Service: Once we have per segment physical operator tree, executor service takes up the responsibility of scheduling the query processing tasks on each and every segment.

Below describes the query plan maker and gives three workflows of different query plans.
Query plan maker will make the query plan based on the query and data segment.
Based on the query type, we create three main query plans for aggregation query, aggregation group by query and selection query.
Aggregation Plan Node will take a list of AggregationFunctionPlanNode.

AggregationGroupByPlan Node will take a list of AggregationFunctionGroupByPlanNode.


InstanceRequest contains a list of segments(from routing table in broker) to query.
Query Executor will get those segments from InstanceDataManager.
Different segment pruners are applied based on segment metadata.
For example,
TableSegmentPruner will prune segments not matching table name .
TimeRangeSegmentPruner will prune segment based on the time range specified in segment metadata level.
Plan maker will take request and apply it to all the segments passing segment pruners.
Plan Executor will take the query plan and run it, then return InstanceResponse.
Inter segment query plan is used to represent how to process query crossing multiple segments.
Below is an example of an Inter-Segment query plan.
Inner-segment query plan is for how to query inside one segment.
Based on the query type, we have three different Operator type:
MAggregationOperator, MSelectionOperator and MAggregationGroupByOperator.
Each will take care of how to run the given query on the data sources inside segment.
For operators, we use prefix to identify the parameters it will take. 'U' is for only one parameter, 'B' is for two parameters, 'M' is for three or more or a list of parameters.
UResultOperator is always the enter point of an instance request. It will take a MCombineOperator as input and return InstanceResponse to Broker.
Call nextBlock() to get the InstanceResponse.
MCombineOperator will take multiple IntermediateResults as input and merge them together.
Call nextBlock() to get an already merged IntermediateResultBlock.
BDocIdSetOperator takes IndexSegment and FilterOperator and docId buffer size as input.
Call nextBlock() to get a new DocIdSetBlock with at most buffer size docIds.
DocIdSetBlock contains a list of docIds. All the rows along with those docIds in given segment match the filter criteria.
MProjectionOperator will take one DocIdSetOperator and multiple DataSources as input.
Call nextBlock() to get ProjectionBlock.
As MProjectionOperator is reused across multiple computation operators, ProjectionBlock provides APIs for aggregation and selection operators to get needed data blocks:
Block getBlock(String column);
Block getDocIdSetBlock();
Take ProjectionOperator and a list of Aggregation Function Operators. Take ProjectionOperator and a list of Aggregation Function Operators. Call nextBlock() to kick off each AggregationFunctionOperator's nextBlock() and then merge to existed aggregation results.
Take MProjectionOperator and an Aggregation Function Info as Input. Call nextBlock() to iterator on the block and do iterator and return an aggregated result.
Take Selection query and MProjectionOperator as input. Call nextBlock() to take matched docIds and collect rows into a collection of Serializable[]. Based on the query, we only scan limit number of documents, then return.
Take Selection query and MProjectionOperator as input. Call nextBlock() to iterate on each data block, collect matched docIds and put them into a priority queue of Serializable[] to maintain the top X events based on ordering.
Take aggregation function info, group by info, and MProjectionOperator as input.
Call nextBlock() to get a block of data sources and apply aggregation group by query on those blocks.
Each nextBlock() will collect all the matched docIds and do aggregation per doc. GroupKey is also created when doing aggregation.
For dictionary based operator, group key is created to fit into a long variable based on the size of bits used for each column. Sum of bits used for each group key column should smaller than 64.
If group key is not fit for a long value or the group column has no dictionary, we will construct string group key.
- Multitenancy
- Architecture
- Query Execution
- [Pinot Core Concepts and Terminology] (Pinot-Core-Concepts-and-Terminology)
- [Low level kafka Consumers] (Low-level-kafka-consumers)
- [Expressions & UDF Support] (Expressions-&-UDF-support)