Add GlueingPartitioningOperator + Corresponding changes in window function layer to consume it for MSQ #17038
kgyrtkirk
left a comment
There was a problem hiding this comment.
thank you for the PR @Akshat-Jain!
    private boolean needToProcessBatch()
    {
      return numRowsInFrameRowsAndCols >= maxRowsMaterialized / 2; // Can this be improved further?
There was a problem hiding this comment.
Why divide by 2? That doesn't give any guarantee that it will stay inside bounds.
People could set it to half themselves if needed, but I think it's easier to document clear things...
There was a problem hiding this comment.
We need some threshold to start pushing RACs into the operator pipeline.
We discussed that we should push N-row RACs into the pipeline, but it's not trivial to create RACs of exactly N rows.
So I felt it would be better to have a threshold instead: push RACs into the pipeline once they cross N rows. I chose N = maxRowsMaterialized / 2, but we can always discuss better values for this.
> that doesn't give any guarantee that it will be inside bounds
I think it does 🤔
The convertRowFrameToRowsAndColumns() method enforces the maxRowsMaterialized constraint by calling ensureMaxRowsInAWindowConstraint(frameRowsAndCols.size() + ldrc.numRows()), so it won't allow us to accumulate more than maxRowsMaterialized rows.
Thoughts? Let me know if I'm missing something. Thanks!
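The interplay between the soft flush threshold and the hard row limit discussed in this thread can be sketched as follows. This is a minimal illustration and not the PR's code: BatchThresholdSketch, addFrame, ensureMaxRows, and the use of plain integer lists in place of RACs are assumptions made for the example. Only needToProcessBatch and maxRowsMaterialized mirror names that appear in the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the frame processor's buffering (illustrative names only).
class BatchThresholdSketch
{
  private final int maxRowsMaterialized;
  private final List<Integer> buffered = new ArrayList<>();
  private final List<List<Integer>> pushedBatches = new ArrayList<>();

  BatchThresholdSketch(int maxRowsMaterialized)
  {
    this.maxRowsMaterialized = maxRowsMaterialized;
  }

  // Hard limit: refuse to hold more than maxRowsMaterialized rows at once.
  private void ensureMaxRows(int numRows)
  {
    if (numRows > maxRowsMaterialized) {
      throw new IllegalStateException("Too many rows materialized: " + numRows);
    }
  }

  // Soft threshold: flush once the buffer reaches half of the hard limit.
  private boolean needToProcessBatch()
  {
    return buffered.size() >= maxRowsMaterialized / 2;
  }

  // Buffers one incoming frame, then flushes the buffer if the threshold is reached.
  void addFrame(List<Integer> frameRows)
  {
    ensureMaxRows(buffered.size() + frameRows.size());
    buffered.addAll(frameRows);
    if (needToProcessBatch()) {
      pushedBatches.add(new ArrayList<>(buffered));
      buffered.clear();
    }
  }

  List<List<Integer>> pushedBatches()
  {
    return pushedBatches;
  }

  int bufferedRows()
  {
    return buffered.size();
  }
}
```

In this sketch the bound comes from the explicit check in addFrame, not from the threshold itself: the threshold only decides when to flush, while a frame that would take the buffer past the limit fails with an exception.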
      stageRowSignature = finalWindowStageRowSignature;
      nextShuffleSpec = finalWindowStageShuffleSpec;
    } else {
      nextShuffleSpec = findShuffleSpecForNextWindow(operatorList.get(i + 1), maxWorkerCount);
There was a problem hiding this comment.
please open a separate PR and fix the stagebuilder rather than hacking it backwards from here
    protected abstract Iterator<RowsAndColumns> getIteratorForRAC(RowsAndColumns rac);

    protected abstract void handleKeepItGoing(AtomicReference<Signal> signalRef, Iterator<RowsAndColumns> iterator, Receiver receiver);
There was a problem hiding this comment.
I don't really see how these methods make the implementing classes simpler... they don't really hide away much complexity...
    Iterator<RowsAndColumns> partitionsIter = getIteratorForRAC(rac);

    AtomicReference<Signal> keepItGoing = new AtomicReference<>(Signal.GO);
There was a problem hiding this comment.
Why use an AtomicReference? The handleKeepItGoing method is void; why not use the return value?
There was a problem hiding this comment.
It forces us to return null in GlueingPartitioningOperator (as we need to return something):

    @Override
    protected Signal handleKeepItGoing(Iterator<RowsAndColumns> iterator, Receiver receiver)
    {
      RowsAndColumns rowsAndColumns = iterator.next();
      if (iterator.hasNext()) {
        return receiver.push(rowsAndColumns);
      } else {
        previousRac = rowsAndColumns;
        return null;
      }
    }

And then we have to handle the null specifically, as we don't want to update the signal in the case of null, since handlePush() needs to return the last non-null signal.

    Signal keepItGoing = Signal.GO;
    while (keepItGoing == Signal.GO && partitionsIter.hasNext()) {
      Signal signal = handleKeepItGoing(partitionsIter, receiver);
      if (signal != null) {
        keepItGoing = signal;
      }
    }

This didn't seem clean to me, so I ended up going with the AtomicRef approach.
There was a problem hiding this comment.
Then maybe it could return an Optional<Signal>?
There was a problem hiding this comment.
This isn't relevant anymore as I had to move this logic into Glueing/NaivePartitioningOperator layer (away from Abstract layer).
I tried to keep handlePush() and handleKeepItGoing() in Abstract layer, but since we are using static classes now, we can't override the implementation of a static method, so I had to remove these methods and move the logic to the individual classes.
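For reference, the Optional<Signal> variant suggested in this thread could look roughly like the sketch below. It is not taken from the PR: OptionalSignalSketch is a hypothetical class, Signal and Receiver are simplified local stand-ins for the real types, and strings stand in for RACs. An empty Optional means that nothing was pushed, so the previous signal is kept.

```java
import java.util.Iterator;
import java.util.Optional;

// Hypothetical, simplified stand-ins; strings play the role of RACs.
class OptionalSignalSketch
{
  enum Signal { GO, STOP }

  interface Receiver
  {
    Signal push(String rac);
  }

  // Last partition of the current batch, held back so it can be glued later.
  private String previousRac;

  // Returns the receiver's signal, or empty when the element was held back instead of pushed.
  Optional<Signal> handleKeepItGoing(Iterator<String> iterator, Receiver receiver)
  {
    String rac = iterator.next();
    if (iterator.hasNext()) {
      return Optional.of(receiver.push(rac));
    }
    previousRac = rac;
    return Optional.empty();
  }

  // Keeps the last signal that was actually produced; no null handling needed.
  Signal handlePush(Iterator<String> partitionsIter, Receiver receiver)
  {
    Signal keepItGoing = Signal.GO;
    while (keepItGoing == Signal.GO && partitionsIter.hasNext()) {
      keepItGoing = handleKeepItGoing(partitionsIter, receiver).orElse(keepItGoing);
    }
    return keepItGoing;
  }

  String previousRac()
  {
    return previousRac;
  }
}
```

Compared with the null-returning version quoted above, orElse(keepItGoing) folds the "leave the signal unchanged" case into a single expression.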
    @@ -110,28 +79,7 @@ public Closeable goOrContinue(Closeable continuation, Receiver receiver)
    @Override
    public Signal push(RowsAndColumns rac)
There was a problem hiding this comment.
make the Receiver a static inner class; it should have the Iterator as its field.
handlePush naturally wants to be a method of it
- the full if (cont.iter != null) part should also go into the abstract;
- put the full body of the while into a method in the abstract
- the glueing should override that method and before calling the super() it could check if there are more elements and save that and use that when the Receiver is built
There was a problem hiding this comment.
A static receiver wouldn't have access to methods like ensureMaxRowsMaterializedConstraint() which are also called from non-static methods.
Also, can you elaborate why should we make receiver static? Even if we had a StaticReceiver, we would still have to do new StaticReceiver() with the required state.
For example, to add a static receiver in NaivePartitioningOperator, I added the following StaticReceiver class and then replaced new Receiver() {} with new StaticReceiver(receiver, iterHolder):

    static class StaticReceiver implements Receiver
    {
      private final Receiver delegate;
      private final AtomicReference<Iterator<RowsAndColumns>> iterHolder;

      public StaticReceiver(
          Receiver delegate,
          AtomicReference<Iterator<RowsAndColumns>> iterHolder
      )
      {
        this.delegate = delegate;
        this.iterHolder = iterHolder;
      }

      @Override
      public Signal push(RowsAndColumns rac)
      {
        return handlePush(rac, delegate, iterHolder);
      }

      @Override
      public void completed()
      {
        if (iterHolder.get() == null) {
          delegate.completed();
        }
      }
    }

I think we're not on the same page wrt how the static receiver needs to look like. Can you share your thoughts on the above?
There was a problem hiding this comment.
doesn't seem like a dealbreaker...as a last resort ensureMaxRowsMaterializedConstraint can be static and pass the 2 integers for it?
There was a problem hiding this comment.
@kgyrtkirk The iterator needs to be created in the receiver's push() method.
Having a static receiver doesn't let us create the iterator there, since getIteratorForRAC() is not static.
getIteratorForRAC() can't be made static since it's supposed to be an abstract method, with different implementations in NaivePartitioningOperator and GlueingPartitioningOperator.
If I remove getIteratorForRAC() from the abstract class, and just have it as a regular non-overridden static method, then how to pass the iterator to the static receiver? I can't pass it in like super.push(rac, iteratorForRac) since it violates the method signature of push().
Also, trying to make everything static ends up requiring us to pass unnecessary variables as method params, and to structure everything just to make the receiver static, which seems unnecessary and not clean to me. We would anyway still have to do new StaticReceiver(field1, field2,...) every time instead of new Receiver().
Thoughts?
There was a problem hiding this comment.
Have made local changes to make receiver and the iterator class static. The code doesn't seem clean though, as I had to pass a bunch of params. Also trying to see what can be moved to Abstract class with those changes. (This is all on my local right now, just updating this info here)
There was a problem hiding this comment.
@kgyrtkirk Have pushed the change to the PR to make the relevant classes static. Appreciate if you could take a look again, thanks!
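The "static helper that takes the two integers" idea from this thread can be illustrated with the sketch below. It is an assumption-laden example, not the code pushed to the PR: StaticReceiverSketch and LimitingReceiver are hypothetical names, Signal and Receiver are simplified local stand-ins, and integer lists stand in for RACs. Only the method name ensureMaxRowsMaterializedConstraint is borrowed from the discussion.

```java
import java.util.List;

// Hypothetical container for the example; integer lists play the role of RACs.
class StaticReceiverSketch
{
  enum Signal { GO, STOP }

  interface Receiver
  {
    Signal push(List<Integer> rac);

    void completed();
  }

  // Static helper: both values are passed in, so no enclosing operator instance is needed.
  static void ensureMaxRowsMaterializedConstraint(int numRows, int maxRowsMaterialized)
  {
    if (numRows > maxRowsMaterialized) {
      throw new IllegalStateException(
          "Too many rows: " + numRows + " > " + maxRowsMaterialized
      );
    }
  }

  // Static nested class: every piece of state it needs arrives through the constructor.
  static class LimitingReceiver implements Receiver
  {
    private final Receiver delegate;
    private final int maxRowsMaterialized;

    LimitingReceiver(Receiver delegate, int maxRowsMaterialized)
    {
      this.delegate = delegate;
      this.maxRowsMaterialized = maxRowsMaterialized;
    }

    @Override
    public Signal push(List<Integer> rac)
    {
      ensureMaxRowsMaterializedConstraint(rac.size(), maxRowsMaterialized);
      return delegate.push(rac);
    }

    @Override
    public void completed()
    {
      delegate.completed();
    }
  }
}
```

The trade-off raised in the thread is visible here: the receiver no longer captures the operator implicitly, at the cost of a longer constructor.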
    public class GlueingPartitioningOperator extends AbstractPartitioningOperator
    {
      private final int maxRowsMaterialized;
      private RowsAndColumns previousRac;
There was a problem hiding this comment.
it would be better to not have this in Operator scope; I believe it should belong to the Receiver
There was a problem hiding this comment.
previousRac is also used in the continuation logic, so needs to be at the Operator scope.
There was a problem hiding this comment.
although it works - that's bad design
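To make the role of the held-back partition concrete, here is a minimal sketch of glueing across batches. It is an illustration under simplifying assumptions and not the operator's implementation: GlueingSketch is a hypothetical class, each row is a single integer that doubles as its own partition key, batches arrive already clustered by key, and output is collected into a list where the real operator would push to a receiver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: integer rows, where the row value is also the partition key.
class GlueingSketch
{
  private final List<List<Integer>> output = new ArrayList<>();
  // Last partition of the previous batch, held back because it may continue in the next batch.
  private List<Integer> previous = null;

  // Splits a clustered batch into runs of equal keys.
  private static List<List<Integer>> partition(List<Integer> batch)
  {
    List<List<Integer>> runs = new ArrayList<>();
    for (Integer row : batch) {
      if (runs.isEmpty() || !runs.get(runs.size() - 1).get(0).equals(row)) {
        runs.add(new ArrayList<>());
      }
      runs.get(runs.size() - 1).add(row);
    }
    return runs;
  }

  void push(List<Integer> batch)
  {
    List<List<Integer>> runs = partition(batch);
    if (runs.isEmpty()) {
      return;
    }
    if (previous != null) {
      if (previous.get(0).equals(runs.get(0).get(0))) {
        runs.get(0).addAll(0, previous); // same key: glue onto the first run
      } else {
        output.add(previous); // different key: the held-back partition was complete
      }
    }
    // Emit every run except the last, which may still continue in the next batch.
    for (int i = 0; i < runs.size() - 1; i++) {
      output.add(runs.get(i));
    }
    previous = runs.get(runs.size() - 1);
  }

  // Called when no more batches will arrive: the held-back partition is now complete.
  void completed()
  {
    if (previous != null) {
      output.add(previous);
      previous = null;
    }
  }

  List<List<Integer>> output()
  {
    return output;
  }

  public static void main(String[] args)
  {
    GlueingSketch sketch = new GlueingSketch();
    sketch.push(Arrays.asList(1, 1, 2));
    sketch.push(Arrays.asList(2, 2, 3));
    sketch.push(Arrays.asList(3, 4));
    sketch.completed();
    System.out.println(sketch.output()); // prints [[1, 1], [2, 2, 2], [3, 3], [4]]
  }
}
```

In this sketch the held-back state lives in one object that handles both push and completed, which is one way to read the suggestion that such state belongs with the receiver.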
      throw new NoSuchElementException();
    }

    if (!firstPartitionHandled) {
There was a problem hiding this comment.
note: you don't necessarily need this boolean; couldn't (previousRac != null) act like it?
or currentIndex==0 ?
having (previousRac != null) at a higher level could also cleanup some conditionals
There was a problem hiding this comment.
Changed to using currentIndex == 0
There was a problem hiding this comment.
Have made some change in the way the conditionals are structured: b725332
Please let me know if this aligns with the suggestion.
    )
    {
      super(partitionColumns, child);
      this.maxRowsMaterialized = maxRowsMaterialized;
There was a problem hiding this comment.
It's int, so it can't be null.
There was a problem hiding this comment.
note: unboxing happens at this point as this.maxRowsMaterialized is an int; meanwhile maxRowsMaterialized is an Integer
There was a problem hiding this comment.
Makes sense, have added Preconditions.checkNotNull(maxRowsMaterialized, "maxRowsMaterialized cannot be null");, hope that works.
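The unboxing point can be shown in isolation. The sketch below is hypothetical (UnboxingSketch is not a class in the PR) and uses java.util.Objects.requireNonNull from the standard library in place of Guava's Preconditions.checkNotNull, which is what the comment above mentions; the behaviour of interest is the same.

```java
import java.util.Objects;

// Hypothetical class illustrating the Integer -> int assignment discussed above.
class UnboxingSketch
{
  private final int maxRowsMaterialized;

  UnboxingSketch(Integer maxRowsMaterialized)
  {
    // Without this check, a null argument would still fail on the assignment below,
    // but with a less descriptive NullPointerException raised by auto-unboxing.
    Objects.requireNonNull(maxRowsMaterialized, "maxRowsMaterialized cannot be null");
    this.maxRowsMaterialized = maxRowsMaterialized; // Integer is unboxed to int here
  }

  int maxRowsMaterialized()
  {
    return maxRowsMaterialized;
  }
}
```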
    private RowsAndColumns makeSimpleRac(int... values)
    {
      return MapOfColumnsRowsAndColumns.fromMap(
          ImmutableMap.of("column", new IntArrayColumn(values))
      );
    }

    private RowsAndColumnsHelper expectedSimpleRac(int... values)
    {
      return new RowsAndColumnsHelper()
          .expectColumn("column", values)
          .allColumnsRegistered();
    }
There was a problem hiding this comment.
note: this is not critical; but if you copy these 10 lines into multiple files that creates a maintenance burden after some iterations - it's better to try to reuse such things somehow
There was a problem hiding this comment.
Moved these methods to RowsAndColumnsHelper class. Hope that works.
    ClusteredGroupPartitioner groupPartitioner = rac.as(ClusteredGroupPartitioner.class);
    if (groupPartitioner == null) {
      groupPartitioner = new DefaultClusteredGroupPartitioner(rac);
    }
There was a problem hiding this comment.
note: use ClusteredGroupPartitioner.fromRAC
There was a problem hiding this comment.
Oh nice, wasn't aware of this. Have made the change in both partitioning operators. Thanks for pointing this out!
    private final ArrayList<ResultRow> rowsToProcess;
    private int lastPartitionIndex = -1;
    private Operator op = null;
There was a problem hiding this comment.
can this field have a more verbose name?
There was a problem hiding this comment.
Renamed to operator. Any other naming suggestion? 😅
    if (frameHasRowsPendingFlush()) {
      return ReturnOrAwait.runAgain();
    catch (IOException e) {
      throw new RuntimeException(e);
There was a problem hiding this comment.
note: why the need to catch this?
at line 136 it was okay to throw - why not okay here?
There was a problem hiding this comment.
Ah right, have made the change.
      return null; // Signal that the operator has completed its work
    }

    // Return a non-null continuation object to indicate that we want to continue processing.
There was a problem hiding this comment.
I don't think this comment and the returned Closeable live up to the contracts associated with the implemented method... but that didn't work before this change either...
There was a problem hiding this comment.
Since we have to re-use the same operator, I needed a way to keep it running..
Description
Currently, WindowOperatorQueryFrameProcessor was using the following operator pipeline for window processing: (MSQ shuffling) -> NaiveSortOperator -> NaivePartitioningOperator -> WindowOperator.

Because of this, we had the following issues:
- Partitioning (on the PARTITION BY columns) was being done in the WindowOperatorQueryFrameProcessor layer. This is not good because:
  1. The partitioning logic would be done again in the NaivePartitioningOperator, hence it's redundant.
  2. It's a lot of code, making it unnecessarily difficult to follow the logic.
- NaiveSortOperator was a synchronization barrier. It needs to process all rows before sending data to the receiver.

This PR introduces 2 new operators:
- GlueingPartitioningOperator: It continuously receives data, and outputs batches of partitioned RACs. It maintains a last-partitioning-boundary of the last-pushed-RAC, and attempts to glue it with the next RAC it receives, ensuring that partitions are handled correctly, even across multiple RACs. You can check GlueingPartitioningOperatorTest for some good examples of the "glueing" work.
- PartitionSortOperator: It sorts rows inside partitioned RACs, on the sort columns. The input RACs it receives are expected to be "complete / separate" partitions of data.

With this PR's changes, we are converting the operator pipeline from
NaiveSortOperator -> NaivePartitioningOperator -> WindowOperator
into
GlueingPartitioningOperator -> PartitionSortOperator -> WindowOperator

This allows WindowOperatorQueryFrameProcessor to send RACs of any number of rows into the operator pipeline, without having to do the partitioning on the PARTITION BY columns.

Other notable changes done in the PR
- WindowOperatorQueryKit
  1. Add translation from NaiveSortOperator -> NaivePartitioningOperator -> WindowOperator into GlueingPartitioningOperator -> PartitionSortOperator -> WindowOperator.
  2. Changed logic of creation of window stages. Previously we were creating a single window stage if we had an empty over() clause. But we can't do that now since we rely on MSQ shuffling to cluster the data on the partitioning keys for us, as we don't have NaiveSortOperator in the chain now.
- WindowOperatorQueryFrameProcessor
  1. Modified the operator so that we don't create the operator chain again and again for new RACs.
  2. Modified the operator execution to not run it to completion, and control it manually, since we don't want completed() to be called.
  3. Logic changes to blindly send rows to the operator pipeline whenever the number of rows crosses a threshold.
- Refactoring work: Created base abstract classes for the following pairs of classes:
  1. NaiveSortOperator / PartitionSortOperator
  2. NaiveSortOperatorFactory / PartitionSortOperatorFactory
  3. NaivePartitioningOperator / GlueingPartitioningOperator
  4. NaivePartitioningOperatorFactory / GlueingPartitioningOperatorFactory

Key changed/added classes in this PR
- GlueingPartitioningOperator and corresponding factory class
- PartitionSortOperator and corresponding factory class
- WindowOperatorQueryKit
- WindowOperatorQueryFrameProcessor

This PR has: