[FLINK-11391] Introduce shuffle master interface #8362

azagrebin · 2019-05-07T16:11:55Z

What is the purpose of the change

This PR introduces the ShuffleMaster interface. JobMaster and ExecutionGraph use ShuffleMaster to register resources for the produced partitions of intermediate results. This is part of the overall effort to make the shuffle implementation pluggable.

Brief change log

Introduce ShuffleMaster interface
Introduce ProducerDescriptor to describe producer for ShuffleMaster
Introduce PartitionDescriptor to describe partition for ShuffleMaster
Introduce ShuffleDescriptor: partition handle produced by ShuffleMaster to hand over to task and its local shuffle service
Introduce UnknownShuffleDescriptor for eager consumer scheduling when producer is unknown yet
Register produced partitions when execution gets slot
Refactor TaskDeploymentDescriptor creation into a TaskDeploymentDescriptorFactory
Refactor ResultPartitionDeploymentDescriptor to consist of PartitionDescriptor and ShuffleDeploymentDescriptor
Refactor InputGateDeploymentDescriptor and PartitionInfo to use ShuffleDescriptor instead of InputChannelDeploymentDescriptor
Introduce NettyShuffleMaster and NettyShuffleDescriptor implementations for existing shuffle service based on Netty communication and local files
Refactor SingleInputGate.create and updateInputChannel to determine partition location based on consumer resource id and producer id from NettyShuffleDescriptor
Adjust tests

Verifying this change

unit tests

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
The S3 file system connector: (no)

Documentation

Does this pull request introduce a new feature? (no)
If yes, how is the feature documented? (not applicable)

flinkbot · 2019-05-07T16:15:09Z

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit f7014b7 (Wed Aug 07 16:30:56 UTC 2019)

Warnings:

No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

❓ 1. The [description] looks good.
❓ 2. There is [consensus] that the contribution should go into Flink.
❗ 3. Needs [attention] from.
- Needs attention by @tillrohrmann [PMC], @zhijiangW
❓ 4. The change fits into the overall [architecture].
❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details

The bot tracks the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
@flinkbot approve all to approve all aspects
@flinkbot approve-until architecture to approve everything until architecture
@flinkbot attention @username1 [@username2 ..] to require somebody's attention
@flinkbot disapprove architecture to remove an approval you gave earlier

azagrebin · 2019-05-07T16:15:33Z

@flinkbot attention @tillrohrmann @zhijiangW

zhijiangW · 2019-05-10T08:16:18Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/PartitionShuffleDescriptor.java

+	}
+
+	@Override
+	public String toString() {


I am not sure whether the remaining fields, numberOfSubpartitions and maxParallelism, should also be added here.

zhijiangW · 2019-05-10T08:16:59Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/PartitionShuffleDescriptor.java

+		return maxParallelism;
+	}
+
+	public int getConnectionIndex() {


Currently this could be package-private. I am not sure whether it will be used outside of the package in the future.

It could potentially be used outside of this package by other shuffle implementations. I would leave the methods public here because they are basically part of the shuffle API.

I would recommend following the conservative approach Zhijiang proposed and make things only public if they are really needed. Decreasing visibility is always much harder than increasing it.

zhijiangW · 2019-05-10T08:20:27Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/PartitionShuffleDescriptor.java

+
+		return new PartitionShuffleDescriptor(
+			resultId, partitionId, partitionType, numberOfSubpartitions, maxParallelism,
+			partition.getIntermediateResult().getConnectionIndex());


I am not sure whether it is worth defining a local connectionIndex variable up front for partition.getIntermediateResult().getConnectionIndex(), to stay consistent with the other variables.

zhijiangW · 2019-05-10T08:26:35Z

...e/src/main/java/org/apache/flink/runtime/deployment/ResultPartitionDeploymentDescriptor.java

+	}
+
+	@Nonnull
+	public ShuffleDeploymentDescriptor getShuffleDeploymentDescriptor() {


package private

zhijiangW · 2019-05-10T08:33:56Z

...ntime/src/main/java/org/apache/flink/runtime/shuffle/DefaultShuffleDeploymentDescriptor.java

+	private final ConnectionID producerConnection;
+
+	@Nonnull
+	private final ResultPartitionID resultPartitionID;


Maybe better to name it resultPartitionId, consistent with producerResourceId.

zhijiangW · 2019-05-10T10:36:57Z

...ntime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptorFactory.java

+import java.util.List;
+import java.util.Map;
+
+/** Factory of {@link TaskDeploymentDescriptor} to deploy {@link Execution}. */


The single-line comment format /** */ seems to be used mainly for fields, not for classes. I am not sure about this, though.

zhijiangW

Thanks for opening this PR and it seems really good @azagrebin .

I left some minor inline comments and have two concerns so far. I have not finished the whole review yet and will continue with it.

The class PartitionInfo seems a bit redundant and might be replaced completely by InputGateDeploymentDescriptor (IGDD). They cover mostly the same information; only ResultPartitionType and consumedSubpartitionIndex in IGDD seem unnecessary for PartitionInfo now, and ResultPartitionType might also make sense in PartitionInfo if we support determining the type dynamically in the future. TaskExecutor#updatePartitions(IGDD) could also work well with a small change. It also seems more consistent to use the same structure for deploying and updating.
I am not sure that using instanceof in SingleInputGate to check the concrete type of ShuffleDeploymentDescriptor and cast it is a good approach. Another option is to define methods in the ShuffleDeploymentDescriptor interface, such as boolean isUnknown(), getResultPartitionID(), and getConnectionID(), to avoid this.
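
The two options in the second point can be contrasted with a minimal sketch. The types below are simplified, hypothetical stand-ins (the partition ID is a plain String, and `channelKindWithInstanceof` / `channelKindWithMethod` are illustrative names only), not the classes from this PR:

```java
import java.io.Serializable;

// Simplified, hypothetical stand-ins for ShuffleDeploymentDescriptor and its
// implementations; names and fields are assumptions made for illustration.
class DescriptorCheckSketch {

    interface Descriptor extends Serializable {
        String getPartitionId();

        // Option 2: the interface itself answers the "unknown" question.
        boolean isUnknown();
    }

    static final class KnownDescriptor implements Descriptor {
        private static final long serialVersionUID = 1L;
        private final String partitionId;

        KnownDescriptor(String partitionId) {
            this.partitionId = partitionId;
        }

        @Override
        public String getPartitionId() {
            return partitionId;
        }

        @Override
        public boolean isUnknown() {
            return false;
        }
    }

    static final class UnknownDescriptor implements Descriptor {
        private static final long serialVersionUID = 1L;
        private final String partitionId;

        UnknownDescriptor(String partitionId) {
            this.partitionId = partitionId;
        }

        @Override
        public String getPartitionId() {
            return partitionId;
        }

        @Override
        public boolean isUnknown() {
            return true;
        }
    }

    // Option 1: the consumer inspects the concrete type.
    static String channelKindWithInstanceof(Descriptor descriptor) {
        return descriptor instanceof UnknownDescriptor ? "unknown-channel" : "known-channel";
    }

    // Option 2: the consumer only talks to the interface.
    static String channelKindWithMethod(Descriptor descriptor) {
        return descriptor.isUnknown() ? "unknown-channel" : "known-channel";
    }

    public static void main(String[] args) {
        Descriptor known = new KnownDescriptor("p-1");
        Descriptor unknown = new UnknownDescriptor("p-2");
        System.out.println(channelKindWithInstanceof(known) + " / " + channelKindWithMethod(known));
        System.out.println(channelKindWithInstanceof(unknown) + " / " + channelKindWithMethod(unknown));
    }
}
```

Both variants give the same answer; the trade-off is that option 2 makes every implementation carry the extra method, while option 1 ties the consumer to concrete classes.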

azagrebin · 2019-05-14T13:26:59Z

Thanks for the review @zhijiangW ! I have addressed smaller comments.

True, PartitionInfo looks similar to IGDD. The difference at the moment is that PartitionInfo represents an update of only one (but any) gate channel, while IGDD has a consistent view of all gate channels. I am not sure how well it fits semantically to treat IGDD as a single channel update. I will think about it.
In general, I do not see a problem with instanceof; what is the concern here? If we add more methods like isUnknown(), all SDDs will have to implement them. Of course, we can make them inherit from a base class, but that will clutter their code. Also, a channel's unknownness is more of a general scheduling concept in the absence of a producer.

zhijiangW · 2019-05-15T03:37:57Z

Thanks for the confirmation @azagrebin .

Yes, I have not thought through the changes implied by PartitionInfo covering a single channel and IGDD covering all channels. Just from the perspective of the RPC call taskManagerGateway.updatePartitions(partitionInfos), the parameter is a collection of PartitionInfo, which is the same as the array of channels in IGDD. Maybe the IGDD should cache ICDDs internally and replace the array with a collection. That might involve more refactoring, and I will consider it further.
From a functional perspective, the current way is fine. But in one of my own PRs I was advised not to use instanceof and to introduce the interface method ChannelSelector#isBroadcast instead, because instanceof seems hacky rather than a proper solution. I am not sure whether it is generally discouraged or just a personal preference. I think you could confirm this approach with others. :)

BTW, I have not finished the whole review yet. I will continue with it later.

tillrohrmann

Thanks for opening this PR @azagrebin. I gave it a first pass mainly having minor comments (many style comments which do not affect correctness). I will give it another pass to look in detail at the abstractions. I will post comments regarding the abstractions next.

tillrohrmann · 2019-05-16T14:06:21Z

...runtime/src/main/java/org/apache/flink/runtime/deployment/InputGateDeploymentDescriptor.java

+		@Nonnull ResultPartitionType consumedPartitionType,
+		@Nonnegative int consumedSubpartitionIndex,
+		@Nonnull ShuffleDeploymentDescriptor[] inputChannels,
+		@Nonnull ResourceID consumerResourceId) {


Flink's code base uses double indentation to distinguish the parameters from the function body.

tillrohrmann · 2019-05-16T14:23:49Z

...ntime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptorFactory.java

+	}
+
+	public static TaskDeploymentDescriptorFactory fromExecutionVertex(
+		ExecutionVertex executionVertex, ExecutionAttemptID executionId, int attemptNumber) {


If we break the parameter list, then I would suggest putting every parameter on its own line.

tillrohrmann · 2019-05-16T14:24:40Z

...ntime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptorFactory.java

+		JobID jobID,
+		boolean lazyScheduling,
+		int subtaskIndex,
+		ExecutionEdge[][] inputEdges) {


It's not super consistent in the Flink code base, but newer code tries to indent parameter lists an additional level to distinguish them from the method body.

tillrohrmann · 2019-05-16T14:37:05Z

...ntime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptorFactory.java

+		this.inputEdges = inputEdges;
+	}
+
+	public static TaskDeploymentDescriptorFactory fromExecutionVertex(


Usually static functions go to the bottom of the class. The order is roughly

Static fields; Field; Constructors; Methods; Static functions;

tillrohrmann · 2019-05-16T14:38:14Z

...ntime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptorFactory.java

+
+	/**
+	 * Creates a task deployment descriptor to deploy a subtask to the given target slot.
+	 */


These JavaDocs don't add much information.

tillrohrmann · 2019-05-20T13:14:57Z

...-runtime/src/test/java/org/apache/flink/runtime/taskexecutor/TaskExecutorSubmissionTest.java

+	}
+
+	private TaskDeploymentDescriptor createReceiver(
+		DefaultShuffleDeploymentDescriptor sdd, ResourceID location) throws IOException {


tillrohrmann · 2019-05-20T13:15:40Z

flink-runtime/src/test/java/org/apache/flink/runtime/util/ShuffleTestUtil.java

+
+		PartitionShuffleDescriptor psd = new PartitionShuffleDescriptor(
+			new IntermediateDataSetID(), sdd.getResultPartitionID().getPartitionId(),
+			ResultPartitionType.PIPELINED, 1, 1, 0);


tillrohrmann · 2019-05-20T13:15:54Z

flink-runtime/src/test/java/org/apache/flink/runtime/util/ShuffleTestUtil.java

+	}
+
+	public static InputGateDeploymentDescriptor createInputGateDeploymentDescriptor(
+		ShuffleDeploymentDescriptor sdd, ResourceID consumerLocation) {


line breaks

tillrohrmann · 2019-05-20T13:16:26Z

.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java

 				ResultPartitionType.PIPELINED_BOUNDED,
 				channel,
-				channelDescriptors);
+				channelDescriptors, localLocation);


tillrohrmann · 2019-05-20T13:17:50Z

.../java/org/apache/flink/streaming/runtime/io/benchmark/StreamNetworkBenchmarkEnvironment.java

 		}
 	}
+
+	private static ShuffleDeploymentDescriptor createLocalSdd(ResultPartitionID resultPartitionID, ResourceID location) {


Could ShuffleTestUtils#createSddWithLocalConnection be used here?

we can also use NettyShuffleDescriptorBuilder here.

tillrohrmann

I like the abstractions you've put in place @azagrebin. Good work! I've had a couple of minor comments.

tillrohrmann · 2019-05-20T13:42:41Z

...ntime/src/main/java/org/apache/flink/runtime/shuffle/DefaultShuffleDeploymentDescriptor.java

+/**
+ * Default implementation of {@link ShuffleDeploymentDescriptor} for {@link DefaultShuffleMaster}.
+ */
+public class DefaultShuffleDeploymentDescriptor implements ShuffleDeploymentDescriptor {


Serial version UID is missing. I recommend activating IntelliJ's inspections, which will mark this as an error.
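
As a minimal sketch of what the comment asks for, the class below declares an explicit serialVersionUID and is round-tripped through Java serialization. `ExampleDescriptor`, `roundTrip` and `declaredUid` are hypothetical names used only for illustration; they are not part of the PR:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidSketch {

    // Hypothetical descriptor; only the explicit serialVersionUID matters here.
    static final class ExampleDescriptor implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String partitionId;

        ExampleDescriptor(String partitionId) {
            this.partitionId = partitionId;
        }

        String getPartitionId() {
            return partitionId;
        }
    }

    // Serializes the descriptor to bytes and reads it back.
    static ExampleDescriptor roundTrip(ExampleDescriptor original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ExampleDescriptor) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    // Reports the UID the serialization runtime associates with the class.
    static long declaredUid() {
        return ObjectStreamClass.lookup(ExampleDescriptor.class).getSerialVersionUID();
    }

    public static void main(String[] args) {
        ExampleDescriptor copy = roundTrip(new ExampleDescriptor("p-1"));
        System.out.println(copy.getPartitionId() + " uid=" + declaredUid());
    }
}
```

Without the explicit field, the runtime derives a UID from the class structure, so unrelated changes to the class can make previously serialized instances unreadable.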

tillrohrmann · 2019-05-20T13:50:48Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/ProducerShuffleDescriptor.java

+/**
+ * Partition producer descriptor for {@link ShuffleMaster} to obtain {@link ShuffleDeploymentDescriptor}.
+ */
+public class ProducerShuffleDescriptor {


We could rename this class into ProducerDescriptor because it is already part of the shuffle package.

tillrohrmann · 2019-05-20T13:52:14Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/PartitionShuffleDescriptor.java

+/**
+ * Partition descriptor for {@link ShuffleMaster} to obtain {@link ShuffleDeploymentDescriptor}.
+ */
+public class PartitionShuffleDescriptor implements Serializable {


We could do the same here with PartitionShuffleDescriptor --> PartitionDescriptor

tillrohrmann · 2019-05-20T13:55:26Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/ShuffleMaster.java

+ * Intermediate result partition registry to use in {@link org.apache.flink.runtime.jobmaster.JobMaster}.
+ */
+public interface ShuffleMaster {
+	CompletableFuture<ShuffleDeploymentDescriptor> registerPartitionWithProducer(


Could think about introducing a T extends ShuffleDeploymentDescriptor and then letting the specific implementations define the T. This could avoid type casting when testing the implementations.
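
A minimal sketch of this suggestion, assuming simplified stand-in types: the real method takes descriptor objects, whereas here the parameters are plain Strings, and `NettyLikeDescriptor` / `NettyLikeShuffleMaster` are hypothetical names:

```java
import java.util.concurrent.CompletableFuture;

class GenericShuffleMasterSketch {

    // Stand-in for the descriptor interface; simplified for illustration.
    interface ShuffleDescriptor {
        String getResultPartitionId();
    }

    // Hypothetical implementation-specific descriptor with an extra accessor.
    static final class NettyLikeDescriptor implements ShuffleDescriptor {
        private final String resultPartitionId;
        private final String producerAddress;

        NettyLikeDescriptor(String resultPartitionId, String producerAddress) {
            this.resultPartitionId = resultPartitionId;
            this.producerAddress = producerAddress;
        }

        @Override
        public String getResultPartitionId() {
            return resultPartitionId;
        }

        String getProducerAddress() {
            return producerAddress;
        }
    }

    // The type parameter lets each implementation fix its own descriptor type.
    interface ShuffleMaster<T extends ShuffleDescriptor> {
        CompletableFuture<T> registerPartitionWithProducer(String partitionId, String producerAddress);
    }

    static final class NettyLikeShuffleMaster implements ShuffleMaster<NettyLikeDescriptor> {
        @Override
        public CompletableFuture<NettyLikeDescriptor> registerPartitionWithProducer(
                String partitionId,
                String producerAddress) {
            return CompletableFuture.completedFuture(
                new NettyLikeDescriptor(partitionId, producerAddress));
        }
    }

    public static void main(String[] args) {
        // No cast is needed to reach the implementation-specific accessor.
        NettyLikeDescriptor descriptor =
            new NettyLikeShuffleMaster().registerPartitionWithProducer("p-1", "host:1234").join();
        System.out.println(descriptor.getProducerAddress());
    }
}
```

A test of the concrete master can then assert on implementation-specific fields directly, which is the type-casting saving mentioned above.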

tillrohrmann · 2019-05-20T13:59:22Z

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/PartitionInfo.java

-		this.intermediateDataSetID = Preconditions.checkNotNull(intermediateResultPartitionID);
-		this.inputChannelDeploymentDescriptor = Preconditions.checkNotNull(inputChannelDeploymentDescriptor);
+	@Nonnull
+	private final ResourceID consumerResourceID;


I think this information should come from the TaskExecutor where it is used to update the channels.

tillrohrmann · 2019-05-20T14:11:12Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/DefaultShuffleMaster.java

+/**
+ * Default {@link ShuffleMaster} for netty and local file based shuffle implementation.
+ */
+public class DefaultShuffleMaster implements ShuffleMaster {


Shall we name this class and all related classes directly NettyShuffleMaster?

tillrohrmann · 2019-05-20T14:24:19Z

...ntime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptorFactory.java

+		Execution producer = consumedPartition.getProducer().getCurrentExecutionAttempt();
+		Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor> producedPartitions =
+			producer.getProducedPartitions();
+		Preconditions.checkArgument(checkInputReady(consumedPartition.getPartitionId(), producedPartitions),


checkState would fit better here I think.
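
The distinction can be sketched as follows. The two helpers are local stand-ins that assume the conventional contract (checkArgument throws IllegalArgumentException, checkState throws IllegalStateException); the `Producer` class is hypothetical:

```java
class PreconditionChoiceSketch {

    // Local stand-in: rejects a bad value passed in by the caller.
    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Local stand-in: rejects a call made while the object is not ready.
    static void checkState(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }

    // Hypothetical producer: whether its partitions are registered is internal
    // state, not something the caller passes in.
    static final class Producer {
        private boolean partitionsRegistered;

        void registerPartitions() {
            partitionsRegistered = true;
        }

        String describeInput(int subpartitionIndex) {
            checkArgument(subpartitionIndex >= 0, "subpartitionIndex must be non-negative");
            checkState(partitionsRegistered, "produced partitions are not registered yet");
            return "input-" + subpartitionIndex;
        }
    }

    public static void main(String[] args) {
        Producer producer = new Producer();
        try {
            producer.describeInput(0);
        } catch (IllegalStateException e) {
            System.out.println("not ready: " + e.getMessage());
        }
        producer.registerPartitions();
        System.out.println(producer.describeInput(0));
    }
}
```

Whether the consumed input is ready depends on what the producer has done so far, not on the value the caller supplied, which is why a state check reads more accurately in that spot.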

tillrohrmann · 2019-05-20T14:35:23Z

flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java


+	public Map<IntermediateResultPartitionID, ResultPartitionDeploymentDescriptor> getProducedPartitions() {
+		return producedPartitions;
+	}


We could hide this implementation detail by offering an Optional<ResultPartitionDeploymentDescriptor> getProducedPartition(IntermediateResultPartitionID intermediateResultPartitionId)

The return type could even be Optional<ShuffleDeploymentDescriptor> if I'm not mistaken.
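
A minimal sketch of this accessor, assuming plain Strings in place of IntermediateResultPartitionID and the descriptor type; `registerProducedPartition` is a hypothetical helper added only so the example can populate the map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ProducedPartitionLookupSketch {

    // Hypothetical, simplified stand-in for Execution.
    static final class Execution {
        // The map stays private; callers never see the backing collection.
        private final Map<String, String> producedPartitions = new HashMap<>();

        void registerProducedPartition(String partitionId, String descriptor) {
            producedPartitions.put(partitionId, descriptor);
        }

        // Single-partition lookup; an empty Optional means "not registered".
        Optional<String> getProducedPartition(String partitionId) {
            return Optional.ofNullable(producedPartitions.get(partitionId));
        }
    }

    public static void main(String[] args) {
        Execution execution = new Execution();
        execution.registerProducedPartition("p-1", "descriptor-for-p-1");
        System.out.println(execution.getProducedPartition("p-1").orElse("missing"));
        System.out.println(execution.getProducedPartition("p-2").orElse("missing"));
    }
}
```

Callers can no longer mutate or iterate the map, and the missing-partition case becomes explicit in the return type.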

tillrohrmann · 2019-05-20T14:41:31Z

...ntime/src/main/java/org/apache/flink/runtime/shuffle/UnknownShuffleDeploymentDescriptor.java

+/**
+ * Unknown {@link ShuffleDeploymentDescriptor}.
+ */
+public final class UnknownShuffleDeploymentDescriptor implements ShuffleDeploymentDescriptor {


Missing serial version UID

tillrohrmann · 2019-05-20T14:48:44Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/ProducerShuffleDescriptor.java

+	@Nonnull
+	private final ResourceID producerResourceId;
+
+	/** The address to use to request the remote partition. */


For external shuffle service implementations, this is not necessarily true.

zentol · 2019-05-28T10:59:26Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/ShuffleMaster.java

+ * Intermediate result partition registry to use in {@link org.apache.flink.runtime.jobmaster.JobMaster}.
+ */
+public interface ShuffleMaster<T extends ShuffleDescriptor> {
+	CompletableFuture<T> registerPartitionWithProducer(


missing javadoc, in particular for the return value

I will add class parameter description to class level javadoc, not sure method comment will add too much value to it. The method name already says basically what is happening, unless you think that some specific detail should be mentioned.

I think interface methods should get JavaDocs especially if they are public API. Since we want to offer external shuffle service implementations this is the case.

zhijiangW · 2019-05-31T02:53:10Z

...runtime/src/main/java/org/apache/flink/runtime/deployment/InputGateDeploymentDescriptor.java

+			@Nonnegative int consumedSubpartitionIndex,
+			ShuffleDescriptor[] inputChannels,
+			ResourceID consumerLocation) {
+		this.consumedResultId = consumedResultId;


keep checkNotNull for these parameters?

zhijiangW · 2019-05-31T03:03:21Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/ShuffleDescriptor.java

+ * Interface for shuffle deployment descriptor of result partition resource.
+ */
+public interface ShuffleDescriptor extends Serializable {
+	ResultPartitionID getResultPartitionID();


Add empty line before this method?

zhijiangW · 2019-05-31T03:09:49Z

...runtime/src/main/java/org/apache/flink/runtime/deployment/InputGateDeploymentDescriptor.java

+	 * <p>It can be used e.g. to compare with partition producer {@link ResourceID} in
+	 * {@link ProducerDescriptor} to determine producer/consumer co-location.
+	 */
+	private final ResourceID consumerLocation;


consumerLocation -> consumerResourceId? Because in ProducerDescriptor or NettyShuffleDescriptor, we also name producerResourceId

actually I think it is better to rename it in ProducerDescriptor & NettyShuffleDescriptor

zhijiangW · 2019-05-31T03:23:36Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/ShuffleMaster.java

+ * Intermediate result partition registry to use in {@link org.apache.flink.runtime.jobmaster.JobMaster}.
+ */
+public interface ShuffleMaster<T extends ShuffleDescriptor> {
+	CompletableFuture<T> registerPartitionWithProducer(


Empty line before method?

zhijiangW · 2019-05-31T03:32:53Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/ProducerDescriptor.java

+	 */
+	private final int dataPort;
+
+	public ProducerDescriptor(


@VisibleForTesting

zhijiangW · 2019-06-03T08:02:34Z

...rc/test/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGateTest.java

+			ShuffleDescriptor remoteSdd = createRemoteWithIdAndLocation(
+				remoteResultPartitionId.getPartitionId(),
+				ResourceID.generate());
+			inputGate.updateInputChannel(localLocation, new PartitionInfo(new IntermediateDataSetID(), remoteSdd));


ditto: reuse createPartitionInfo()

zhijiangW · 2019-06-03T08:03:42Z

...rc/test/java/org/apache/flink/runtime/io/network/partition/consumer/SingleInputGateTest.java

+			ShuffleDescriptor sdd = createRemoteWithIdAndLocation(
+				resultPartitionId.getPartitionId(),
+				ResourceID.generate());
+			inputGate.updateInputChannel(ResourceID.generate(), new PartitionInfo(new IntermediateDataSetID(), sdd));


could reuse createPartitionInfo(resultPartitionId, ResourceID.generate()) to create PartitionInfo here.

zhijiangW · 2019-06-03T08:27:31Z

flink-runtime/src/test/java/org/apache/flink/runtime/util/NettyShuffleDescriptorBuilder.java

+	}
+
+	public static NettyShuffleDescriptor createRemoteWithIdAndLocation(
+		IntermediateResultPartitionID partitionId,


add one more indentation for parameters

zhijiangW · 2019-06-03T08:38:48Z

flink-runtime/src/test/java/org/apache/flink/runtime/taskmanager/TaskTest.java

-		Collection<InputGateDeploymentDescriptor> inputGates) throws Exception {
+			Collection<ResultPartitionDeploymentDescriptor> resultPartitions,
+			Collection<InputGateDeploymentDescriptor> inputGates) throws Exception {
+		String errorMessage = "Network buffer pool has already been destroyed.";


The changes to this method are not necessary.

zhijiangW · 2019-06-03T09:23:18Z

...-runtime/src/test/java/org/apache/flink/runtime/taskexecutor/TaskExecutorSubmissionTest.java

+
+	private TaskDeploymentDescriptor createReceiver(
+			NettyShuffleDescriptor shuffleDescriptor,
+			ResourceID location) throws IOException {


I guess location is not needed as a parameter; it could be created directly inside the method.

some tests rely on it to be the same as in some other components outside of this method.

zhijiangW · 2019-06-03T09:24:36Z

flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/ExecutionTest.java

 			slotProvider,
 			new NoRestartStrategy(),
 			jobVertex);
+		executionGraph.start(TestingComponentMainThreadExecutorServiceAdapter.forMainThread());


Why is this change needed?

zhijiangW · 2019-06-03T09:24:46Z

...runtime/src/test/java/org/apache/flink/runtime/executiongraph/ExecutionVertexCancelTest.java

 	public void testSendCancelAndReceiveFail() throws Exception {
 		final ExecutionGraph graph = ExecutionGraphTestUtils.createSimpleTestGraph();

+		graph.start(TestingComponentMainThreadExecutorServiceAdapter.forMainThread());


zhijiangW

Thanks for the updates @azagrebin !

I have finished the whole review. Sorry for the interruptions in reviewing over the past few days.

azagrebin · 2019-06-03T12:13:07Z

Thanks for the review @zhijiangW ! I have addressed the comments.

tillrohrmann

Thanks for addressing my comments @azagrebin. I had a couple of additional comments. Once these are resolved I think we are good to merge this PR :-)

tillrohrmann · 2019-06-04T12:44:05Z

...runtime/src/main/java/org/apache/flink/runtime/deployment/InputGateDeploymentDescriptor.java

+	private final ShuffleDescriptor[] inputChannels;
+
+	/**
+	 * {@link ResourceID} of partition consume to identify its location.


typo: consume -> consumer

tillrohrmann · 2019-06-04T12:49:38Z

...runtime/src/main/java/org/apache/flink/runtime/deployment/InputGateDeploymentDescriptor.java

+	 * <p>It can be used e.g. to compare with partition producer {@link ResourceID} in
+	 * {@link ProducerDescriptor} to determine producer/consumer co-location.
+	 */
+	private final ResourceID consumerLocation;


Why do we need this field? I thought that the InputGateDeploymentDescriptor is being used to create an InputGate on the TaskExecutor. If this is the case, then the information about the TaskExecutor's ResourceID should already be there. No need to transmit this additional information.

tillrohrmann · 2019-06-04T12:50:10Z

...runtime/src/main/java/org/apache/flink/runtime/deployment/InputGateDeploymentDescriptor.java

 	}

-	public InputChannelDeploymentDescriptor[] getInputChannelDeploymentDescriptors() {
+	public ShuffleDescriptor[] getInputChannelDescriptors() {


Better to call getShuffleDescriptors

tillrohrmann · 2019-06-04T12:51:03Z

...runtime/src/main/java/org/apache/flink/runtime/deployment/InputGateDeploymentDescriptor.java

 		return inputChannels;
 	}

+	public ResourceID getConsumerLocation() {


It looks as if this method + the field were introduced to make the creation of the InputGate a bit more convenient. I'm wondering whether this is not the wrong place to get this information from. I think it should come from the TaskExecutor.

tillrohrmann · 2019-06-04T12:58:51Z

...ntime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptorFactory.java

+			new TaskDeploymentDescriptor.Offloaded<>(taskInfo.right());
+	}
+
+	private List<InputGateDeploymentDescriptor> createInputGateDeploymentDescriptors(ResourceID location) {


Here we are mixing static methods with instance methods. It feels a bit weird given that we use this function in createDeploymentDescriptor.

tillrohrmann · 2019-06-04T14:11:25Z

flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/ShuffleMaster.java

+ * Intermediate result partition registry to use in {@link org.apache.flink.runtime.jobmaster.JobMaster}.
+ */
+public interface ShuffleMaster<T extends ShuffleDescriptor> {
+	CompletableFuture<T> registerPartitionWithProducer(


I think interface methods should get JavaDocs especially if they are public API. Since we want to offer external shuffle service implementations this is the case.

tillrohrmann · 2019-06-04T14:15:26Z

...c/test/java/org/apache/flink/runtime/deployment/ResultPartitionDeploymentDescriptorTest.java

 /**
 * Tests for the {@link ResultPartitionDeploymentDescriptor}.
 */
 public class ResultPartitionDeploymentDescriptorTest {


Missing extends TestLogger

tillrohrmann · 2019-06-04T14:16:16Z

...c/test/java/org/apache/flink/runtime/deployment/ResultPartitionDeploymentDescriptorTest.java

+			createCopyAndVerifyResultPartitionDeploymentDescriptor(shuffleDescriptor);
+
+		assertThat(copy.getShuffleDescriptor(), instanceOf(UnknownShuffleDescriptor.class));
+		UnknownShuffleDescriptor copySdd = (UnknownShuffleDescriptor) copy.getShuffleDescriptor();


why is this cast needed?

leftover, will remove

tillrohrmann · 2019-06-04T14:25:54Z

...-runtime/src/test/java/org/apache/flink/runtime/taskexecutor/TaskExecutorSubmissionTest.java

+	}
+
+	private TaskDeploymentDescriptor createSender(
+			NettyShuffleDescriptor shuffleDeploymentDescriptor,


Naming of this parameter is not consistent with createReceiver.

tillrohrmann · 2019-06-04T14:26:49Z

flink-runtime/src/test/java/org/apache/flink/runtime/util/NettyShuffleDescriptorBuilder.java

+/**
+ * Builder to mock {@link NettyShuffleDescriptor} in tests.
+ */
+public class NettyShuffleDescriptorBuilder {


tillrohrmann

Thanks for addressing my comments @azagrebin. LGTM. Merging this PR once Travis gives green light.

azagrebin · 2019-06-05T11:54:36Z

Thanks for the review @tillrohrmann @zhijiangW !

Introduce PartitionLocation in NettyShuffleDescriptor and NettyShuffleDescriptorBuilder for tests
Add ShuffleDescriptor.getResultPartitionID and isUnknown
Use NettyShuffleDescriptorBuilder in StreamNetworkBenchmarkEnvironment
Introduce ShuffleUtils.applyWithShuffleTypeCheck to isolate input channel shuffle descriptor 'known' check and cast
This closes apache#8362.

tillrohrmann · 2019-06-06T09:39:56Z

Failing test case seems to be unrelated. Merging this PR now.

rmetzger added the review=description? label May 7, 2019

rmetzger added the component=Runtime/Network label May 7, 2019

tillrohrmann self-assigned this May 7, 2019

azagrebin force-pushed the FLINK-11391-az branch 3 times, most recently from d9129e5 to 1c84635 Compare May 9, 2019 07:41

zhijiangW reviewed May 10, 2019

rmetzger requested a review from tillrohrmann May 11, 2019 11:43

tillrohrmann requested changes May 20, 2019

azagrebin force-pushed the FLINK-11391-az branch from 6928d02 to 2a052fa Compare May 22, 2019 13:56

rmetzger requested a review from tillrohrmann May 22, 2019 19:51

azagrebin force-pushed the FLINK-11391-az branch 2 times, most recently from 8a6079d to cdf56b1 Compare May 27, 2019 09:46

zentol reviewed May 28, 2019

zhijiangW reviewed May 31, 2019

zhijiangW reviewed Jun 3, 2019

azagrebin force-pushed the FLINK-11391-az branch 5 times, most recently from 34a400a to 0b09ef0 Compare June 4, 2019 12:10

tillrohrmann requested changes Jun 4, 2019

azagrebin force-pushed the FLINK-11391-az branch 2 times, most recently from 6a40912 to 4820219 Compare June 4, 2019 18:10

rmetzger requested a review from tillrohrmann June 5, 2019 08:47

azagrebin force-pushed the FLINK-11391-az branch from 7d0c2dc to c50c30d Compare June 5, 2019 10:13

tillrohrmann approved these changes Jun 5, 2019

rmetzger requested a review from tillrohrmann June 5, 2019 11:55

tillrohrmann force-pushed the FLINK-11391-az branch from c50c30d to 669df9e Compare June 5, 2019 11:58

tillrohrmann force-pushed the FLINK-11391-az branch from 669df9e to f7014b7 Compare June 5, 2019 16:40

azagrebin mentioned this pull request Jun 5, 2019

[FLINK-11392][network] Introduce ShuffleEnvironment interface #8608

Closed

tillrohrmann closed this in de67f2e Jun 6, 2019

azagrebin mentioned this pull request Aug 23, 2019

[FLINK-11391][shuffle] Introduce PartitionShuffleDescriptor and ShuffleDeploymentDescriptor #7631

Closed
