[FLINK-10945] Use InputDependencyConstraint to avoid resource deadlocks for finite stream jobs when resources are limited #7255

Conversation
azagrebin left a comment:
Thanks for your contribution @zhuzhurk! I have left some comments.
Resolved review threads on:
- flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java
- ...ntime/src/main/java/org/apache/flink/runtime/executiongraph/IntermediateResultPartition.java (two threads)
- flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java
- .../src/test/java/org/apache/flink/runtime/deployment/InputChannelDeploymentDescriptorTest.java
azagrebin left a comment:
Thanks @zhuzhurk! I also added a couple of comments for the tests.
		return null;
	},
	executor);
if (consumerVertex.checkInputDependencyConstraints()) {
I think the TODO comment belongs on the first line of the new scheduleConsumer method.
From my understanding, the TODO comment is related to the "consumerState == CREATED" section in scheduleOrUpdateConsumers, which invokes cachePartitionInfo first and then schedules the vertex. The cachePartitionInfo action is needed to avoid a deployment race, at the cost of sending redundant partition infos to the task, which is the concern described in the TODO comment.
So far the redundant partition info is not a big problem, but I think we can optimize it later. One possible solution in my mind is to remove known partition infos from the cache when creating the InputChannelDeploymentDescriptor.
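The optimization proposed in the reply above could look roughly like the following minimal model. All class and method names here are hypothetical stand-ins for illustration, not Flink's actual Execution/IntermediateResultPartition API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: partition infos already covered by a freshly created
// deployment descriptor are dropped from the cache, so they are not
// redundantly sent to the task again as updates later.
class PartitionInfoCache {
    private final Set<String> cachedPartitionInfos = new LinkedHashSet<>();

    // Called when a partition info arrives before the consumer is deployed.
    void cachePartitionInfo(String partitionId) {
        cachedPartitionInfos.add(partitionId);
    }

    // Simplified stand-in for creating an InputChannelDeploymentDescriptor:
    // the descriptor covers the known partitions, so remove them from the cache.
    List<String> createDeploymentDescriptor(List<String> knownPartitions) {
        cachedPartitionInfos.removeAll(knownPartitions);
        return new ArrayList<>(knownPartitions);
    }

    // Partition infos that still have to be sent as updates after deployment.
    Set<String> pendingUpdates() {
        return cachedPartitionInfos;
    }
}
```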
Resolved review threads on:
- flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java
- ...rc/test/java/org/apache/flink/runtime/executiongraph/ExecutionVertexInputConstraintTest.java (three threads)
- ...e/src/test/java/org/apache/flink/runtime/executiongraph/IntermediateResultPartitionTest.java
…endencyConstraint == ANY 2. Fixes for tests
azagrebin left a comment:
Thanks for addressing the comments @zhuzhurk
tillrohrmann left a comment:
Thanks for your contribution @zhuzhurk. The changes look very good. I had some minor comments which we could address before merging.
Resolved review thread on:
- flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java
public boolean checkInputDependencyConstraints() {
	if (getExecutionGraph().getInputDependencyConstraint() == InputDependencyConstraint.ANY) {
		// InputDependencyConstraint == ANY
		return IntStream.range(0, inputEdges.length).anyMatch(this::isInputConsumable);
Moving isInputConsumable into IntermediateResult would allow getting rid of the indirection of the IntStream.
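The ANY/ALL constraint semantics being reviewed here can be illustrated with a minimal self-contained sketch. These simplified types are stand-ins for Flink's ExecutionVertex and its input edges, not the actual classes.

```java
import java.util.stream.IntStream;

// ANY: the vertex is schedulable as soon as at least one input result is
// consumable; ALL: every input result must be consumable first.
enum InputDependencyConstraint { ANY, ALL }

class VertexSketch {
    private final InputDependencyConstraint constraint;
    private final boolean[] inputConsumable; // one flag per input edge

    VertexSketch(InputDependencyConstraint constraint, boolean... inputConsumable) {
        this.constraint = constraint;
        this.inputConsumable = inputConsumable;
    }

    // Mirrors the shape of checkInputDependencyConstraints in the hunk above,
    // using anyMatch for ANY and allMatch for ALL.
    boolean checkInputDependencyConstraints() {
        if (constraint == InputDependencyConstraint.ANY) {
            return IntStream.range(0, inputConsumable.length).anyMatch(i -> inputConsumable[i]);
        } else {
            return IntStream.range(0, inputConsumable.length).allMatch(i -> inputConsumable[i]);
        }
    }
}
```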
 * @return whether the input constraint is satisfied
 */
public boolean checkInputDependencyConstraints() {
	if (getExecutionGraph().getInputDependencyConstraint() == InputDependencyConstraint.ANY) {
Shouldn't the InputDependencyConstraint rather be a value of the ExecutionJobVertex than of the ExecutionGraph? I guess it should be configurable for each operator individually.
I've moved InputDependencyConstraint to JobVertex. The job-wide default value can be configured in ExecutionConfig, but I haven't made it configurable through the DataSet/DataStream API yet.
I agree we should make the constraint configurable for each operator. But I'm not quite sure whether we should support it in the DataSet API now, or later in the stream/batch unified StreamGraph/Transformation API. Could you share your suggestion?
In our production experience, a job-wide configured input constraint satisfies most users, together with BATCH_FORCED execution mode, to ensure a batch job can finish with limited resources.
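The per-vertex setting with a job-wide fallback described above could be modeled as follows. This is a hypothetical sketch of the configuration split, not Flink's exact JobVertex/ExecutionConfig API; names are illustrative.

```java
// Hypothetical sketch: each job vertex may carry its own constraint, and an
// ExecutionConfig-style job-wide default applies when none is set on the vertex.
enum Constraint { ANY, ALL }

class JobVertexSketch {
    private Constraint constraint; // null means "use the job-wide default"

    void setInputDependencyConstraint(Constraint c) {
        this.constraint = c;
    }

    // Resolve the effective constraint for scheduling this vertex.
    Constraint resolveConstraint(Constraint jobWideDefault) {
        return constraint != null ? constraint : jobWideDefault;
    }
}
```

This keeps the common case (one job-wide setting, as in the production experience mentioned above) simple while still allowing a per-operator override.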
if (partition.getIntermediateResult().getResultType().isPipelined()) {
	// Schedule or update receivers of this partition
	partition.markSomePipelinedDataProduced();
Could this be generalized by having a partition#markDataProduced and then calling it for all intermediate results independent of the type?
Sure.
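The generalization agreed on above might look like this minimal sketch: markDataProduced is called for every partition regardless of result type, and the type only matters when deciding consumability. These are simplified stand-in types, not Flink's actual IntermediateResultPartition.

```java
// Hypothetical sketch: pipelined results become consumable as soon as some
// data is produced; blocking results only once the partition is finished.
enum ResultType { PIPELINED, BLOCKING }

class PartitionSketch {
    private final ResultType type;
    private boolean dataProduced;
    private boolean finished;

    PartitionSketch(ResultType type) {
        this.type = type;
    }

    // Called uniformly for all result types when data is produced.
    void markDataProduced() {
        dataProduced = true;
    }

    // Called when the producer finishes writing the partition.
    void markFinished() {
        finished = true;
    }

    // Result-type-specific consumability check.
    boolean isConsumable() {
        return type == ResultType.PIPELINED ? dataProduced : finished;
    }
}
```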
try {
	executionGraph.setInputDependencyConstraint(
		jobGraph.getSerializedExecutionConfig().deserializeValue(classLoader).getInputDependencyConstraint());
I understand that this was the easiest way how to get the InputDependencyConstraint into the ExecutionGraph but I think it should be part of the JobVertex, because it is not a global setting but rather controls how each vertex is scheduled.
Good suggestion. I've moved InputDependencyConstraint to JobVertex.
…gurable through API yet) 2. Refine input consumable checks
tillrohrmann left a comment:
Thanks for the fixup @zhuzhurk. My last comment is whether markDataProduced should not always be called in the scheduleOrUpdate method, independent of the result type. Once we have resolved this comment, the PR is ready to be merged.
Resolved review threads on:
- flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java (three threads)
Thanks for addressing my comments @zhuzhurk. Looks really good now. Merging this PR now.
…locks for finite stream jobs when resources are limited This commit adds a job config InputDependencyConstraint, which helps to avoid resource deadlocks in LAZY_FROM_SOURCES scheduling when resources are limited. The InputDependencyConstraint controls across multiple inputs when consumers are scheduled. Currently it supports ANY and ALL. ANY means that any input intermediate result partition must be consumable and ALL means that all input intermediate result partitions (from all inputs) need to be consumable in order to schedule the consumer task. This closes apache#7255.
Thanks Andrey (@azagrebin) and Till (@tillrohrmann) for the review.
…locks in LAZY_FROM_SOURCES scheduling when resources are limited
What is the purpose of the change
This PR adds a job config InputDependencyConstraint, which helps to avoid resource deadlocks in LAZY_FROM_SOURCES scheduling when resources are limited, as described in FLINK-10945.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes)
Documentation