@venkata91
Contributor

@venkata91 venkata91 commented Jun 21, 2024

Summary

Add support for Flink's Speculative Execution in batch execution mode

Testing

Added an integration test to verify the expected speculative execution behavior with IcebergSource

@github-actions github-actions bot added the flink label Jun 21, 2024
@venkata91 venkata91 changed the base branch from main to 1.1.x June 21, 2024 18:13

@becketqin becketqin left a comment

@venkata91 Thanks for the patch. Left a comment. It looks like there is a simpler approach.

@venkata91 venkata91 force-pushed the vsowrira/flink-spec-exec branch from bcc1550 to 347e25a Compare June 29, 2024 04:47
@venkata91 venkata91 changed the base branch from 1.1.x to main June 29, 2024 04:47
@venkata91 venkata91 changed the title from "Support for Flink's SpeculativeExecution" to "Support for Flink's SpeculativeExecution in batch execution mode" Jun 29, 2024
@venkata91 venkata91 marked this pull request as ready for review June 29, 2024 04:48
@venkata91
Contributor Author

cc @stevenzwu for review. Should this change also be made in other Flink versions like Flink-1.17 and Flink-1.18?

@pvary
Contributor

pvary commented Jul 3, 2024

@venkata91: How can we be sure that the tests are exercising the speculative execution code path?

Do any of the tests read some splits multiple times and use the result of the faster attempt?

I think it would be useful to have a test demonstrating that the behavior works, so that an unrelated change cannot accidentally disable it.
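
For readers unfamiliar with the idea being asked about, here is a minimal stand-alone sketch of "read the same split more than once and keep the faster result". It uses only the JDK and is not Flink or Iceberg code: the class `SpeculativeReadSketch`, the `attempt` helper, the split name, and the delays are all hypothetical, chosen only to illustrate the concept.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Stand-alone model of speculative execution; no name here comes from Flink or Iceberg. */
final class SpeculativeReadSketch {

  /** Hypothetical reader: returns the same rows for a split after an artificial delay. */
  static Callable<List<String>> attempt(String split, long delayMillis) {
    return () -> {
      Thread.sleep(delayMillis);
      return List.of(split + "-row-1", split + "-row-2");
    };
  }

  /** Runs a slow and a fast attempt for the same split and keeps whichever succeeds first. */
  static List<String> readSpeculatively(String split) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // invokeAny returns the result of one successfully completed task and cancels the others.
      return pool.invokeAny(List.of(attempt(split, 2_000), attempt(split, 10)));
    } finally {
      // Interrupt any attempt that is still sleeping.
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(readSpeculatively("split-0"));
  }
}
```

The property a real test would need to check is the same as in this toy: the split is attempted more than once, yet each row appears in the output exactly once.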

@venkata91
Contributor Author

> @venkata91: How can we be sure that the tests are exercising the speculative execution code path?
>
> Do any of the tests read some splits multiple times and use the result of the faster attempt?
>
> I think it would be useful to have a test demonstrating that the behavior works, so that an unrelated change cannot accidentally disable it.

Sure, sounds good. Will add a test.

@venkata91
Contributor Author

> @venkata91: How can we be sure that the tests are exercising the speculative execution code path?
>
> Do any of the tests read some splits multiple times and use the result of the faster attempt?
>
> I think it would be useful to have a test demonstrating that the behavior works, so that an unrelated change cannot accidentally disable it.

@pvary Added an integration test to verify that tasks are speculated and produce the expected output. PTAL. By the way, should this change also be made in other Flink versions like Flink-1.17 and Flink-1.18?

@venkata91 venkata91 requested a review from becketqin July 18, 2024 22:27
@venkata91
Contributor Author

Gentle ping for reviews. cc @pvary @stevenzwu. Thanks!

}

@Test
public void testSpeculativeExecution() throws Exception {
Contributor

In my local tests this passes even without the changes in AbstractIcebergEnumerator.
Does that mean that speculative execution is supported even without this change?

Contributor Author

Nice catch!

It looks like I didn't set table.exec.iceberg.use-flip27-source to true. With that option enabled, and without the changes in AbstractIcebergEnumerator, the test fails with an exception saying:

"The split enumerator StaticIcebergEnumerator must implement SupportsHandleExecutionAttemptSourceEvent to be used in concurrent execution attempts scenario (e.g. if speculative execution is enabled)"
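
The failure described above can be pictured as a marker-interface check. The sketch below is a stand-alone model under stated assumptions, not Flink's implementation: the nested types only borrow the interface name from the exception message, `PlainEnumerator` and `AttemptAwareEnumerator` are hypothetical, and the exception type is this sketch's own choice.

```java
/** Stand-alone model; these nested types imitate names from the thread and are not Flink's classes. */
final class EnumeratorCheckSketch {

  /** Hypothetical stand-in for the interface named in the exception message. */
  interface SupportsHandleExecutionAttemptSourceEvent {}

  /** Hypothetical stand-in for a split enumerator. */
  interface SplitEnumerator {}

  /** Enumerator that cannot tell execution attempts apart. */
  static final class PlainEnumerator implements SplitEnumerator {}

  /** Enumerator that opts in to attempt-aware source events. */
  static final class AttemptAwareEnumerator
      implements SplitEnumerator, SupportsHandleExecutionAttemptSourceEvent {}

  /** Rejects enumerators that are not attempt-aware when concurrent attempts are possible. */
  static void validate(SplitEnumerator enumerator, boolean concurrentAttempts) {
    boolean attemptAware = enumerator instanceof SupportsHandleExecutionAttemptSourceEvent;
    if (concurrentAttempts && !attemptAware) {
      // IllegalStateException is an assumption of this sketch, not necessarily what Flink throws.
      throw new IllegalStateException(
          "The split enumerator " + enumerator.getClass().getSimpleName()
              + " must implement SupportsHandleExecutionAttemptSourceEvent");
    }
  }

  public static void main(String[] args) {
    validate(new AttemptAwareEnumerator(), true); // accepted
    validate(new PlainEnumerator(), false); // accepted: no concurrent attempts
    try {
      validate(new PlainEnumerator(), true);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this model, a test that never enables concurrent attempts passes with either enumerator, which mirrors why the integration test passed until the FLIP-27 source option was switched on.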

@venkata91
Contributor Author

@pvary Should this change be ported to other Flink versions like 1.17 and 1.18?

@pvary
Contributor

pvary commented Jul 23, 2024

> @pvary Should this change be ported to other Flink versions like 1.17 and 1.18?

In a follow-up backport PR.

@pvary pvary merged commit 91585e9 into apache:main Jul 24, 2024
@pvary
Contributor

pvary commented Jul 24, 2024

Merged to main.
Thanks for the PR @venkata91!

Could you please create the backport PR to the other Flink versions? The PR could be generated like this:

git diff <HASH>^ <HASH> | sed "s/v1.19/v1.18/g" | git apply -3 -p1
git diff <HASH>^ <HASH> | sed "s/v1.19/v1.17/g" | git apply -3 -p1
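
To make the effect of the `sed` step concrete, here is a small sketch that applies the same substitution to a single diff header line. It is illustrative only: the class `BackportPathRewriteSketch` and the file name `Example.java` are hypothetical. Note that the unescaped `.` in `v1.19` matches any character, and that the substitution runs over every line of the diff, so any `v1.19` text inside changed file content would be rewritten as well.

```java
import java.util.regex.Matcher;

/** Illustrates the path rewrite performed by the sed expression in the commands above. */
final class BackportPathRewriteSketch {

  /** Mirrors sed "s/v1.19/TARGET/g": the unescaped dot matches any single character. */
  static String retarget(String diffText, String targetVersion) {
    return diffText.replaceAll("v1.19", Matcher.quoteReplacement(targetVersion));
  }

  /** Stricter variant (not what the thread uses): replaces only the literal text "v1.19". */
  static String retargetLiteral(String diffText, String targetVersion) {
    return diffText.replace("v1.19", targetVersion);
  }

  public static void main(String[] args) {
    // Hypothetical diff header; the file name is made up for illustration.
    String header = "diff --git a/flink/v1.19/Example.java b/flink/v1.19/Example.java";
    System.out.println(retarget(header, "v1.18"));
    System.out.println(retarget(header, "v1.17"));
  }
}
```

After the rewritten diff is applied with `git apply -3 -p1`, it is worth reviewing the result for unintended replacements before opening the backport PR.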

@venkata91
Contributor Author

> Could you please create the backport PR to the other Flink versions?

@pvary Should we have two backport PRs, one each for 1.17 and 1.18, or is it fine to do it in a single PR?

venkata91 added a commit to venkata91/iceberg that referenced this pull request Jul 24, 2024
@pvary
Contributor

pvary commented Jul 24, 2024

> > Could you please create the backport PR to the other Flink versions?
>
> @pvary Should we have two backport PRs, one each for 1.17 and 1.18, or is it fine to do it in a single PR?

A single PR would be fine, as we don't expect any serious changes to review.

pvary pushed a commit that referenced this pull request Aug 1, 2024
jasonf20 pushed a commit to jasonf20/iceberg that referenced this pull request Aug 4, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025