Skip to content

Conversation

@venkata91
Copy link
Contributor

Summary

Add support for Flink's Speculative Execution in batch execution mode

Details

Backport from 1.19 to 1.18 required some manual conflict resolution only in the tests. They are as follows:

  1. Instead of getRuntimeContext().getTaskInfo().getIndexOfThisSubtask() and getRuntimeContext().getTaskInfo().getAttemptNumber() use getRuntimeContext().getIndexOfThisSubtask() and getRuntimeContext().getAttemptNumber()
  2. Add dropDatabase() helper method in TestBase which required adding DEFAULT_CATALOG_NAME as a constant in FlinkCatalogFactory.

Backport from 1.18 to 1.17 went through cleanly.

Testing

Added an integration test to verify the expected speculative execution behavior with IcebergSource

@github-actions github-actions bot added the flink label Jul 24, 2024
@venkata91
Copy link
Contributor Author

cc @pvary for review.

@pvary
Copy link
Contributor

pvary commented Jul 24, 2024

The 1st point is fine.
@rodmeneses: How did we handle the drop database stuff in previous Flink releases?

Thanks, Peter

@rodmeneses
Copy link
Contributor

The 1st point is fine. @rodmeneses: How did we handle the drop database stuff in previous Flink releases?

Thanks, Peter

We didn't have dropDatabase on 1.17 and 1.18. Only dropCatalog existed there.

@pvary
Copy link
Contributor

pvary commented Jul 26, 2024

@rodmeneses: IIRC we faced an issue in the tests when we were introducing Flink 1.19. This was solved by adding the dropDatabase to the tests. We found that it would be good to have a better cleanup method for every tests which uses FlinkSQL, but decided against adding that along with the 1.19 PR. This basically resulted in "inconsistencies" between the 1.17-1.18/1.19 tests.

If my recollection above is correct, then we have 2 tasks ahead of us:

  1. This PR doesn't need to add the dropDatabase related changes to the backports
  2. We still need to come up with a better cleanup for the SQL tests

@rodmeneses: Could you please confirm?

Thanks,
Peter

@rodmeneses
Copy link
Contributor

@rodmeneses: IIRC we faced an issue in the tests when we were introducing Flink 1.19. This was solved by adding the dropDatabase to the tests. We found that it would be good to have a better cleanup method for every tests which uses FlinkSQL, but decided against adding that along with the 1.19 PR. This basically resulted in "inconsistencies" between the 1.17-1.18/1.19 tests.

If my recollection above is correct, then we have 2 tasks ahead of us:

  1. This PR doesn't need to add the dropDatabase related changes to the backports
  2. We still need to come up with a better cleanup for the SQL tests

@rodmeneses: Could you please confirm?

Thanks, Peter

that is correct @pvary. We discussed about having a more thorough cleaning logic for all of FlinkSQL related unit tests, but I didn't have the time nor I wanted to keep adding more complexity on my previous PR.

@venkata91
Copy link
Contributor Author

venkata91 commented Jul 26, 2024

This basically resulted in "inconsistencies" between the 1.17-1.18/1.19 tests.

@pvary Can you clarify what do you mean by the "inconsistencies" here?

@pvary
Copy link
Contributor

pvary commented Jul 29, 2024

This basically resulted in "inconsistencies" between the 1.17-1.18/1.19 tests.

@pvary Can you clarify what do you mean by the "inconsistencies" here?

Differences would have been a better word.

Based on the discussion above: @venkata91: Could you please remove the dropDatabase from the tests for now? Would tests work without it?

Thanks,
Peter

@venkata91 venkata91 force-pushed the vsowrira/spec-exec-1.18 branch from cf6e44d to 3c4c34a Compare July 29, 2024 20:27
@venkata91
Copy link
Contributor Author

This basically resulted in "inconsistencies" between the 1.17-1.18/1.19 tests.

@pvary Can you clarify what do you mean by the "inconsistencies" here?

Differences would have been a better word.

Based on the discussion above: @venkata91: Could you please remove the dropDatabase from the tests for now? Would tests work without it?

Thanks, Peter

Okay removed the dropDatabase for 1.17 and 1.18 from TestBase and currently dropping the database as part of after().

@venkata91
Copy link
Contributor Author

@pvary Gentle ping

@pvary
Copy link
Contributor

pvary commented Jul 31, 2024

@venkata91: Could you please check the failure?
Otherwise LGTM

@venkata91
Copy link
Contributor Author

venkata91 commented Jul 31, 2024

@venkata91: Could you please check the failure? Otherwise LGTM

@pvary I looked at it earlier and it seems to be unrelated. Test failure is in org.apache.iceberg.flink.TestFlinkAnonymousTable and I ran it locally and it passed as well. Could it be a transient test failure?

@venkata91
Copy link
Contributor Author

@pvary Btw, I merged the main branch changes to my local branch in order to trigger the tests again. Ideally, if it is transient issue, hopefully this should solve it.

@venkata91
Copy link
Contributor Author

@pvary Btw, I merged the main branch changes to my local branch in order to trigger the tests again. Ideally, if it is transient issue, hopefully this should solve it.

@pvary looks like all the checks passed now.

@pvary pvary merged commit 84c9125 into apache:main Aug 1, 2024
@pvary
Copy link
Contributor

pvary commented Aug 1, 2024

Merged to main.
Thanks @venkata91 for the backport!

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025
czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants