Skip to content

Conversation

@arkadius
Copy link
Contributor

@arkadius arkadius commented Oct 2, 2024

Summary

Currently, the unit tests for IcebergSink (based on the Flink SinkV2 interface) have a code duplication from FlinkSink tests. In the IcebergSink tests there were done some code enhancements that haven't been migrated to FlinkSink tests.

The scope of this change is to migrate these enhancements.
List of enhancements:

  • Flink.checkAndGetEqualityFieldIds has been replaced by SinkUtil.checkAndGetEqualityFieldIds
  • as("description") has been added to some assertions

Other improvements that were done:

  • Typo fix in testOperatorsUidNameWitUidSuffix
  • TestFlinkIcebergSinkV2Branch and TestFlinkIcebergSinkV2Base extended TestFlinkIcebergSinkV2Base which has test parameters defined but they weren't used. Instead of that, were used the default, initial values of fields. After this change, test parameters are only defined in leaf classes
  • TestFlinkIcebergSinkV2Base.testChangeLogs has a keySelector parameter that wasn't used anywhere - this change removes it. To discuss: Are tests based on this selector correctly checking what was intended?

@github-actions github-actions bot added the flink label Oct 2, 2024
@arkadius
Copy link
Contributor Author

arkadius commented Oct 2, 2024

@rodmeneses @pvary please take a look

@arkadius arkadius changed the title FlinkSink & IcebergSink desynchronized tests alignment Flink: FlinkSink & IcebergSink desynchronized tests alignment Oct 2, 2024
@rodmeneses
Copy link
Contributor

@pvary @stevenzwu
could you guys please start the CI pipelines on this PR?
Thanks

@rodmeneses
Copy link
Contributor

This one looks good to me.
@pvary could you please add me as reviewer so I can register my approval ?
Thanks

@arkadius
Copy link
Contributor Author

This one looks good to me. @pvary could you please add me as reviewer so I can register my approval ? Thanks

@pvary can you add @rodmeneses as a reviewer or look at this PR on your own? This is very non-controversial change. Only cleanup of unused code and synchronization of improvements in the code that are already done in other places.

Comment on lines +121 to +129
assertThat(partitionFiles("aaa"))
.as("There should be only 1 data file in partition 'aaa'")
.isEqualTo(1);
assertThat(partitionFiles("bbb"))
.as("There should be only 1 data file in partition 'bbb'")
.isEqualTo(1);
assertThat(partitionFiles("ccc"))
.as("There should be only 1 data file in partition 'ccc'")
.isEqualTo(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the comments in this file add any value above the ones we can read from the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't answer on this question because I'm not up-to-date with assertj. I copied it from this PR: https://github.com/apache/iceberg/pull/10179/files#diff-42c31c3c7e3e8287ea451196f6c0b4d0a47aa931391df70369a87ed3d19a1452R221 @rodmeneses copied this code for SinkV2 and (I assume) forgot to backport it to the original source of the code. We can synchronize it in the opposite direction (remove these explanations from duplicated places) if we find out that it doesn't add a value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.. I'm fine in both ways, and I agree with the sync effort.
@rodmeneses: What is the history of this difference? Is this intentional?

Comment on lines +156 to +165
// Use schema identifier field IDs as equality field id list by default
assertThat(SinkUtil.checkAndGetEqualityFieldIds(table, null))
.containsExactlyInAnyOrderElementsOf(table.schema().identifierFieldIds());

// Use user-provided equality field column as equality field id list
builder.equalityFieldColumns(Lists.newArrayList("id"));
assertThat(SinkUtil.checkAndGetEqualityFieldIds(table, Lists.newArrayList("id")))
.containsExactlyInAnyOrder(table.schema().findField("id").fieldId());

assertThat(SinkUtil.checkAndGetEqualityFieldIds(table, Lists.newArrayList("type")))
.containsExactlyInAnyOrder(table.schema().findField("type").fieldId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this feature have anything to do with the SinkV2? Shall we create a different test for SinkUtil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestIcebergSinkV2 is a copy of TestFlinkIcebergSinkV2 (which tests V1-based FlinkSink). For some reasons, these checks weren't copied. At first glance it looks like most of tests is this class don't check anything related with SinkV2. Take a look that most of test cases use TestFlinkIcebergSinkV2Base.testChangeLogs which uses FlinkSink (V1 based). My proposition of plan is:

  1. Synchronize the code duplication
  2. Replace duplicated classes with parametrized test case
  3. Extract things that are not related with either V1 or V2 sinks to some other classes as you suggested

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rodmeneses: Isn't this a mistake in the tests? Shouldn't these tests use something like testChangeLogsV2 and use the new sink?

@github-actions
Copy link

github-actions bot commented Dec 5, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 5, 2024
@arkadius arkadius closed this Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants