Skip to content

Conversation

@kbendick
Copy link
Contributor

@kbendick kbendick commented Oct 1, 2020

For many of the tests that are using Parameterized from junit, our description winds up reading .

Here's an example

testOverwriteAnotherRangeWithinPartitionValidated[0]

However, we'd like to have a format string to show what the parameter is and what its current value is in a given test run. This can be accomplished by using name in the Parameterized.Parameters section where test parameters to be looped over are placed.

I also am taking the liberty to simplify some of the parameter definitions.

@kbendick
Copy link
Contributor Author

kbendick commented Oct 1, 2020

In some of the tests, one of the parameters is a String[] instead of a String. Because the current JUnit set up casts the names based on index by calling .toString() on them, we wind up with some bad names in these edge cases. Example:

testReplacePartitions[catalogName=testhadoop, baseNamespace=[Ljava.lang.String;36c1c7d9], format=PARQUET, isStreaming=false]

We can fix this in a few ways. One, there is another library that offers us the ability to have more control over parameters: https://github.com/Pragmatists/JUnitParams

However, I'm not so sure about that. Additionally, while baseNamespace is not exactly a namespace, it is in a way. And there is a class Namespace with a meaningful toString method. We can either create an additional BaseNamespace class or simply delegate to Namespace and accept that a BaseNamespace and a Namespace are essentially just references to the different possible lengths of a namespace.

What do others think @rdblue? baseNamespace as shown above is the only example I have found. And Namespace already has a helper function in the Flink test code:

  private Namespace toNamespace(String database) {
    String[] namespace = new String[baseNamespace.length + 1];
    System.arraycopy(baseNamespace, 0, namespace, 0, baseNamespace.length);
    namespace[baseNamespace.length] = database;
    return Namespace.of(namespace);
  }

I can also create a separate issue for the ones that generate wonky names and we can deal with them / debate the meaning of Namespace in another PR since this one is already huge (and intended to be a simple swap one for one).

@kbendick kbendick marked this pull request as ready for review October 1, 2020 02:50
@rdblue
Copy link
Contributor

rdblue commented Oct 1, 2020

I'd say let's reuse Namespace. Thanks!

@rdblue
Copy link
Contributor

rdblue commented Oct 1, 2020

Mostly looks good! Just a couple questions.

@kbendick
Copy link
Contributor Author

kbendick commented Oct 1, 2020

I was wondering why the tags didn't get updated for this, but I think it's because I initially opened this as a draft. I've noticed on push in other people's PRs that they eventually populate (like when we added flink_runtime to the labeler config and later on the flink tag appeared in the PR).

I'll look into this issue and if the tests are somehow not triggering labels appropriately, I'll make a change. But I don't think that's the case here as I have also edited some flink files that are not underneath any test directory.

@kbendick
Copy link
Contributor Author

kbendick commented Oct 1, 2020

I'd say let's reuse Namespace. Thanks!

I agree. I've created an issue to follow up on that as it will ever so slightly require us to change the code in src and given the size of this PR, I wanted to keep the changes limited to things underneath test. I will follow up. 👍

{ "orc", "time", "10:02:34.000000", "10:02:34.000001" },
// Temporarily disable filters on Timestamp columns due to ORC-611
// new Object[] { "orc", "timestamp",
// { "orc", "timestamp",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked on this, and ORC-611 is closed and the fix versions are 1.6.4 and 1.7.0.

I'm going to see if I can get these tests to pass or open a ticket about possibly upgrading our ORC version if not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried updating but it did not affect the tests. There were several orc related files on the compile time path, but I'm not sure what ran during the tests.

It looks like the issue ORC-611 was closed, but it still exists in https://issues.apache.org/jira/browse/HIVE-23036

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to also try 1.7.0, but I'm wondering if we aren't fully excluding some ORC dependencies from other dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out 1.7.0 is not published to maven. I am keeping these tests commented out with a mention of the corresponding still open HIVE ticket, as it indicates where in the OrcInputFormat for Hive is the issue. Might be of use to us.

@rdblue rdblue merged commit 425b10c into apache:master Oct 2, 2020
@rdblue
Copy link
Contributor

rdblue commented Oct 2, 2020

Thanks, @kbendick! This is really helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants