Skip to content

Conversation

@judahrand
Copy link
Contributor

@judahrand judahrand commented Nov 18, 2022

#19730

This is adding a test to demonstrate that the issue still exists and needs fixing. I'm yet to identify a fix.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

@github-actions
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @robertwb for label java.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@judahrand
Copy link
Contributor Author

R: @damccorm

I'm struggling to get the tests to run locally on an M1 MacBook so was hoping to be able to use the CI to demonstrate the issue. But it looks like the extensions unit tests don't run by default.

@github-actions
Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@damccorm
Copy link
Contributor

From the conversation in https://issues.apache.org/jira/browse/BEAM-7610 and related PRs, this should have been fixed by #14146

@apilloud could you take a look at this one since you have more context here?

@judahrand
Copy link
Contributor Author

judahrand commented Nov 18, 2022

From the conversation in https://issues.apache.org/jira/browse/BEAM-7610 and related PRs, this should have been fixed by #14146

@apilloud could you take a look at this one since you have more context here?

It's an issue we're definitely having in Dataflow SQL. A query which looks roughly like this:

SELECT
    window_start,
    window_end,
    COALESCE(ARRAY_AGG(foo), []) AS col1
FROM HOP(
    (SELECT * FROM pubsub.topic.`project_id`.`topic`),
    DESCRIPTOR(event_timestamp),
    "INTERVAL 5 MINUTE",
    "INTERVAL 15 MINUTE"
) as data
GROUP BY window_start, window_end

results in an error downstream which says that Data Catalog does not support NULLABLE ARRAYS. This means that the output schema of the query must not be correctly handling the COALESCE.

@judahrand
Copy link
Contributor Author

In fact, does the basic test not prove that this is still broken?

@Test
public void testCoalesceBasic() {
String sql = "SELECT COALESCE(@p0, @p1, @p2) AS ColA";
ImmutableMap<String, Value> params =
ImmutableMap.of(
"p0",
Value.createSimpleNullValue(TypeKind.TYPE_STRING),
"p1",
Value.createStringValue("yay"),
"p2",
Value.createStringValue("nay"));
PCollection<Row> stream = execute(sql, params);
final Schema schema = Schema.builder().addNullableField("field1", FieldType.STRING).build();
PAssert.that(stream).containsInAnyOrder(Row.withSchema(schema).addValues("yay").build());
pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
}

Given that the final value of the COALESCE in this test is a STRING the return type should be STRING NOT NULL but the test asserts that it is NULLABLE.

@judahrand
Copy link
Contributor Author

This is another important detail for our particular query: according to the ZetaSQL spec[1] a NULL array and an empty array are distinct and so our query should work.

[1] https://github.com/google/zetasql/blob/master/docs/data-types.md#nulls-and-the-array-type

@damccorm
Copy link
Contributor

Sorry, I wasn't clear. I'm not making a claim about whether or not this is actively an issue; I was just saying that given the conversation around the calcite upgrade I'm surprised that it is still a problem. I would have expected that upgrade to fix the problem given my loose understanding of the issue

@codecov
Copy link

codecov bot commented Nov 18, 2022

Codecov Report

Merging #24249 (cf56af2) into master (cf56af2) will not change coverage.
The diff coverage is n/a.

❗ Current head cf56af2 differs from pull request most recent head 7550cae. Consider uploading reports for the commit 7550cae to get more accurate results

@@           Coverage Diff           @@
##           master   #24249   +/-   ##
=======================================
  Coverage   73.46%   73.46%           
=======================================
  Files         714      714           
  Lines       96497    96497           
=======================================
  Hits        70889    70889           
  Misses      24286    24286           
  Partials     1322     1322           
Flag Coverage Δ
go 51.45% <0.00%> (ø)
python 83.17% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@damccorm
Copy link
Contributor

damccorm commented Dec 2, 2022

Looks like Andrew is on vacation, @kileys could you weigh in here?

@kileys kileys requested a review from apilloud December 13, 2022 18:23
@apilloud
Copy link
Member

Your new test is running in the SQL Precommit and passing: https://ci-beam.apache.org/job/beam_PreCommit_SQL_Commit/5807/testReport/org.apache.beam.sdk.extensions.sql.zetasql/ZetaSqlDialectSpecTest/testCoalesceNotNull/

Beam ZetaSQL has significant issues with null handling, so an issue like this existing wouldn't surprise me. I think you need a schema that supports nullable fields though? Try this:
final Schema schema = Schema.builder().addNullableField("field1", FieldType.STRING).build();

The latest release of Dataflow SQL is based on df6efe3 (somewhere between v2.38.0 and v2.39.0), which is several months old so it is possible that it has been fixed since. I know #17387 is more recent, which fixes some null bugs. Possibly try rebasing on that commit?

@github-actions
Copy link
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 12, 2023
@judahrand
Copy link
Contributor Author

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

I think this is still an issue.

@github-actions github-actions bot removed the stale label Feb 14, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Jun 7, 2023

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 7, 2023
@github-actions
Copy link
Contributor

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants