@cloud-fan (Contributor) commented Sep 10, 2018

What changes were proposed in this pull request?

It turns out to be a bug that a `DataSourceV2ScanExec` instance can be referenced in the execution plan multiple times. This bug was fixed by #22284, so SQL metrics for batch queries are now correct.

Thus we no longer need the hack in `ProgressReporter`, which worked around the same metrics problem for streaming queries.
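To illustrate the metrics problem in general terms, here is a minimal sketch. It assumes a hypothetical, simplified plan model: `PlanNode`, `collect_scans`, `total_rows_naive` and `total_rows_dedup` are illustrative names, not Spark APIs, and this is not the actual `ProgressReporter` code. It shows why a walk that sums metrics once per *reference* double-counts a scan instance that appears twice in the plan, and how de-duplicating by object identity works around that.

```python
# Hypothetical, simplified plan model for illustration only.
# These classes and helpers are NOT Spark's DataSourceV2ScanExec or ProgressReporter.
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class PlanNode:
    name: str
    num_output_rows: int = 0
    children: List["PlanNode"] = field(default_factory=list)


def collect_scans(node):
    """Pre-order walk yielding every reference to a scan node, repeats included."""
    if node.name == "Scan":
        yield node
    for child in node.children:
        yield from collect_scans(child)


def total_rows_naive(plan):
    # Sums once per reference: a shared scan instance is counted once per occurrence.
    return sum(s.num_output_rows for s in collect_scans(plan))


def total_rows_dedup(plan):
    # Workaround: count each scan *instance* once, keyed by object identity.
    unique = {id(s): s for s in collect_scans(plan)}
    return sum(s.num_output_rows for s in unique.values())


# Usage: one scan instance referenced twice in the plan.
scan = PlanNode("Scan", num_output_rows=100)
plan = PlanNode("Union", children=[scan, scan])
print(total_rows_naive(plan))  # 200 (double-counted)
print(total_rows_dedup(plan))  # 100
```

Once the plan no longer contains the same scan instance more than once, the naive walk already gives the right total, which is why a de-duplication workaround of this kind becomes unnecessary.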

How was this patch tested?

existing tests

@cloud-fan (Contributor, Author)

cc @tdas @zsxwing @mgaido91

@mgaido91 (Contributor) left a comment

I am not familiar with this part of the code, but this dedup logic is indeed not needed anymore.

I am just wondering how this could work (pass UTs) for DSv1, as I see no dedup logic there. Do you have any idea?

@cloud-fan (Contributor, Author)

ok to test

@SparkQA commented Sep 10, 2018

Test build #95878 has finished for PR 22380 at commit ee0df17.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a comment

+1, LGTM.

@SparkQA commented Sep 10, 2018

Test build #95875 has finished for PR 22380 at commit ee0df17.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor, Author)

thanks, merging to master!

@asfgit asfgit closed this in 0e680dc Sep 11, 2018
fjh100456 pushed a commit to fjh100456/spark that referenced this pull request Sep 13, 2018
## What changes were proposed in this pull request?

It turns out it's a bug that a `DataSourceV2ScanExec` instance may be referred to in the execution plan multiple times. This bug is fixed by apache#22284 and now we have corrected SQL metrics for batch queries.

Thus we don't need the hack in `ProgressReporter` anymore, which fixes the same metrics problem for streaming queries.

## How was this patch tested?

existing tests

Closes apache#22380 from cloud-fan/followup.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>