Skip to content

Conversation

@qazxcdswe123
Copy link
Contributor

@qazxcdswe123 qazxcdswe123 commented Mar 25, 2025

Which issue does this PR close?

Rationale for this change

The original PR is splited into 2 parts, one for float64 type and one for float64+decimal type

What changes are included in this PR?

  1. mv DistinctSumAccumulator to common so that it can be used in Float64DistinctAvgAccumulator
  2. implement Float64DistinctAvgAccumulator using DistinctSumAccumulator
  3. tested in aggregate.slt

Are these changes tested?

  • tested in aggregate.slt

Are there any user-facing changes?

No

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Mar 25, 2025
@qazxcdswe123 qazxcdswe123 requested a review from jayzhan211 March 25, 2025 14:15
@Omega359
Copy link
Contributor

the sqlite tests need updating prior to this issue being pushed to main. Confirmed test failures, here are a few examples:

External error: query is expected to fail with error:
        (regex) DataFusion error: Execution error: avg\(DISTINCT\) aggregations are not available
but got error:
        DataFusion error: Arrow error: Invalid argument error: number of columns(2) must match number of fields(3) in schema
[SQL] SELECT - COUNT ( * ) * AVG ( DISTINCT + col3 ) AS col3 FROM tab0 WHERE NOT NULL <= + + col3
at ../../datafusion-testing/data/sqlite/index/random/100/slt_good_1.slt:11104

External error: query is expected to fail, but actually succeed:
[SQL] SELECT DISTINCT - COUNT ( * ) * AVG ( DISTINCT + 89 ) col4 FROM tab0 AS cor0 WHERE NULL NOT IN ( CAST ( NULL AS INTEGER ) + - 15 )
at ../../datafusion-testing/data/sqlite/index/random/1000/slt_good_8.slt:17760

External error: query is expected to fail, but actually succeed:
[SQL] SELECT - AVG ( DISTINCT 27 ) + - MAX ( DISTINCT col1 ) AS col4 FROM tab0 WHERE NOT NULL <> - + 90
at ../../datafusion-testing/data/sqlite/index/random/1000/slt_good_0.slt:6040

External error: query is expected to fail with error:
        (regex) DataFusion error: Execution error: avg\(DISTINCT\) aggregations are not available
but got error:
        DataFusion error: Arrow error: Invalid argument error: number of columns(1) must match number of fields(2) in schema
[SQL] SELECT ALL - AVG ( DISTINCT 48 ) AS col4 FROM tab0 WHERE NOT - col4 BETWEEN ( NULL ) AND CAST ( 2 AS INTEGER )
at ../../datafusion-testing/data/sqlite/index/random/1000/slt_good_7.slt:38250

@alamb
Copy link
Contributor

alamb commented Mar 28, 2025

I merged up from main and will use the new github action to run extended tests

@alamb
Copy link
Contributor

alamb commented Mar 28, 2025

Run extended tests

@Omega359
Copy link
Contributor

Run extended tests

I see it did trigger but I somehow was expecting feedback in the comments

@Omega359
Copy link
Contributor

jayzhan211
jayzhan211 previously approved these changes Mar 29, 2025
@jayzhan211 jayzhan211 dismissed their stale review March 29, 2025 01:41

extended test

@Omega359
Copy link
Contributor

Omega359 commented Apr 4, 2025

I regenerated the sqlite tests again and I think an issue I'm seeing with them is actually caused or triggered by this PR.

Here is an example:

query error DataFusion error: Arrow error: Invalid argument error: number of columns\(1\) must match number of fields\(2\) in schema
SELECT 84 * + - 61 * + AVG ( DISTINCT ( 70 ) ) AS col2 FROM tab4 WHERE ( NULL ) BETWEEN NULL AND col3

If this is in fact a bug not caused by this PR a new issue should be filed. I myself do not have the time to diagnosis the cause of this unfortunately.

@qazxcdswe123
Copy link
Contributor Author

I regenerated the sqlite tests again and I think an issue I'm seeing with them is actually caused or triggered by this PR.

Here is an example:

query error DataFusion error: Arrow error: Invalid argument error: number of columns\(1\) must match number of fields\(2\) in schema
SELECT 84 * + - 61 * + AVG ( DISTINCT ( 70 ) ) AS col2 FROM tab4 WHERE ( NULL ) BETWEEN NULL AND col3

If this is in fact a bug not caused by this PR a new issue should be filed. I myself do not have the time to diagnosis the cause of this unfortunately.

sadly I'm working on my undergrad thesis project at this time and do not have time to investigate this either 😢 , might be back around mid april

@alamb
Copy link
Contributor

alamb commented Apr 6, 2025

sadly I'm working on my undergrad thesis project at this time and do not have time to investigate this either 😢 , might be back around mid april

Good luck with your project / thesis!

@qazxcdswe123 qazxcdswe123 marked this pull request as draft April 24, 2025 13:42
@github-actions github-actions bot added documentation Improvements or additions to documentation sql SQL Planner development-process Related to development process of DataFusion logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate substrait Changes to the substrait crate labels Apr 25, 2025
@github-actions github-actions bot added catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate datasource Changes to the datasource crate ffi Changes to the ffi crate labels Apr 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

catalog Related to the catalog crate common Related to common crate core Core DataFusion crate datasource Changes to the datasource crate development-process Related to development process of DataFusion documentation Improvements or additions to documentation execution Related to the execution crate ffi Changes to the ffi crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates proto Related to proto crate sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants