Skip to content

feat: support configuring distinct count agg function#143

Merged
aaron-steinfeld merged 4 commits intomainfrom
optimize-functions
Jun 28, 2022
Merged

feat: support configuring distinct count agg function#143
aaron-steinfeld merged 4 commits intomainfrom
optimize-functions

Conversation

@aaron-steinfeld
Copy link
Copy Markdown
Contributor

@aaron-steinfeld aaron-steinfeld commented Jun 23, 2022

Description

  • Adds the ability to configure the aggregation used for DISTINCTCOUNT, defaulting to the existing implementation which is the exact computation. Behavior of various versions of this function are described in https://docs.pinot.apache.org/users/user-guide-query/how-to-handle-unique-counting
  • Updates the avg rate function to use the / operator rather than the existing DIV(). This is better optimized by Pinot.
  • Refactored config of aggregation configuration for easier extendability

Testing

Ran and updated unit tests

Checklist:

  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • Any dependent changes have been merged and published in downstream modules

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 23, 2022

Codecov Report

Merging #143 (729ec2f) into main (1730c1e) will decrease coverage by 0.02%.
The diff coverage is 96.55%.

@@             Coverage Diff              @@
##               main     #143      +/-   ##
============================================
- Coverage     82.16%   82.13%   -0.03%     
- Complexity      627      633       +6     
============================================
  Files            65       66       +1     
  Lines          2377     2385       +8     
  Branches        242      244       +2     
============================================
+ Hits           1953     1959       +6     
- Misses          326      329       +3     
+ Partials         98       97       -1     
Flag Coverage Δ
integration 82.13% <96.55%> (-0.03%) ⬇️
unit 79.86% <96.55%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../query/service/pinot/PinotBasedRequestHandler.java 70.27% <ø> (-0.69%) ⬇️
...pinot/converters/PinotFunctionConverterConfig.java 93.33% <93.33%> (ø)
...rvice/pinot/converters/PinotFunctionConverter.java 94.87% <100.00%> (ø)
...rtrace/core/query/service/QueryServiceStarter.java 55.55% <0.00%> (-2.78%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1730c1e...729ec2f. Read the comment docs.

@aaron-steinfeld
Copy link
Copy Markdown
Contributor Author

aaron-steinfeld commented Jun 23, 2022

Verified this is broken on main, independent of this PR.This is why reproducible tests are good :( https://github.com/hypertrace/query-service/runs/7032109114?check_suite_focus=true

Guessing the integration tests are not set up to support the current version of pinot, will see if I can poke some things.

@aaron-steinfeld
Copy link
Copy Markdown
Contributor Author

Confirmed, introduced by https://github.com/hypertrace/pinot/releases/tag/0.5.0 . Pinning to the previous tag completes successfully.

Copy link
Copy Markdown
Contributor

@kotharironak kotharironak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm


assertEquals(
"SUM(DIV(foo, 2.0))",
"SUM(foo / 2.0)",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: Do we have any value level test that compares prior (SUM(DIV(foo, 2.0)) vs SUM(foo / 2.0, resulting in the same values?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be testing pinot's functionality - that DIV and / both actually... divide. Not sure it's worth writing a test, but we can verify by hand real quick.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select sum(num_calls), sum(num_calls / 60.0), sum(div(num_calls, 60.0)) ...

image

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, tried the same thing too.
org-query:
org-query

modified query:
modified-query

@kotharironak
Copy link
Copy Markdown
Contributor

Confirmed, introduced by https://github.com/hypertrace/pinot/releases/tag/0.5.0 . Pinning to the previous tag completes successfully.

So, are you pinning to the previous tag hypertrace/pinot-servicemanager:main to fix this test now?

@laxmanchekka
Copy link
Copy Markdown
Contributor

It can’t find that "" consumer group. This is a change done during upgrade (0.7.0 to 0.10.0)
Earlier group name was empty string and doesn't allow us to configure.
Now, it allows us to configure. We are giving relevant names for each table during view creation. It's not "" empty string any more.

private static final String DISTINCT_COUNT_AGGREGATION_FUNCTION_CONFIG =
"distinctCountAggFunction";
private static final String DEFAULT_PERCENTILE_AGGREGATION_FUNCTION = "PERCENTILETDIGEST";
private static final String DEFAULT_DISTINCT_COUNT_AGGREGATION_FUNCTION = "DISTINCTCOUNT";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make DISTINCT_COUNT_HLL as the default.

Copy link
Copy Markdown
Contributor Author

@aaron-steinfeld aaron-steinfeld Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it this way for backwards compatibility. Without any custom config it'll retain identical behavior as before, but it's easily changed.

…#146)

* fix: the integration test with latest pinot image for group.id change

* making it 5 mins for worst case

* updated comments

* reverted the test image
laxmanchekka
laxmanchekka previously approved these changes Jun 28, 2022
@github-actions

This comment has been minimized.

@aaron-steinfeld aaron-steinfeld merged commit 56613da into main Jun 28, 2022
@aaron-steinfeld aaron-steinfeld deleted the optimize-functions branch June 28, 2022 12:34
@github-actions
Copy link
Copy Markdown

Unit Test Results

  36 files  ±0    36 suites  ±0   8s ⏱️ -2s
202 tests +1  202 ✔️ +1  0 💤 ±0  0 ❌ ±0 

Results for commit 56613da. ± Comparison against base commit 1730c1e.

Harnoor-se7en pushed a commit to razorpay/hypertrace-query-service that referenced this pull request Jul 14, 2022
* feat: support configuring distinct count agg function

* fix: restore missing sum

* fix: the integration test with latest pinot image for group.id change (hypertrace#146)

* fix: the integration test with latest pinot image for group.id change

* making it 5 mins for worst case

* updated comments

* reverted the test image

* refactor: update a test config, remove unused const

Co-authored-by: kotharironak <53209990+kotharironak@users.noreply.github.com>
Harnoor-se7en added a commit to razorpay/hypertrace-query-service that referenced this pull request Dec 8, 2022
* feat: support configuring distinct count agg function (hypertrace#143)

* feat: support configuring distinct count agg function

* fix: restore missing sum

* fix: the integration test with latest pinot image for group.id change (hypertrace#146)

* fix: the integration test with latest pinot image for group.id change

* making it 5 mins for worst case

* updated comments

* reverted the test image

* refactor: update a test config, remove unused const

Co-authored-by: kotharironak <53209990+kotharironak@users.noreply.github.com>

* adding config from service since kube manifest plans fail

* fix: flip around division (hypertrace#147)

* kube manifests isnt working so adding service config

* removed config changes at app level

* add dependency

* add dependency

* dependency udpate

* dependency update

* build from apm

* maven aurl

* maven test

* dependency update

* file impl

* file impl

* updates workflows

Co-authored-by: Aaron Steinfeld <45047841+aaron-steinfeld@users.noreply.github.com>
Co-authored-by: kotharironak <53209990+kotharironak@users.noreply.github.com>
Co-authored-by: 13shivam <shivam.rai@razorpay.com>
Harnoor-se7en added a commit to razorpay/hypertrace-query-service that referenced this pull request Dec 12, 2022
* feat: support configuring distinct count agg function (hypertrace#143)

* feat: support configuring distinct count agg function

* fix: restore missing sum

* fix: the integration test with latest pinot image for group.id change (hypertrace#146)

* fix: the integration test with latest pinot image for group.id change

* making it 5 mins for worst case

* updated comments

* reverted the test image

* refactor: update a test config, remove unused const

Co-authored-by: kotharironak <53209990+kotharironak@users.noreply.github.com>

* adding config from service since kube manifest plans fail

* fix: flip around division (hypertrace#147)

* kube manifests isnt working so adding service config

* removed config changes at app level

Co-authored-by: Aaron Steinfeld <45047841+aaron-steinfeld@users.noreply.github.com>
Co-authored-by: kotharironak <53209990+kotharironak@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants