Skip to content

Conversation

@js8544
Copy link
Contributor

@js8544 js8544 commented Jun 14, 2023

Rationale for this change

Performance improvement for scalar lookup functions.

What changes are included in this PR?

  1. Reserve space for hashtable for scalar lookup functions before the insertion process.
  2. A benchmark that demonstrates the improvement. Data is pasted in the comments.

Are these changes tested?

By existing unit tests.

Are there any user-facing changes?

No.

@js8544 js8544 requested a review from westonpace as a code owner June 14, 2023 15:54
@github-actions
Copy link

⚠️ GitHub issue #36059 has been automatically assigned in GitHub to PR creator.

@js8544
Copy link
Contributor Author

js8544 commented Jun 14, 2023

Benchmark result:
Before:

----------------------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------
IndexInInt32LargeSet/100            2883 ns         2883 ns       246728 bytes_per_second=13.2302M/s items_per_second=3.46823M/s
IndexInInt32LargeSet/1000          10322 ns        10322 ns        63804 bytes_per_second=3.6958M/s items_per_second=968.833k/s
IndexInInt32LargeSet/10000        139318 ns       139316 ns         4987 bytes_per_second=280.389k/s items_per_second=71.7795k/s
IndexInInt32LargeSet/100000      4588807 ns      4588452 ns          165 bytes_per_second=8.51322k/s items_per_second=2.17938k/s
IndexInInt32LargeSet/1000000    90538493 ns     90536500 ns           10 bytes_per_second=441.811/s items_per_second=110.453/s
IndexInInt32LargeSet/10000000 1428412088 ns   1428369286 ns            1 bytes_per_second=28.004/s items_per_second=7.00099/s
IsInInt32LargeSet/100               2833 ns         2833 ns       234016 bytes_per_second=13.4639M/s items_per_second=3.52948M/s
IsInInt32LargeSet/1000             10376 ns        10376 ns        67001 bytes_per_second=3.6765M/s items_per_second=963.773k/s
IsInInt32LargeSet/10000           148294 ns       148284 ns         4769 bytes_per_second=263.43k/s items_per_second=67.4381k/s
IsInInt32LargeSet/100000         4463505 ns      4463293 ns          147 bytes_per_second=8.75195k/s items_per_second=2.2405k/s
IsInInt32LargeSet/1000000       75740587 ns     75738687 ns           10 bytes_per_second=528.132/s items_per_second=132.033/s
IsInInt32LargeSet/10000000    1317214434 ns   1317158744 ns            1 bytes_per_second=30.3684/s items_per_second=7.5921/s

After

----------------------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------
IndexInInt32LargeSet/100            2659 ns         2658 ns       266800 bytes_per_second=14.3493M/s items_per_second=3.76159M/s
IndexInInt32LargeSet/1000           9759 ns         9759 ns        70169 bytes_per_second=3.9091M/s items_per_second=1024.75k/s
IndexInInt32LargeSet/10000         93311 ns        93309 ns         7477 bytes_per_second=418.634k/s items_per_second=107.17k/s
IndexInInt32LargeSet/100000      1729160 ns      1729162 ns          453 bytes_per_second=22.5904k/s items_per_second=5.78315k/s
IndexInInt32LargeSet/1000000    79806851 ns     79805041 ns            9 bytes_per_second=501.221/s items_per_second=125.305/s
IndexInInt32LargeSet/10000000 1112193163 ns   1112056579 ns            1 bytes_per_second=35.9694/s items_per_second=8.99235/s
IsInInt32LargeSet/100               2518 ns         2518 ns       280148 bytes_per_second=15.1526M/s items_per_second=3.97215M/s
IsInInt32LargeSet/1000              9625 ns         9624 ns        72600 bytes_per_second=3.96356M/s items_per_second=1039.02k/s
IsInInt32LargeSet/10000            93759 ns        93754 ns         7256 bytes_per_second=416.647k/s items_per_second=106.662k/s
IsInInt32LargeSet/100000         1608221 ns      1608152 ns          495 bytes_per_second=24.2903k/s items_per_second=6.21832k/s
IsInInt32LargeSet/1000000       73113133 ns     73112833 ns            9 bytes_per_second=547.1/s items_per_second=136.775/s
IsInInt32LargeSet/10000000    1207785785 ns   1207731078 ns            1 bytes_per_second=33.12/s items_per_second=8.27999/s

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question, otherwise this makes sense to me.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Jun 20, 2023
@js8544 js8544 requested a review from westonpace June 21, 2023 23:34
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 22, 2023
@js8544 js8544 requested a review from pitrou June 22, 2023 13:26
@pitrou pitrou merged commit 2ce4a38 into apache:main Jun 22, 2023
@conbench-apache-arrow
Copy link

Conbench analyzed the 6 benchmark runs on commit 2ce4a38f.

There were 9 benchmark results indicating a performance regression:

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] Performance of building up HashTable (MemoTable) in is_in kernel

3 participants