[SYSTEMDS-3253] Add combined rewrite and lop instruction for union #2286
This patch replaces the current union operation with an internal LOP. Currently, union is computed by two consecutive operations, rbind() followed by unique(). We rewrite this pattern into an internal LOP that uses a HashSet to identify the unique rows and returns them as a matrix. This improves efficiency because the separate unique() operation is no longer needed. The order of the input rows is preserved in the output.
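Below is a minimal sketch of an order-preserving HashSet union like the one described above. It assumes matrices are plain `double[][]` arrays and that rows are keyed as `List<Double>`. The class and method names are hypothetical, and this is not the SystemDS `MatrixBlock` API. Rows of the first input are scanned before rows of the second. A row is emitted only when `HashSet.add` reports it as new, so duplicates are dropped while first-occurrence order is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OrderPreservingUnion {

    // Returns the distinct rows of rbind(a, b) in first-occurrence order.
    // Hypothetical sketch on double[][]; not the SystemDS MatrixBlock API.
    static double[][] union(double[][] a, double[][] b) {
        Set<List<Double>> seen = new HashSet<>(); // row keys already emitted
        List<double[]> out = new ArrayList<>();
        for (double[][] input : new double[][][] {a, b}) {
            for (double[] row : input) {
                List<Double> key = new ArrayList<>(row.length);
                for (double v : row) key.add(v);   // box the row into a hashable key
                if (seen.add(key))                 // add() returns false for duplicates
                    out.add(row.clone());
            }
        }
        return out.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {1, 2}};
        double[][] b = {{5, 6}, {3, 4}};
        for (double[] row : union(a, b))
            System.out.println(Arrays.toString(row));
        // prints [1.0, 2.0], [3.0, 4.0], [5.0, 6.0] on separate lines
    }
}
```

`List<Double>` equality goes through `Double.equals`, so two `NaN` cells compare equal while `0.0` and `-0.0` do not. A production kernel would need to match whatever semantics unique() uses for these values.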
Thanks for the patch @chihsinh. Could you please add the missing license headers, revert the replacement of the explicit imports with a wildcard import, and benchmark this implementation (a hash map keyed by lists of doubles) against one keyed by sliced-out matrix blocks?
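Here is one rough way to set up the requested comparison. The hypothetical `SliceKey` record stands in for a sliced-out matrix block. It wraps a copied `double[]` row and hashes it with `Arrays.hashCode`, while the list-based variant boxes every cell into a `List<Double>`. Both variants use the same equality semantics, so they should return the same distinct-row count. The sketch requires Java 16+ for records and pattern matching, and none of the names come from SystemDS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RowKeyComparison {

    // Hypothetical stand-in for a sliced-out row block: hashes the raw array
    // instead of one boxed Double per cell. Arrays.equals/hashCode on double[]
    // follow Double.equals semantics, like List<Double> (NaN == NaN, 0.0 != -0.0).
    record SliceKey(double[] values) {
        @Override public boolean equals(Object o) {
            return o instanceof SliceKey k && Arrays.equals(values, k.values);
        }
        @Override public int hashCode() { return Arrays.hashCode(values); }
    }

    static int distinctWithLists(double[][] m) {
        Set<List<Double>> seen = new HashSet<>();
        for (double[] row : m) {
            List<Double> key = new ArrayList<>(row.length);
            for (double v : row) key.add(v);
            seen.add(key);
        }
        return seen.size();
    }

    static int distinctWithSlices(double[][] m) {
        Set<SliceKey> seen = new HashSet<>();
        for (double[] row : m) seen.add(new SliceKey(row.clone())); // copy = "slice"
        return seen.size();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[][] m = new double[200_000][4];
        for (double[] row : m)
            for (int j = 0; j < row.length; j++) row[j] = rnd.nextInt(10); // many duplicates

        long t0 = System.nanoTime();
        int n1 = distinctWithLists(m);
        long t1 = System.nanoTime();
        int n2 = distinctWithSlices(m);
        long t2 = System.nanoTime();
        System.out.printf("lists:  %d distinct rows, %.1f ms%n", n1, (t1 - t0) / 1e6);
        System.out.printf("slices: %d distinct rows, %.1f ms%n", n2, (t2 - t1) / 1e6);
    }
}
```

Single-shot `System.nanoTime()` timings without JIT warm-up are noisy, so a harness such as JMH would give more trustworthy numbers.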
Btw, the rewrite test seems to fail because
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@             Coverage Diff              @@
##               main    #2286      +/-   ##
============================================
- Coverage     72.96%   72.94%   -0.02%
- Complexity    46097    46109      +12
============================================
  Files          1479     1480       +1
  Lines        172654   172757     +103
  Branches      33796    33818      +22
============================================
+ Hits         125970   126025      +55
- Misses        37192    37241      +49
+ Partials       9492     9491       -1
… after benchmarking
LGTM, thanks for the patch @chihsinh. The code already looked pretty good. During the merge, I only removed the stdout printing from the instruction and slightly deduplicated the core kernels for single-column and multi-column union.
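This sketch illustrates one way single-column and multi-column union could share a core loop by passing the key extraction in as a `Function`. A single column is keyed by a boxed `Double`, and multiple columns are keyed by a `List<Double>`. All names are hypothetical, and this is not the merged SystemDS code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class SharedUnionKernel {

    // Shared kernel: scans a, then b, and keeps each row whose key is new.
    static <K> double[][] unionWith(double[][] a, double[][] b,
                                    Function<double[], K> keyOf) {
        Set<K> seen = new HashSet<>();
        List<double[]> out = new ArrayList<>();
        for (double[][] input : new double[][][] {a, b})
            for (double[] row : input)
                if (seen.add(keyOf.apply(row)))
                    out.add(row.clone());
        return out.toArray(new double[0][]);
    }

    // Dispatches on the column count; only the key extraction differs.
    static double[][] union(double[][] a, double[][] b) {
        int cols = a.length > 0 ? a[0].length : (b.length > 0 ? b[0].length : 0);
        if (cols == 1) // single column: one boxed Double per row
            return unionWith(a, b, row -> row[0]);
        return unionWith(a, b, row -> { // multiple columns: whole row as the key
            List<Double> key = new ArrayList<>(row.length);
            for (double v : row) key.add(v);
            return key;
        });
    }

    public static void main(String[] args) {
        double[][] x = {{3}, {1}, {3}};
        double[][] y = {{2}, {1}};
        System.out.println(Arrays.deepToString(union(x, y))); // [[3.0], [1.0], [2.0]]
    }
}
```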
This patch replaces the current union operation with an internal LOP. Currently, union is computed by two consecutive operations, rbind() followed by unique(). We rewrite this pattern into an internal LOP that uses a HashSet to identify the unique rows and returns them as a matrix. This improves efficiency because the separate unique() operation is no longer needed. The order of the input rows is preserved in the output. Closes apache#2286.
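The rewrite half of the patch can be pictured as a bottom-up pattern match on the expression tree. Whenever `unique()` is applied directly to an `rbind()`, the pair is replaced by a single union node. The sealed `Expr` hierarchy below is a hypothetical toy that requires Java 17+, and it does not reflect the structure of SystemDS's actual HOP/LOP classes.

```java
public class UnionRewrite {

    // Toy expression tree; hypothetical stand-ins, not SystemDS HOPs/LOPs.
    sealed interface Expr permits Leaf, Rbind, Unique, Union {}
    record Leaf(String name) implements Expr {}
    record Rbind(Expr left, Expr right) implements Expr {}
    record Unique(Expr input) implements Expr {}
    record Union(Expr left, Expr right) implements Expr {}

    // Bottom-up rewrite: unique(rbind(A, B)) -> union(A, B); other nodes are rebuilt as-is.
    static Expr rewrite(Expr e) {
        if (e instanceof Unique u) {
            Expr in = rewrite(u.input()); // rewrite children first
            if (in instanceof Rbind rb)
                return new Union(rb.left(), rb.right());
            return new Unique(in);
        }
        if (e instanceof Rbind rb)
            return new Rbind(rewrite(rb.left()), rewrite(rb.right()));
        if (e instanceof Union un)
            return new Union(rewrite(un.left()), rewrite(un.right()));
        return e; // Leaf
    }

    public static void main(String[] args) {
        Expr plan = new Unique(new Rbind(new Leaf("X"), new Leaf("Y")));
        System.out.println(rewrite(plan));
        // prints Union[left=Leaf[name=X], right=Leaf[name=Y]]
    }
}
```

The rewrite runs on children first, so nested patterns also fold. For example, `unique(rbind(unique(rbind(A, B)), C))` becomes `union(union(A, B), C)`.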