-
-
Notifications
You must be signed in to change notification settings - Fork 8.8k
Description
One of the expensive operations in quantile build process is sorting of the batch of data, which happens here:
Line 90 in 325156c
| std::sort(queue.begin(), queue.begin() + qtail); |
Sorting here sorts structures by float key.
While not as trivial as for ints, it's possible to do radix sort (< O(n log n) ) over floating point numbers. float_sort from boost is one such implementation, which uses hybrid approach (both radix and comparison-based): https://www.boost.org/doc/libs/1_67_0/libs/sort/doc/html/sort/single_thread/spreadsort/sort_hpp/float_sort.html
These are results from quick experiment with boost implementation (see https://github.com/okuvshynov/xgboost/commit/7d22b0bdcef92952d44c402512523e418ab3904d in my fork):
4 Core machine, higgs dataset, 256 bins:
std::sort:
[16:32:39] ======== Monitor: DenseCuts ========
[16:32:39] Build: 16.4768s, 1 calls @ 16476767us
[16:32:39] Init: 0.033793s, 1 calls @ 33793us
float_sort:
[16:30:14] ======== Monitor: DenseCuts ========
[16:30:14] Build: 12.3589s, 1 calls @ 12358875us
[16:30:14] Init: 0.063872s, 1 calls @ 63872us
12-core machine, higgs dataset, 256 bins:
std::sort:
[16:28:16] ======== Monitor: DenseCuts ========
[16:28:16] Build: 3.7118s, 1 calls @ 3711796us
[16:28:16] Init: 0.041176s, 1 calls @ 41176us
float_sort:
[16:25:47] ======== Monitor: DenseCuts ========
[16:25:47] Build: 2.83653s, 1 calls @ 2836527us
[16:25:47] Init: 0.040441s, 1 calls @ 40441us
This will depend on number of bins. With less bins, we'll have shorter vector to sort and win will become less pronounced. It should be also helpful for exact split finding method, but I haven't really tested that.
It looks nice, but, unless we want to bring boost as a dependency, it will require careful implementation of radix sort for floats, which would handle NaNs/positive/negative values correctly.
does this sound interesting?