ARROW-1073: C++: Adapative integer builder #723

xhochy · 2017-06-02T14:05:44Z

No description provided.

wesm

This will be really nice to have!

We should do a performance shootout against https://github.com/wiseio/paratext/blob/master/src/util/widening_vector.hpp at some point

wesm · 2017-06-06T18:58:09Z

cpp/src/arrow/builder.cc

static inline here maybe?

See below; these days `inline` is mostly about linkage and visibility, not about whether the function actually gets inlined.

wesm · 2017-06-06T19:09:15Z

cpp/src/arrow/builder.cc

can this be replaced by a bit trick, like val & ~(std::numeric_limits<int32_t>::max() - 1)?

It could be, but I'm not sure which pattern to use, and the benchmarks show that this is not really relevant performance-wise.
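For readers following along, here is a minimal sketch of the two styles being compared: plain range comparisons versus a branch-free bit pattern. The function names are hypothetical and not taken from the PR; the bit pattern is one possible choice, not necessarily the one the reviewer had in mind.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if `val` is representable in the signed type T.
template <typename T>
constexpr bool FitsSigned(int64_t val) {
  return val >= std::numeric_limits<T>::min() &&
         val <= std::numeric_limits<T>::max();
}

// Hypothetical helper: smallest signed width in bytes (1, 2, 4 or 8)
// that can represent `val`, using plain range comparisons.
constexpr int MinSignedByteWidth(int64_t val) {
  return FitsSigned<int8_t>(val)    ? 1
         : FitsSigned<int16_t>(val) ? 2
         : FitsSigned<int32_t>(val) ? 4
                                    : 8;
}

// One possible bit trick for the int32 case: shift the signed range
// [-2^31, 2^31 - 1] onto [0, 2^32 - 1] with wrapping unsigned arithmetic,
// then do a single unsigned comparison.
constexpr bool FitsInInt32BitTrick(int64_t val) {
  return static_cast<uint64_t>(val) + 0x80000000ULL <= 0xFFFFFFFFULL;
}
```

Both variants answer the same question; which one is faster would have to be measured, and the reply above reports that it made no relevant difference in the benchmarks.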

wesm · 2017-06-06T19:11:45Z

cpp/src/arrow/builder.cc

Is std::transform the same performance as a for loop?

Under the hood it should compile down to exactly that. Apart from saving a few lines of code, the only difference from a plain for loop is that with C++17 you could have the compiler/STL run the loop multi-threaded by passing an execution policy.
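To make the comparison concrete, here is a small sketch of a widening copy written both ways. The helper names and the use of `std::vector` are assumptions for illustration; the PR operates on its own buffers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: widen int8 values to int32 with std::transform.
std::vector<int32_t> WidenWithTransform(const std::vector<int8_t>& src) {
  std::vector<int32_t> dst(src.size());
  std::transform(src.begin(), src.end(), dst.begin(),
                 [](int8_t v) { return static_cast<int32_t>(v); });
  return dst;
}

// Hypothetical helper: the same widening copy as an explicit loop.
std::vector<int32_t> WidenWithLoop(const std::vector<int8_t>& src) {
  std::vector<int32_t> dst(src.size());
  for (std::size_t i = 0; i < src.size(); ++i) {
    dst[i] = static_cast<int32_t>(src[i]);
  }
  return dst;
}
```

With optimizations enabled, both forms would typically be expected to produce very similar machine code, though that is worth confirming with a benchmark. The C++17 parallel overload takes an execution policy such as `std::execution::par` from `<execution>` as its first argument.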

wesm · 2017-06-06T19:12:33Z

cpp/src/arrow/builder.cc

unchecked status

wesm · 2017-06-06T19:16:08Z

cpp/src/arrow/builder.cc

I'm curious if a helper template works here also

template <typename old_type, typename new_type> struct is_smaller { static constexpr bool value = sizeof(old_type) < sizeof(new_type); };

There is std::less but that also didn't work here.
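A sketch of how such a trait could select code at compile time, assuming the goal is to run a widening copy only when the new type is strictly larger. `WidenInto` is a hypothetical name, and the `std::enable_if` dispatch is one option among several; the thread does not say what was tried or why it failed.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Compile-time trait: true when OldType occupies fewer bytes than NewType.
template <typename OldType, typename NewType>
struct is_smaller
    : std::integral_constant<bool, (sizeof(OldType) < sizeof(NewType))> {};

// Selected only when NewType is strictly wider: copy element by element.
template <typename OldType, typename NewType>
typename std::enable_if<is_smaller<OldType, NewType>::value, bool>::type
WidenInto(const OldType* src, NewType* dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dst[i] = static_cast<NewType>(src[i]);
  }
  return true;
}

// Selected otherwise: nothing to widen, so report that no copy happened.
template <typename OldType, typename NewType>
typename std::enable_if<!is_smaller<OldType, NewType>::value, bool>::type
WidenInto(const OldType*, NewType*, std::size_t) {
  return false;
}
```

Because the choice is made through overload resolution, the narrowing branch is never instantiated for a widening call, which avoids conversion warnings in code paths that cannot run.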

wesm · 2017-06-06T20:42:57Z

cpp/src/arrow/builder.cc

Release builds?

wesm · 2017-06-06T20:43:05Z

cpp/src/arrow/builder.cc

wesm · 2017-06-06T20:43:25Z

cpp/src/arrow/builder.cc

DCHECK or only error status?

wesm · 2017-06-06T20:44:18Z

cpp/src/arrow/builder.cc

static inline?

As this is a private method, this should not be needed; the compiler should take care of inlining automatically.

wesm · 2017-06-06T20:47:56Z

cpp/src/arrow/builder.cc

It might be overkill, but you could potentially collapse these classes with an extra template parameter that switches between signed and unsigned variants. So something like

class AdaptiveIntBuilder : AdaptiveIntBuilderBase<AdaptiveIntBuilder>

and then you could define various static attributes on AdaptiveIntBuilder

I thought about that, but I do not yet see how it would reduce code complexity while keeping the code understandable.
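For context, a rough sketch of what the suggested CRTP layout might look like. Everything here is hypothetical: the classes only track the required byte width and length, store no data, and do not reflect the actual classes in the PR.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if `v` is representable in the type Narrow.
template <typename Narrow, typename Wide>
constexpr bool Fits(Wide v) {
  return v >= static_cast<Wide>(std::numeric_limits<Narrow>::min()) &&
         v <= static_cast<Wide>(std::numeric_limits<Narrow>::max());
}

// Shared logic lives in the base; the signed/unsigned difference is
// delegated to a static function on the derived class.
template <typename Derived>
class AdaptiveIntBuilderBase {
 public:
  // A member template, so the signature does not have to name a type
  // from Derived while Derived is still incomplete.
  template <typename V>
  void Append(V val) {
    const int needed = Derived::RequiredWidth(val);
    if (needed > width_) width_ = needed;  // width only ever grows
    ++length_;
  }
  int width() const { return width_; }
  int64_t length() const { return length_; }

 private:
  int width_ = 1;
  int64_t length_ = 0;
};

class AdaptiveIntBuilder : public AdaptiveIntBuilderBase<AdaptiveIntBuilder> {
 public:
  static int RequiredWidth(int64_t v) {
    if (Fits<int8_t>(v)) return 1;
    if (Fits<int16_t>(v)) return 2;
    if (Fits<int32_t>(v)) return 4;
    return 8;
  }
};

class AdaptiveUIntBuilder : public AdaptiveIntBuilderBase<AdaptiveUIntBuilder> {
 public:
  static int RequiredWidth(uint64_t v) {
    if (Fits<uint8_t>(v)) return 1;
    if (Fits<uint16_t>(v)) return 2;
    if (Fits<uint32_t>(v)) return 4;
    return 8;
  }
};
```

Whether this is simpler than two separate classes is exactly the trade-off questioned in the reply above: the duplication shrinks, but the reader has to follow the indirection through the base template.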

xhochy · 2017-06-17T07:11:54Z

Fixed an unchecked status and added a benchmark for unsigned integers. Performance-wise, the builders are clearly dominated by the expand & copy logic, so I opened a JIRA to discuss whether we should make jemalloc the default. I would leave the DCHECKs as-is because they only check internal invariants; those code paths are actually unreachable and are only there so that the compiler does not complain.

wesm · 2017-06-19T15:43:50Z

cpp/src/arrow/builder.h

Does inlining scalar appends make much of a perf impact?

wesm · 2017-06-21T15:01:09Z

I think this is OK except for my comment about inlining. I suppose I can play with the benchmarks after this goes in.

xhochy · 2017-06-21T16:42:02Z

Benchmarks about the scalar append:

BM_BuildAdaptiveIntNoNulls/repeats:3                            58 ms         57 ms          9      1114MB/s
BM_BuildAdaptiveIntNoNulls/repeats:3                            57 ms         57 ms          9    1.1025GB/s
BM_BuildAdaptiveIntNoNulls/repeats:3                            57 ms         57 ms          9   1.10328GB/s
BM_BuildAdaptiveIntNoNulls/repeats:3_mean                       57 ms         57 ms          9   1124.24MB/s
BM_BuildAdaptiveIntNoNulls/repeats:3_stddev                      0 ms          0 ms          0   7.24806MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3               115 ms        114 ms          6   560.072MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3               114 ms        114 ms          6   562.683MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3               114 ms        114 ms          6   563.081MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3_mean          114 ms        114 ms          6   561.945MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3_stddev          0 ms          0 ms          0   1.33449MB/s
-- Append inlined --
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3                80 ms         80 ms          9   799.207MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3                80 ms         80 ms          9   800.331MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3                83 ms         82 ms          9   782.643MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3_mean           81 ms         81 ms          9   794.061MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3_stddev          1 ms          1 ms          0   8.08636MB/s

So it seems inlining it makes a significant difference. I will make a clean implementation of that.
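One common way to get this effect, shown here only as a sketch and not as the implementation that was merged, is to define the hot scalar path in the header and keep the rare growth path in a separate function. The class and its `std::vector` storage are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical builder used only to illustrate the header-inlined fast path.
class ScalarAppendSketch {
 public:
  // Defined inside the class body, so it is implicitly inline: callers in
  // other translation units can see the body and the compiler may inline it.
  void Append(int64_t val) {
    if (length_ == data_.size()) Grow();  // rare slow path
    data_[length_++] = val;               // hot path: one store, one increment
  }

  std::size_t length() const { return length_; }
  int64_t Value(std::size_t i) const { return data_[i]; }

 private:
  // Doubles the storage; in a real code base this could live out of line
  // in a .cc file, since it runs only occasionally.
  void Grow() { data_.resize(data_.empty() ? 32 : data_.size() * 2); }

  std::vector<int64_t> data_;
  std::size_t length_ = 0;
};
```

The numbers above (a mean of 114 ms before versus 81 ms with the append inlined) are consistent with the per-call overhead mattering when each call does very little work.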

Change-Id: I5309b506174c2fcafd6c168069fa81a5af4122fb

xhochy · 2017-06-22T14:21:02Z

Green Appveyor build: https://ci.appveyor.com/project/xhochy/arrow/build/1.0.372

wesm

+1, thanks @xhochy!

wesm · 2017-06-22T21:36:59Z

cpp/src/arrow/builder-benchmark.cc

+  while (state.KeepRunning()) {
+    AdaptiveIntBuilder builder(default_memory_pool());
+    for (int64_t i = 0; i < size; i++) {
+      builder.Append(data[i]);


To make this benchmark more "realistic" we should add an assertion about the result of Append being OK
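A minimal sketch of what a checked append loop could look like. `SketchStatus` and `SketchBuilder` are stand-ins invented for this example so that it compiles on its own; they are not the Arrow classes, and the benchmark harness itself is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Stand-in for a status type; NOT the Arrow class, just enough for the sketch.
struct SketchStatus {
  bool ok_flag;
  bool ok() const { return ok_flag; }
};

// Stand-in builder whose Append can fail once a size limit is reached.
class SketchBuilder {
 public:
  explicit SketchBuilder(std::size_t max_len) : max_len_(max_len) {}

  SketchStatus Append(int64_t v) {
    if (values_.size() >= max_len_) return SketchStatus{false};
    values_.push_back(v);
    return SketchStatus{true};
  }

  std::size_t length() const { return values_.size(); }

 private:
  std::size_t max_len_;
  std::vector<int64_t> values_;
};

// Appends every value and aborts on the first failure, so the status check
// is part of the measured work, as it would be in calling code.
inline std::size_t AppendAllChecked(SketchBuilder* builder,
                                    const std::vector<int64_t>& data) {
  for (int64_t v : data) {
    if (!builder->Append(v).ok()) std::abort();
  }
  return builder->length();
}
```

Checking the result also keeps the compiler from treating the return value as dead, which makes the measured loop closer to what real callers pay for.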

xhochy changed the title ~~[WIP] ARROW-1073: C++: Adapative integer builder~~ ARROW-1073: C++: Adapative integer builder Jun 5, 2017

wesm reviewed Jun 6, 2017

xhochy force-pushed the ARROW-1073 branch from 8cc3afc to cdea9c0 Compare June 17, 2017 07:08

wesm reviewed Jun 19, 2017

ARROW-1073: C++: Adapative integer builder

5bab9c2

Change-Id: I5309b506174c2fcafd6c168069fa81a5af4122fb

xhochy force-pushed the ARROW-1073 branch from aa77099 to 5bab9c2 Compare June 22, 2017 07:20

wesm approved these changes Jun 22, 2017

asfgit closed this in 608b89e Jun 22, 2017

wesm reviewed Jun 22, 2017
