Member

@xhochy xhochy commented Jun 2, 2017

No description provided.

@xhochy xhochy changed the title from "[WIP] ARROW-1073: C++: Adaptive integer builder" to "ARROW-1073: C++: Adaptive integer builder" Jun 5, 2017
Member

@wesm wesm left a comment

This will be really nice to have!

We should do a performance shootout against https://github.com/wiseio/paratext/blob/master/src/util/widening_vector.hpp at some point

Member

static inline here maybe?

Member Author

See below; these days inline is mostly about linkage and visibility, and not so much about whether the function actually gets inlined.

Member

can this be replaced by a bit trick, like val & ~(std::numeric_limits<int32_t>::max() - 1)?

Member Author

It could be, but I'm not sure which pattern to use, and benchmarks show that this is not really relevant performance-wise.
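
For illustration, the range check under discussion can be written either as two comparisons or as a single unsigned add-and-compare. This is only a sketch: the names `FitsInt32Compare` and `FitsInt32Bias` are hypothetical and do not come from the patch, and the bias pattern shown here is a different trick from the mask suggested above.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: plain comparisons against the int32_t range.
inline bool FitsInt32Compare(int64_t val) {
  return val >= std::numeric_limits<int32_t>::min() &&
         val <= std::numeric_limits<int32_t>::max();
}

// Hypothetical helper: shift the int32_t range onto [0, 2^32) with an
// unsigned add (which wraps without undefined behavior), then compare once.
inline bool FitsInt32Bias(int64_t val) {
  const uint64_t biased = static_cast<uint64_t>(val) + 0x80000000ULL;
  return biased <= 0xFFFFFFFFULL;
}
```

Both functions should agree on every input; whether the second form is measurably faster would have to be confirmed with the benchmarks.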

Member

Is std::transform the same performance as a for loop?

Member Author

Under the hood it should be exactly that. Apart from saving a few lines of code, the only difference from a plain for loop is that with C++17 you could pass an execution policy and have the STL run the loop multi-threaded.
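
To make the comparison concrete, here is a minimal sketch of the same widening copy written both ways. The function names are hypothetical and the int8_t to int16_t widening is only an assumed example of the kind of copy the builder performs.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Widen int8_t values to int16_t with std::transform.
inline std::vector<int16_t> WidenWithTransform(const std::vector<int8_t>& in) {
  std::vector<int16_t> out(in.size());
  std::transform(in.begin(), in.end(), out.begin(),
                 [](int8_t v) { return static_cast<int16_t>(v); });
  return out;
}

// The same widening written as a plain index loop.
inline std::vector<int16_t> WidenWithLoop(const std::vector<int8_t>& in) {
  std::vector<int16_t> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = static_cast<int16_t>(in[i]);
  }
  return out;
}
```

The two versions produce identical output; any performance difference would need to be checked in an optimized build.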

Member

unchecked status

Member

I'm curious if a helper template works here also

template <typename old_type, typename new_type>
struct is_smaller {
  static constexpr bool value = sizeof(old_type) < sizeof(new_type);
};

Member Author

There is std::less but that also didn't work here.
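
As a rough sketch of how such a helper might be used, assuming it is meant to select an overload via `std::enable_if`: the function `NeedsExpansion` below is hypothetical and only illustrates the dispatch, not the code in the patch.

```cpp
#include <cstdint>
#include <type_traits>

// Trait from the suggestion above: true when old_type is narrower.
template <typename old_type, typename new_type>
struct is_smaller {
  static constexpr bool value = sizeof(old_type) < sizeof(new_type);
};

// Hypothetical overload chosen when the target type is wider.
template <typename old_type, typename new_type>
typename std::enable_if<is_smaller<old_type, new_type>::value, bool>::type
NeedsExpansion() {
  return true;
}

// Hypothetical overload chosen when the target type is the same width or narrower.
template <typename old_type, typename new_type>
typename std::enable_if<!is_smaller<old_type, new_type>::value, bool>::type
NeedsExpansion() {
  return false;
}

static_assert(is_smaller<int8_t, int16_t>::value, "int8_t is narrower");
static_assert(!is_smaller<int32_t, uint32_t>::value, "same width");
```

Note that the trait compares sizes only, so it does not distinguish signed from unsigned types of equal width.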

Member

Release builds?

Member

dchecks

Member

DCHECK or only error status?

Member

static inline?

Member Author

As this is a private method, this should not be needed, the compiler should automatically take care of inlining.

Member

It might be overkill, but you could potentially collapse these classes with an extra template parameter that switches between signed and unsigned variants. So something like

class AdaptiveIntBuilder : AdaptiveIntBuilderBase<AdaptiveIntBuilder>

and then you could define various static attributes on AdaptiveIntBuilder
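
A minimal sketch of what this could look like, assuming the static attributes are the per-width integer types. The member names (`type8`, `type16`, `type32`, `RequiredWidth`, `Fits`) are hypothetical, and the class bodies are illustrative rather than the ones in the patch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical CRTP base: shared logic lives here and reads the
// signed/unsigned type aliases from the derived class.
template <typename Derived>
class AdaptiveIntBuilderBase {
 private:
  template <typename Narrow, typename Wide>
  static bool Fits(Wide val) {
    return val >= static_cast<Wide>(std::numeric_limits<Narrow>::min()) &&
           val <= static_cast<Wide>(std::numeric_limits<Narrow>::max());
  }

 public:
  // Smallest byte width (1, 2, 4 or 8) that can hold `val`; callers are
  // assumed to pass the widest type of the matching signedness.
  template <typename Wide>
  static int RequiredWidth(Wide val) {
    if (Fits<typename Derived::type8>(val)) return 1;
    if (Fits<typename Derived::type16>(val)) return 2;
    if (Fits<typename Derived::type32>(val)) return 4;
    return 8;
  }
};

class AdaptiveIntBuilder : public AdaptiveIntBuilderBase<AdaptiveIntBuilder> {
 public:
  using type8 = int8_t;
  using type16 = int16_t;
  using type32 = int32_t;
};

class AdaptiveUIntBuilder : public AdaptiveIntBuilderBase<AdaptiveUIntBuilder> {
 public:
  using type8 = uint8_t;
  using type16 = uint16_t;
  using type32 = uint32_t;
};
```

Whether this reads better than two separate classes is a judgment call; the sketch only shows that the width-selection logic can be shared.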

Member Author

I thought about that, but I don't yet see how this would reduce code complexity while keeping it understandable.

Member Author

xhochy commented Jun 17, 2017

Fixed an unchecked status and added a benchmark for unsigned integers. Performance-wise, the builders are clearly dominated by the expand & copy logic, so I opened a JIRA to discuss whether we should make jemalloc the default. I would leave the DCHECKs as-is because they only check internal invariants; those code paths are actually unreachable and are only there so that the compiler does not complain.

Member

Does inlining scalar appends make much of a perf impact?

Member

wesm commented Jun 21, 2017

I think this is OK except for my comment about inlining. I suppose I can play with the benchmarks after this goes in.

Member Author

xhochy commented Jun 21, 2017

Benchmarks about the scalar append:

BM_BuildAdaptiveIntNoNulls/repeats:3                            58 ms         57 ms          9      1114MB/s
BM_BuildAdaptiveIntNoNulls/repeats:3                            57 ms         57 ms          9    1.1025GB/s
BM_BuildAdaptiveIntNoNulls/repeats:3                            57 ms         57 ms          9   1.10328GB/s
BM_BuildAdaptiveIntNoNulls/repeats:3_mean                       57 ms         57 ms          9   1124.24MB/s
BM_BuildAdaptiveIntNoNulls/repeats:3_stddev                      0 ms          0 ms          0   7.24806MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3               115 ms        114 ms          6   560.072MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3               114 ms        114 ms          6   562.683MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3               114 ms        114 ms          6   563.081MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3_mean          114 ms        114 ms          6   561.945MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3_stddev          0 ms          0 ms          0   1.33449MB/s
-- Append inlined --
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3                80 ms         80 ms          9   799.207MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3                80 ms         80 ms          9   800.331MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3                83 ms         82 ms          9   782.643MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3_mean           81 ms         81 ms          9   794.061MB/s
BM_BuildAdaptiveIntNoNullsScalarAppend/repeats:3_stddev          1 ms          1 ms          0   8.08636MB/s

So it seems inlining it makes a significant difference. I will make a clean implementation of that.
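
One common way to structure this, shown here only as a sketch under assumptions, is to keep the cheap capacity check and store in an inline fast path and push the rarely taken growth step into a separate function. `ToyBuilder` and its members are hypothetical and are not Arrow's `AdaptiveIntBuilder`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy builder used only to illustrate the fast/slow path split.
class ToyBuilder {
 public:
  // Fast path: defined in the class body, so it is implicitly inline and
  // the compiler can expand it at the call site inside a tight loop.
  bool Append(int64_t val) {
    if (length_ == data_.size()) {
      if (!Grow()) return false;
    }
    data_[length_++] = val;
    return true;
  }

  std::size_t length() const { return length_; }
  int64_t value(std::size_t i) const { return data_[i]; }

 private:
  // Slow path: in a real code base this would typically live in the .cc
  // file so that the reallocation code is not duplicated at every call site.
  bool Grow();

  std::vector<int64_t> data_;
  std::size_t length_ = 0;
};

// Marked inline here only so the sketch fits in a single file.
inline bool ToyBuilder::Grow() {
  data_.resize(data_.empty() ? 32 : data_.size() * 2);
  return true;
}
```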

Change-Id: I5309b506174c2fcafd6c168069fa81a5af4122fb
Member Author

xhochy commented Jun 22, 2017

Member

@wesm wesm left a comment

+1, thanks @xhochy!

@asfgit asfgit closed this in 608b89e Jun 22, 2017
while (state.KeepRunning()) {
  AdaptiveIntBuilder builder(default_memory_pool());
  for (int64_t i = 0; i < size; i++) {
    builder.Append(data[i]);
Member

To make this benchmark more "realistic" we should add an assertion about the result of Append being OK
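
A minimal sketch of what a checked loop could look like. Everything here is a stand-in: `ToyStatus` and `ToyIntBuilder` are hypothetical types written for this example, not Arrow's `Status` or builder classes, and `AppendAllChecked` is an assumed name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for a status type.
struct ToyStatus {
  bool is_ok;
  bool ok() const { return is_ok; }
};

// Hypothetical stand-in for a builder whose Append reports success.
struct ToyIntBuilder {
  std::vector<int64_t> values;
  ToyStatus Append(int64_t v) {
    values.push_back(v);
    return ToyStatus{true};
  }
};

// Append every value and abort on the first failure, so that the check is
// part of the measured loop just as it would be in real calling code.
inline std::size_t AppendAllChecked(const std::vector<int64_t>& data) {
  ToyIntBuilder builder;
  for (std::size_t i = 0; i < data.size(); i++) {
    if (!builder.Append(data[i]).ok()) std::abort();
  }
  return builder.values.size();
}
```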
