Ensure Context is noncopyable by springmeyer · Pull Request #93 · mapbox/carmen-cache

springmeyer · 2017-04-01T01:17:23Z

This fixes an odd case (still need to research more into the why here) where Context objects were still being copied rather than moved (even though they had a move constructor enabled).

By moving these large objects we should avoid many allocations and help speed up performance.

On OS X this leads to a significant speedup. On Linux the speedup is less but still noticable.

OSX shows:

Master

# coalesceMulti
ok 117 coalesceMulti @ 7.56ms
# coalesceMulti proximity
ok 118 coalesceMulti + proximity @ 7.44ms

This branch

ok 117 coalesceMulti @ 6.62ms
# coalesceMulti proximity
ok 118 coalesceMulti + proximity @ 6.46ms

springmeyer · 2017-04-05T23:10:47Z

Latest results with std::partial_sort on OS X:

TAP version 13
# coalesceSingle
ok 1 coalesceSingle @ 0.42ms
# coalesceSingle proximity
ok 2 coalesceSingle + proximity @ 0.52ms
# coalesceMulti
ok 3 coalesceMulti @ 4.84ms
# coalesceMulti proximity
ok 4 coalesceMulti + proximity @ 4.78ms

coveralls · 2017-04-05T23:17:39Z

Coverage increased (+0.2%) to 78.903% when pulling 83b555d on move-only-context into 93652b2 on master.

springmeyer · 2017-04-05T23:22:29Z

Because the partial sort speeds things up so much I took a look at profiling results with 45c91ed and after partial sort (83b555d) on OS X.

In both cases, for one thread sampled, 22% of the thread is busy in coalesceMulti (This 22% makes sense since the test is run serially). But the speedup with partial sort is obvious because it moves the sort from being where the largest time taken to only 4th most expensive.

Before:

7.2% is taken in std::sort of ContextSortByRelev
5.3% is taken in deallocation of memory (free)
3.8% is taken in allocation of memory (malloc/new)
2.2% is taken in the std::sort in __getmatching
Everything else is less than 1%

After:

5.9% is taken in deallocation of memory (free)
5.1% is taken in allocation of memory (malloc/new)
3.8% is taken in the std::sort in __getmatching
1.3% is taken in std::partial_sort of ContextSortByRelev
Everything else is less than 1%

apendleton · 2017-04-05T23:37:35Z

Sidenote as far as next steps here: __getmatching is actually overloaded and has two implementations, one for rocksdb and one for the in-memory cache. We only actually use the former in production, and only the latter calls std::sort (the rocksdb version sorts lazily). We may want to adjust these benchmarks accordingly as I wonder if the performance characteristics will be different... I expect that the rocksdb implementation may well be slower because it doesn't hold everything in ram, though, so it might make benchmarks a little noisier.

springmeyer · 2017-04-05T23:43:23Z

Thanks @apendleton - per chat I've love help on adjusting the benchmark to test the right thing. Feel free to commit to this branch or another with any fixes there.

The other next steps here I see are:

Figure out if the partial_sort can be done on a fixed, predictable N. Currently we depend on filtering to a max of 40 after sorting and so the partial_sort needs to run on more than 40 values. This is because we further filter after sorting here and here. If we could dedupe features before sorting that would allow the sort to be much faster.
Test in carmen and see if the perf gains above also translate to production before spending too much more time here.

springmeyer · 2017-04-06T00:37:57Z

Noting that while OS X results are impressive, results on linux/travis are not. So I need to see if travis is lying (very possible) or perhaps libstdc++ (defalt standard library on linux) is not as optimized for partial_sort as libc++ (OSX):

master:

# coalesceSingle
ok 115 coalesceSingle @ 0.48ms
# coalesceSingle proximity
ok 116 coalesceSingle + proximity @ 0.68ms
# coalesceMulti
ok 117 coalesceMulti @ 7.56ms
# coalesceMulti proximity
ok 118 coalesceMulti + proximity @ 7.44ms

`45c91ed` (this branch, before partial_sort)

# coalesceSingle
ok 115 coalesceSingle @ 0.5ms
# coalesceSingle proximity
ok 116 coalesceSingle + proximity @ 0.68ms
# coalesceMulti
ok 117 coalesceMulti @ 7.22ms
# coalesceMulti proximity
ok 118 coalesceMulti + proximity @ 7.52ms

`83b555d` (this branch, after partial_sort)

# coalesceSingle
ok 115 coalesceSingle @ 0.82ms
# coalesceSingle proximity
ok 116 coalesceSingle + proximity @ 0.84ms
# coalesceMulti
ok 117 coalesceMulti @ 5.68ms
# coalesceMulti proximity
ok 118 coalesceMulti + proximity @ 7.44ms

mourner · 2017-04-06T14:21:20Z

If we could dedupe features before sorting that would allow the sort to be much faster.

Yes! Also note that for deduping, the existing code in coalesceFinalize uses std::map. Since we're deduping by a uint id, using std::unordered_set should be much faster since insertion/lookup is constant time on average.

springmeyer · 2017-04-28T21:08:07Z

Update on my thinking on this PR:

The non-copyable fixes are 👍. While they may not show huge perf gains in local testing they should reduce allocations and therefore should help most under high load
The partial_sort approach feels very promising and getting close but not quite right. The next actions at Ensure Context is noncopyable #93 (comment) still stand. Ideally the partial sort could be done on a fixed N. So, we should - as @mourner - mentions revisit how deduping works.

I'm going to pause here since revisit how deduping works is above my ability to focus on the code right now. I would be happy to spirit guide someone who wants to pick this up.

…rting

springmeyer · 2017-05-23T00:50:41Z

I've removed the experimental/not-quite-yet-ideal partial_sort code from this PR in 57ec8fc.

This means that I think this is ready to be tested in production since the remaining changes are clear improvements.

Avoiding double move of std::unique_ptr<rocksdb::DB> (and a compiler warning on pessimizing move)
Deleting the automatic-compiler-generated constructor, copy constructor, and assignment constructor of the Context class. This ensures that the Context is truly "noncopyable" and therefore only movable. This will save on memory allocations on linux where otherwise the object was being copied rather than moved.
Avoiding reserving too much memory when calling context.reserve and there were less than 40 contexts to pick from.

/cc @KaiBot3000 @aarthykc - would you be interested in running this out? My hope is that this will help reduce memory usage in production and therefore might also help performance. The steps would be:

Tweak the version in package.json to something unique for testing like 0.18.0-moveonly
Publish C++ binaries: `git commit -am "[publish binary]"
Test in carmen and if things look good
Stage, and if things look good
Run out an official release of carmen-cache by incrementing the version to 0.19.0 and then:
- Publishing binaries again
- Then running npm publish
- Updating carmen to 0.19.0 of carmen cache

If #96 is looking good, this could also be merged into master and tested alongside that as it is deployed.

springmeyer · 2018-02-23T21:22:53Z

Note, after #116 lands I think it would be a good time to revisit this and get it landed. Basically making sure that the Context is noncopyable to ensure it is moved by the compiler in all cases:

+    Context() = delete;
+    Context(Context const& c) = delete;
+    Context& operator=(Context const& c) = delete;

Also as a future idea/ticket: would be great to apply learnings on sorting optimizations from vtquery. Whoever wants to pick this up reach out to @mapsam for tips on sorting optimizations in C++.

apendleton · 2018-03-13T17:14:15Z

We made Context noncopyable in #116 . @springmeyer closing as I think ultimately that's all we wanted to carry over from here, but if there was other stuff from the earlier iterations of this PR that you wanted us to do as well, feel free to reopen.

springmeyer · 2018-03-13T22:31:36Z

Sounds good @apendleton - that should help. I've ticketed #120 to log the idea of using partial_sort in the future.

springmeyer changed the title ~~Ensure Context is noncopyable~~ Ensure Context is noncopyable + partial_sort Apr 5, 2017

Dane Springmeyer added 5 commits May 22, 2017 17:32

ensure without a doubt that Contexts are only movable, not copyable

2ed4098

Fix -Wpessimizing-move warning

8d9aefd

avoid overallocation of features given 40 max cutoff of contexts

998144b

avoid overallocation of contexts from covers

2ecf16d

use std::partial_sort

7ce6f0c

springmeyer force-pushed the move-only-context branch from 83b555d to 7ce6f0c Compare May 23, 2017 00:32

rollback partial_sort for now (can add this optimization later)

57ec8fc

springmeyer changed the title ~~Ensure Context is noncopyable + partial_sort~~ Ensure Context is noncopyable May 23, 2017

fix for calculating max_contexts now that we are no longer partial so…

63b3de1

…rting

apendleton closed this Mar 13, 2018

springmeyer deleted the move-only-context branch March 13, 2018 22:18

springmeyer mentioned this pull request Mar 13, 2018

Speed up sorting / reduce overhead of sorting #120

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure Context is noncopyable#93

Ensure Context is noncopyable#93
springmeyer wants to merge 7 commits intomasterfrom
move-only-context

springmeyer commented Apr 1, 2017 •

edited

Loading

Uh oh!

springmeyer commented Apr 5, 2017 •

edited

Loading

Uh oh!

coveralls commented Apr 5, 2017

Uh oh!

springmeyer commented Apr 5, 2017 •

edited

Loading

Uh oh!

apendleton commented Apr 5, 2017

Uh oh!

springmeyer commented Apr 5, 2017 •

edited

Loading

Uh oh!

springmeyer commented Apr 6, 2017

Uh oh!

mourner commented Apr 6, 2017

Uh oh!

springmeyer commented Apr 28, 2017

Uh oh!

springmeyer commented May 23, 2017

Uh oh!

springmeyer commented Feb 23, 2018

Uh oh!

apendleton commented Mar 13, 2018

Uh oh!

springmeyer commented Mar 13, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

springmeyer commented Apr 1, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Master

This branch

Uh oh!

springmeyer commented Apr 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coveralls commented Apr 5, 2017

Uh oh!

springmeyer commented Apr 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

apendleton commented Apr 5, 2017

Uh oh!

springmeyer commented Apr 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

springmeyer commented Apr 6, 2017

master:

45c91ed (this branch, before partial_sort)

83b555d (this branch, after partial_sort)

Uh oh!

mourner commented Apr 6, 2017

Uh oh!

springmeyer commented Apr 28, 2017

Uh oh!

springmeyer commented May 23, 2017

Uh oh!

springmeyer commented Feb 23, 2018

Uh oh!

apendleton commented Mar 13, 2018

Uh oh!

springmeyer commented Mar 13, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

springmeyer commented Apr 1, 2017 •

edited

Loading

springmeyer commented Apr 5, 2017 •

edited

Loading

springmeyer commented Apr 5, 2017 •

edited

Loading

springmeyer commented Apr 5, 2017 •

edited

Loading

`45c91ed` (this branch, before partial_sort)

`83b555d` (this branch, after partial_sort)