Conversation
|
I think this is what David Stone proposes ("append_from_capacity") in his CppCon talk |
|
I googled a bit, I think an API like here in P1010R1 sounds good to me:
|
|
It's not a new idea, I was about to submit my paper to the standards committee but someone mentioned that a similar idea was rejected years beforehand. I'm not surprised though that it keeps getting brought up, it's a pretty easy performance improvement when you know you can use it.
Here's some old nanobench results from my draft from 2 years ago. | MSVC x64 debug
| ns/op | op/s | err% | total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 91.60 | 10,917,030.57 | 0.5% | 0.00 | `emplace_back`
| 31.82 | 31,421,838.18 | 0.4% | 0.00 | `unchecked_emplace_back`
| 9.04 | 110,663,983.90 | 3.4% | 0.00 | `assign`
| 8.65 | 115,606,936.42 | 0.1% | 0.00 | `self_assign`
| MSVC x64 release
| ns/op | op/s | err% | total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 0.97 | 1,026,392,961.88 | 0.1% | 0.00 | `emplace_back`
| 0.89 | 1,126,637,554.59 | 0.1% | 0.00 | `unchecked_emplace_back`
| 0.57 | 1,741,245,136.19 | 0.1% | 0.00 | `assign`
| 0.53 | 1,899,335,232.67 | 0.0% | 0.00 | `self_assign`
| Clang x64 debug
| ns/op | op/s | err% | total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 21.06 | 47,483,380.82 | 0.1% | 0.00 | `emplace_back`
| 20.40 | 49,019,607.84 | 0.1% | 0.00 | `unchecked_emplace_back`
| 2.78 | 359,640,359.64 | 0.1% | 0.00 | `assign`
| 2.71 | 369,290,573.37 | 0.0% | 0.00 | `self_assign`
| Clang x64 release
| ns/op | op/s | err% | total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1.77 | 564,440,263.41 | 0.1% | 0.00 | `emplace_back`
| 0.80 | 1,256,544,502.62 | 0.2% | 0.00 | `unchecked_emplace_back`
| 0.55 | 1,803,127,874.89 | 0.0% | 0.00 | `assign`
| 0.53 | 1,873,767,258.38 | 0.0% | 0.00 | `self_assign`It somewhat depends on the complexity of the type of course, but as you can see using assign's WAY faster. I think the strange result was actually that MSVC seemingly out optimized clang at the time for regular |
|
Nice nanobench benchmarks @Andersama 😄 I also don't like This will be in C++23 for std::string, and It would probably be best to have the same API for vector: |
|
Taking a look at the cppreference example food.resize_and_overwrite(
resize_to,
[food_size = food.size()](char* p, [std::size_t](https://en.cppreference.com/w/cpp/types/size_t) n) noexcept -> [std::size_t](https://en.cppreference.com/w/cpp/types/size_t) {
// p[0]..p[n] is the assignable range
// p[0]..p[min(n, food_size) - 1] is the readable range
// (contents initially equal to the original string)
// Debug print:
[std::cout](https://en.cppreference.com/w/cpp/io/cout) << "In Operation(); n: " << n << '\n';
// Copy fruits to the buffer p while there is enough space.
char* first = p + food_size;
for (char* const end = p + n; const [std::string_view](https://en.cppreference.com/w/cpp/string/basic_string_view) fruit : fruits) {
char* last = first + fruit.size() + 1;
if (last > end)
break;
*first++ = ' ';
std::[ranges::copy](https://en.cppreference.com/w/cpp/algorithm/ranges/copy)(fruit, first);
first = last;
}
const auto final_size { static_cast<[std::size_t](https://en.cppreference.com/w/cpp/types/size_t)>(first - p) };
// Debug print:
[std::cout](https://en.cppreference.com/w/cpp/io/cout) << "In Operation(); final_size: " << final_size << '\n';
[assert](https://en.cppreference.com/w/cpp/error/assert)(final_size <= n);
return final_size; // Return value is the actual new length
// of the string, must be in range 0..n
});I have the distinct feeling the signature is confusing. If I were to write the callback signature I'd include the old size in the parameters, or alternatively pass the original vector into the callback. It looks like the current signature is passing the entire range via the pointer and the new capacity essentially forcing the user to create a lambda to capture the size. I'd consider for example: food.lazy_append(new_capacity, [](char *ptr, size_t old_size, size_t new_capacity){
// now ptr[0..old_size-1] is still available but we don't have to be capturing it ourselves
// bonus we can use c functions as a callback meaning this'd be backwards compatible with older c++ standards and could be done without a template
});
//bonus, the implementation's probably the simplest thing I can think of
void lazy_append(size_t new_capacity, void(*callback)(value_type *, size_t, size_t)) {
reserve(new_capacity);
callback(data(), size(), capacity());
} |
|
Now that I'm thinking about it, since I gaffed on the initially commit, this kind of vector implementation should likely use a different variation of unchecked emplace back given which mode the vector's in. |
|
Sorry I haven't touched this in over a year, been busy. I'll make a different pull request for |
Following feedback from martinus#49 I've written `resize_and_overwrite` using https://en.cppreference.com/w/cpp/string/basic_string/resize_and_overwrite as a guide.
I don't expect this to be accepted, however I actually wrote a short paper on this a few years ago. I tend to add this utility and another for modifying the size to my own custom vectors to be able to get every last bit of performance where I can. You can get an easy 100% performance swing with this over emplace_back.
Even better is actually this pattern:
Where
assignalso can get an unchecked variation. Since the internals ofsvectoralready hasThe second is already possible*.
Alternatively could be named
unchecked_emplace_back.