Skip to content

Add emplace_back_unchecked#49

Closed
Andersama wants to merge 1 commit intomartinus:mainfrom
Andersama:main
Closed

Add emplace_back_unchecked#49
Andersama wants to merge 1 commit intomartinus:mainfrom
Andersama:main

Conversation

@Andersama
Copy link

I don't expect this to be accepted, however I actually wrote a short paper on this a few years ago. I tend to add this utility and another for modifying the size to my own custom vectors to be able to get every last bit of performance where I can. You can get an easy 100% performance swing with this over emplace_back.

Even better is actually this pattern:

std::vector<char> a_pseudo_string;
a_pseudo_string.reserve(32);

char *danger_ptr = a_pseudo_string.data();
for (size_t i = 0; i < 32; i++) {
  danger_ptr[i] = i+32;
}
// an incredibly awkward use of assign
a_psudo_string.assign(danger_ptr, danger_ptr+32);

Where assign also can get an unchecked variation. Since the internals of svector already has

    template <direction D>
    void set_size(size_t s) {
        if constexpr (D == direction::direct) {
            set_direct_and_size(s);
        } else {
            indirect()->size(s);
        }
    }

The second is already possible*.

Alternatively could be named unchecked_emplace_back.

@Philippe91
Copy link

I think this is what David Stone proposes ("append_from_capacity") in his CppCon talk

@martinus
Copy link
Owner

I googled a bit, I think an API like here in P1010R1 sounds good to me:

  • T* uninitialized_data() - Returns a pointer to storage that would back elements [size(), capacity()).
  • insert_from_capacity(size_type n) Appends n elements from capacity. The application must have initialized the storage backing these elements otherwise the behavior is undefined. (or name it append_from_capacity)

@Andersama
Copy link
Author

Andersama commented Apr 17, 2023

It's not a new idea, I was about to submit my paper to the standards committee but someone mentioned that a similar idea was rejected years beforehand. I'm not surprised though that it keeps getting brought up, it's a pretty easy performance improvement when you know you can use it.

insert_from_capacity's a strange name since insert's used to place data essentially "anywhere". I tend to call mine append_unchecked or unchecked_append since for whatever reason append is not defined for vector, but is for strings.

Here's some old nanobench results from my draft from 2 years ago.

| MSVC x64 debug
|               ns/op |                op/s |    err% |     total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               91.60 |       10,917,030.57 |    0.5% |      0.00 | `emplace_back`
|               31.82 |       31,421,838.18 |    0.4% |      0.00 | `unchecked_emplace_back`
|                9.04 |      110,663,983.90 |    3.4% |      0.00 | `assign`
|                8.65 |      115,606,936.42 |    0.1% |      0.00 | `self_assign`

| MSVC x64 release
|               ns/op |                op/s |    err% |     total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                0.97 |    1,026,392,961.88 |    0.1% |      0.00 | `emplace_back`
|                0.89 |    1,126,637,554.59 |    0.1% |      0.00 | `unchecked_emplace_back`
|                0.57 |    1,741,245,136.19 |    0.1% |      0.00 | `assign`
|                0.53 |    1,899,335,232.67 |    0.0% |      0.00 | `self_assign`

| Clang x64 debug
|               ns/op |                op/s |    err% |     total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               21.06 |       47,483,380.82 |    0.1% |      0.00 | `emplace_back`
|               20.40 |       49,019,607.84 |    0.1% |      0.00 | `unchecked_emplace_back`
|                2.78 |      359,640,359.64 |    0.1% |      0.00 | `assign`
|                2.71 |      369,290,573.37 |    0.0% |      0.00 | `self_assign`

| Clang x64 release
|               ns/op |                op/s |    err% |     total | emplace_back benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.77 |      564,440,263.41 |    0.1% |      0.00 | `emplace_back`
|                0.80 |    1,256,544,502.62 |    0.2% |      0.00 | `unchecked_emplace_back`
|                0.55 |    1,803,127,874.89 |    0.0% |      0.00 | `assign`
|                0.53 |    1,873,767,258.38 |    0.0% |      0.00 | `self_assign`

It somewhat depends on the complexity of the type of course, but as you can see using assign's WAY faster. I think the strange result was actually that MSVC seemingly out optimized clang at the time for regular emplace_back. I can't remember why I think it was that msvc managed to optimize with SIMD instructions, when the conditional check was removed from both I think the optimizers got a fairly decent shot optimizing the append.

@martinus
Copy link
Owner

Nice nanobench benchmarks @Andersama 😄 I also don't like insert_from_capacity. Digging more into standardization, the last status of P1010 is that it should be revised, based on the accepted resize_and_overwrite: cplusplus/papers#322

This will be in C++23 for std::string, and It would probably be best to have the same API for vector:
https://en.cppreference.com/w/cpp/string/basic_string/resize_and_overwrite

@Andersama
Copy link
Author

Andersama commented Apr 19, 2023

Taking a look at the cppreference example

food.resize_and_overwrite(
        resize_to,
        [food_size = food.size()](char* p, [std::size_t](https://en.cppreference.com/w/cpp/types/size_t) n) noexcept -> [std::size_t](https://en.cppreference.com/w/cpp/types/size_t) {
            // p[0]..p[n] is the assignable range
            // p[0]..p[min(n, food_size) - 1] is the readable range
            // (contents initially equal to the original string)
 
            // Debug print:
            [std::cout](https://en.cppreference.com/w/cpp/io/cout) << "In Operation(); n: " << n << '\n';
 
            // Copy fruits to the buffer p while there is enough space.
            char* first = p + food_size;
 
            for (char* const end = p + n; const [std::string_view](https://en.cppreference.com/w/cpp/string/basic_string_view) fruit : fruits) {
                char* last = first + fruit.size() + 1;
                if (last > end)
                    break;
                *first++ = ' ';
                std::[ranges::copy](https://en.cppreference.com/w/cpp/algorithm/ranges/copy)(fruit, first);
                first = last;
            }
 
            const auto final_size { static_cast<[std::size_t](https://en.cppreference.com/w/cpp/types/size_t)>(first - p) };
 
            // Debug print:
            [std::cout](https://en.cppreference.com/w/cpp/io/cout) << "In Operation(); final_size: " << final_size << '\n';
 
            [assert](https://en.cppreference.com/w/cpp/error/assert)(final_size <= n);
            return final_size; // Return value is the actual new length
                               // of the string, must be in range 0..n
        });

I have the distinct feeling the signature is confusing. If I were to write the callback signature I'd include the old size in the parameters, or alternatively pass the original vector into the callback. It looks like the current signature is passing the entire range via the pointer and the new capacity essentially forcing the user to create a lambda to capture the size.

I'd consider for example:

food.lazy_append(new_capacity, [](char *ptr, size_t old_size, size_t new_capacity){
// now ptr[0..old_size-1] is still available but we don't have to be capturing it ourselves
// bonus we can use c functions as a callback meaning this'd be backwards compatible with older c++ standards and could be done without a template
});

//bonus, the implementation's probably the simplest thing I can think of
void lazy_append(size_t new_capacity, void(*callback)(value_type *, size_t, size_t)) {
   reserve(new_capacity);
   callback(data(), size(), capacity());
}

@Andersama
Copy link
Author

Now that I'm thinking about it, since I gaffed on the initially commit, this kind of vector implementation should likely use a different variation of unchecked emplace back given which mode the vector's in.

@Andersama
Copy link
Author

Sorry I haven't touched this in over a year, been busy. I'll make a different pull request for resize_and_overwrite.

@Andersama Andersama closed this May 20, 2024
Andersama added a commit to Andersama/svector that referenced this pull request May 21, 2024
Following feedback from martinus#49 I've written `resize_and_overwrite` using https://en.cppreference.com/w/cpp/string/basic_string/resize_and_overwrite as a guide.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants