@mapleFU
Member

@mapleFU mapleFU commented Sep 9, 2023

Rationale for this change

Add a benchmark for DELTA_BYTE_ARRAY in Parquet, and make a small optimization.

What changes are included in this PR?

Add a benchmark for DELTA_BYTE_ARRAY in Parquet, and make a small optimization.

Are these changes tested?

no

Are there any user-facing changes?

no

@mapleFU mapleFU requested a review from wgtmac as a code owner September 9, 2023 09:03
buffer[i].ptr = data_ptr;
buffer[i].len += prefix_len_ptr[i];
data_ptr += buffer[i].len;
// If the prefix length is zero, the prefix can be ignored.
Member Author

This is a (maybe unused) optimization: when prefix == 0, it avoids a round of memcpy.

Member

Does it make benchmarks better or worse?

Member Author

@mapleFU mapleFU Sep 26, 2023

Better, of course, especially when many of the values have prefix == 0.

Member Author

@pitrou I can open a separate PR for this. I think we can optimize two cases:

  1. Prefix length == 0
  2. Suffix length == 0

Each of these cases can be optimized to avoid copying and memory allocation. I can split that out into its own PR.

@github-actions github-actions bot added the awaiting review label Sep 9, 2023
@github-actions

github-actions bot commented Sep 9, 2023

⚠️ GitHub issue #37293 has been automatically assigned in GitHub to PR creator.

std::vector<ByteArray> values;
std::vector<uint8_t> buf(max_length * array_size);
values.resize(array_size);
prefixed_random_byte_array(array_size, /*seed=*/0, buf.data(), values.data(),
Member Author

I'm not sure whether we should test different prefix configurations.
cc @pitrou @wgtmac

Member

Correct me if I'm wrong, but /*prefixed_probability=*/0.5 is a bit low: it means we can only expect runs of about two consecutive string values to share a prefix. It can easily generate a sequence that does not have good prefixes.

Member

@rok rok Sep 11, 2023

Perhaps it would be interesting to also vary prefix length (max_size)?

Member

I think it's enough to vary prefixed_probability. The benchmarking library supports integer parameters, so it can for example be passed as a percentage. Two or three values should be enough (for example 10,90,99).
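
A minimal sketch of how an integer percentage could drive the data generation. This is not the PR's `prefixed_random_byte_array` helper: the name `GeneratePrefixedStrings`, its signature, and the choice of sharing exactly half of the previous string are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generator (not part of Arrow): produces `count` strings of
// exactly `length` lowercase characters. With probability
// prefixed_percent / 100, a string reuses the first half of the previous
// string as its prefix; otherwise it is fully random.
std::vector<std::string> GeneratePrefixedStrings(std::size_t count, std::size_t length,
                                                 int prefixed_percent, unsigned seed) {
  assert(prefixed_percent >= 0 && prefixed_percent <= 100);
  std::mt19937 rng(seed);
  std::vector<std::string> out;
  out.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    std::string value(length, 'a');
    std::size_t start = 0;
    // rng() % 100 is slightly biased, which is acceptable for a sketch.
    if (i > 0 && static_cast<int>(rng() % 100) < prefixed_percent) {
      start = length / 2;
      // Copy the shared prefix from the previously generated string.
      value.replace(0, start, out.back(), 0, start);
    }
    // Fill the remainder with random lowercase letters.
    for (std::size_t j = start; j < length; ++j) {
      value[j] = static_cast<char>('a' + rng() % 26);
    }
    out.push_back(std::move(value));
  }
  return out;
}
```

Passing the probability as an integer keeps it compatible with integer-only benchmark arguments, so values such as 10, 90 and 99 can be swept without extra plumbing.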

Member

Ideally, though, we should try to generate data for which the encoding is very space-efficient. Is it the case here?

Member Author

> we should try to generate data for which the encoding is very space-efficient. Is it the case here?

Would 99 percent be "space-efficient"?

Member Author

Oh I got it, let me mock some tests here.

@github-actions github-actions bot added awaiting committer review and removed awaiting review labels Sep 9, 2023
@mapleFU
Member Author

mapleFU commented Sep 9, 2023

cc @pitrou @rok @wgtmac

I've done a basic test: when the data is purely random, the speed is far from that of Plain and the other ByteArray encodings (though we might gain on space). I'm not sure whether I should test more prefix settings.

}
}
PARQUET_THROW_NOT_OK(buffered_data_->Resize(data_size));
PARQUET_THROW_NOT_OK(buffered_data_->Resize(data_size, false));
Member

Suggested change
PARQUET_THROW_NOT_OK(buffered_data_->Resize(data_size, false));
PARQUET_THROW_NOT_OK(buffered_data_->Resize(data_size, /*shrink_to_fit=*/false));

Member

BTW, we lose the chance to release bulk memory if shrink_to_fit is false. Changing the default behavior for buffered_prefix_length_ above may not make a big difference, but the actual size of buffered_data_ may vary a lot during decoding.

Member Author

Let me leave it as is for now; maybe we need a better way to release memory?
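
To make the trade-off concrete, here is a toy model of a resizable buffer. It is not Arrow's `ResizableBuffer`; the class `ToyBuffer` and its exact reallocation policy are assumptions chosen only to show how `shrink_to_fit=false` keeps the larger allocation alive after a big page is followed by a small one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>

// Toy stand-in for a resizable buffer (hypothetical, for illustration).
class ToyBuffer {
 public:
  // Grows when needed; only gives memory back when shrink_to_fit is true.
  void Resize(std::size_t new_size, bool shrink_to_fit) {
    if (new_size > capacity_ || (shrink_to_fit && new_size < capacity_)) {
      // Reallocate to exactly new_size, preserving the bytes that remain.
      std::unique_ptr<uint8_t[]> fresh(new_size ? new uint8_t[new_size]() : nullptr);
      std::size_t keep = size_ < new_size ? size_ : new_size;
      if (keep > 0) std::memcpy(fresh.get(), data_.get(), keep);
      data_ = std::move(fresh);
      capacity_ = new_size;
    }
    size_ = new_size;
    assert(size_ <= capacity_);
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }

 private:
  std::unique_ptr<uint8_t[]> data_;
  std::size_t size_ = 0;
  std::size_t capacity_ = 0;
};
```

In this model, resizing to 1 MiB and then to 16 bytes with `shrink_to_fit=false` leaves the capacity at 1 MiB: no reallocation cost, but the memory stays held until a later call asks for a shrink.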

@github-actions github-actions bot added awaiting changes and removed awaiting committer review labels Sep 11, 2023
@rok
Member

rok commented Sep 11, 2023

> I've done a basic test: when the data is purely random, the speed is far from that of Plain and the other ByteArray encodings (though we might gain on space). I'm not sure whether I should test more prefix settings.

Slower encoding is expected; what is the performance difference? 2x, 10x, 100x?

@mapleFU
Member Author

mapleFU commented Sep 11, 2023

> Slower encoding is expected; what is the performance difference? 2x, 10x, 100x?

Oh, my fault. With 8-byte, purely random strings, the other encodings are about 5x faster. I guess that's mainly because of the memcpy and the extra checks. I'll test more ranges this week.

@pitrou
Member

pitrou commented Sep 13, 2023

Here are some results.

Decoding doesn't look bad, it's on par with dict decoding (in items/s, I'm not sure the bytes/s figure is consistently computed):

BM_DeltaDecodingByteArray/max-string-length:8/batch-size:512                       3444 ns         3443 ns       201688 bytes_per_second=580.339M/s items_per_second=148.719M/s
BM_DeltaDecodingByteArray/max-string-length:64/batch-size:512                      3755 ns         3755 ns       183626 bytes_per_second=4.13331G/s items_per_second=136.369M/s
BM_DeltaDecodingByteArray/max-string-length:1024/batch-size:512                    5380 ns         5380 ns       129118 bytes_per_second=47.3178G/s items_per_second=95.1757M/s
BM_DeltaDecodingByteArray/max-string-length:8/batch-size:2048                     14293 ns        14291 ns        48895 bytes_per_second=550.606M/s items_per_second=143.306M/s
BM_DeltaDecodingByteArray/max-string-length:64/batch-size:2048                    15597 ns        15595 ns        44892 bytes_per_second=3.93516G/s items_per_second=131.323M/s
BM_DeltaDecodingByteArray/max-string-length:1024/batch-size:2048                  23939 ns        23934 ns        29173 bytes_per_second=41.7182G/s items_per_second=85.5671M/s

BM_DictDecodingByteArray/max-string-length:8/batch-size:512                        2953 ns         2952 ns       234887 bytes_per_second=1.29091G/s items_per_second=173.433M/s
BM_DictDecodingByteArray/max-string-length:64/batch-size:512                       4158 ns         4157 ns       169321 bytes_per_second=4.13726G/s items_per_second=123.171M/s
BM_DictDecodingByteArray/max-string-length:1024/batch-size:512                    10668 ns        10666 ns        65659 bytes_per_second=23.0956G/s items_per_second=48.0035M/s
BM_DictDecodingByteArray/max-string-length:8/batch-size:2048                       8789 ns         8788 ns        75111 bytes_per_second=1.7316G/s items_per_second=233.051M/s
BM_DictDecodingByteArray/max-string-length:64/batch-size:2048                     13147 ns        13144 ns        54072 bytes_per_second=5.20906G/s items_per_second=155.814M/s
BM_DictDecodingByteArray/max-string-length:1024/batch-size:2048                   46555 ns        46544 ns        15010 bytes_per_second=21.0795G/s items_per_second=44.0014M/s

@mapleFU
Member Author

mapleFU commented Sep 15, 2023

BM_DeltaEncodingByteArray/min-prefix-string-length:0/max-string-length:8/batch-size:512/prefixed-probability:10             11626 ns        11623 ns        59332 bytes_per_second=175.745M/s items_per_second=44.0488M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:4/max-string-length:8/batch-size:512/prefixed-probability:90             12596 ns        12588 ns        55529 bytes_per_second=233.718M/s items_per_second=40.6731M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:8/max-string-length:8/batch-size:512/prefixed-probability:99             11033 ns        11028 ns        63642 bytes_per_second=354.205M/s items_per_second=46.4263M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:0/max-string-length:8/batch-size:2048/prefixed-probability:10            51175 ns        51159 ns        13558 bytes_per_second=156.121M/s items_per_second=40.0319M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:4/max-string-length:8/batch-size:2048/prefixed-probability:90            56700 ns        56422 ns        12403 bytes_per_second=207.292M/s items_per_second=36.2977M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:8/max-string-length:8/batch-size:2048/prefixed-probability:99            46428 ns        46191 ns        15185 bytes_per_second=338.272M/s items_per_second=44.338M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:0/max-string-length:64/batch-size:512/prefixed-probability:10            21084 ns        21078 ns        33275 bytes_per_second=728.956M/s items_per_second=24.2912M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:32/max-string-length:64/batch-size:512/prefixed-probability:90           23528 ns        23403 ns        28746 bytes_per_second=1006.64M/s items_per_second=21.8774M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:64/max-string-length:64/batch-size:512/prefixed-probability:99           20969 ns        20863 ns        33702 bytes_per_second=1.46274G/s items_per_second=24.5407M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:0/max-string-length:64/batch-size:2048/prefixed-probability:10          114492 ns       112077 ns         6082 bytes_per_second=567.938M/s items_per_second=18.2731M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:32/max-string-length:64/batch-size:2048/prefixed-probability:90         118275 ns       117342 ns         5873 bytes_per_second=805.154M/s items_per_second=17.4532M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:64/max-string-length:64/batch-size:2048/prefixed-probability:99         116968 ns       115977 ns         6185 bytes_per_second=1077.8M/s items_per_second=17.6587M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:0/max-string-length:1024/batch-size:512/prefixed-probability:10         133510 ns       132742 ns         5261 bytes_per_second=1.90002G/s items_per_second=3.85712M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:512/max-string-length:1024/batch-size:512/prefixed-probability:90       191934 ns       191903 ns         3513 bytes_per_second=1.91668G/s items_per_second=2.66801M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:1024/max-string-length:1024/batch-size:512/prefixed-probability:99      272684 ns       272636 ns         2565 bytes_per_second=1.79096G/s items_per_second=1.87796M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:0/max-string-length:1024/batch-size:2048/prefixed-probability:10        622002 ns       621948 ns         1193 bytes_per_second=1.59998G/s items_per_second=3.29288M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:512/max-string-length:1024/batch-size:2048/prefixed-probability:90      799290 ns       799220 ns          870 bytes_per_second=1.8367G/s items_per_second=2.5625M/s
BM_DeltaEncodingByteArray/min-prefix-string-length:1024/max-string-length:1024/batch-size:2048/prefixed-probability:99    1176743 ns      1176583 ns          628 bytes_per_second=1.66G/s items_per_second=1.74063M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:0/max-string-length:8/batch-size:512/prefixed-probability:10              2387 ns         2371 ns       295677 bytes_per_second=861.51M/s items_per_second=215.929M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:4/max-string-length:8/batch-size:512/prefixed-probability:90              2452 ns         2452 ns       284102 bytes_per_second=1.17173G/s items_per_second=208.805M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:8/max-string-length:8/batch-size:512/prefixed-probability:99              2417 ns         2417 ns       289162 bytes_per_second=1.57817G/s items_per_second=211.818M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:0/max-string-length:8/batch-size:2048/prefixed-probability:10             8744 ns         8743 ns        79466 bytes_per_second=913.521M/s items_per_second=234.241M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:4/max-string-length:8/batch-size:2048/prefixed-probability:90             8962 ns         8958 ns        78023 bytes_per_second=1.27509G/s items_per_second=228.633M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:8/max-string-length:8/batch-size:2048/prefixed-probability:99             8937 ns         8936 ns        78077 bytes_per_second=1.70758G/s items_per_second=229.188M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:0/max-string-length:64/batch-size:512/prefixed-probability:10             2428 ns         2427 ns       287872 bytes_per_second=6.18231G/s items_per_second=210.959M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:32/max-string-length:64/batch-size:512/prefixed-probability:90            2410 ns         2410 ns       288149 bytes_per_second=9.54651G/s items_per_second=212.454M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:64/max-string-length:64/batch-size:512/prefixed-probability:99            2460 ns         2460 ns       284264 bytes_per_second=12.4073G/s items_per_second=208.16M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:0/max-string-length:64/batch-size:2048/prefixed-probability:10            9021 ns         9020 ns        77622 bytes_per_second=6.89111G/s items_per_second=227.039M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:32/max-string-length:64/batch-size:2048/prefixed-probability:90           8871 ns         8870 ns        78905 bytes_per_second=10.4019G/s items_per_second=230.892M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:64/max-string-length:64/batch-size:2048/prefixed-probability:99           9176 ns         9176 ns        76328 bytes_per_second=13.3033G/s items_per_second=223.193M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:0/max-string-length:1024/batch-size:512/prefixed-probability:10           2504 ns         2504 ns       278691 bytes_per_second=100.718G/s items_per_second=204.462M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:512/max-string-length:1024/batch-size:512/prefixed-probability:90         2525 ns         2525 ns       277716 bytes_per_second=145.692G/s items_per_second=202.803M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:1024/max-string-length:1024/batch-size:512/prefixed-probability:99        2558 ns         2556 ns       273213 bytes_per_second=190.996G/s items_per_second=200.274M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:0/max-string-length:1024/batch-size:2048/prefixed-probability:10          9260 ns         9247 ns        75385 bytes_per_second=107.617G/s items_per_second=221.484M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:512/max-string-length:1024/batch-size:2048/prefixed-probability:90        9304 ns         9297 ns        74922 bytes_per_second=157.898G/s items_per_second=220.293M/s
BM_DeltaDecodingByteArray/min-prefix-string-length:1024/max-string-length:1024/batch-size:2048/prefixed-probability:99       9000 ns         8998 ns        77475 bytes_per_second=217.05G/s items_per_second=227.594M/s

@pitrou I've tested decoding/encoding on macOS. To avoid too many benchmarks, I fixed the min-prefix-length for each prefix probability in the test.

  1. When the strings are nearly purely random (prefix-prob = 10%, min-prefix-length = 0), thanks to my optimization in this patch, it could be just a bit slower than Plain.
  2. When the strings are partially random (prefix-prob = 90%, min-prefix-length = max-length/2), decode/encode takes more time.
  3. When the strings are highly repetitive (prefix-prob = 99%, min-prefix-length = max-length), decode/encode becomes fast again.

@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Sep 15, 2023
@mapleFU mapleFU force-pushed the parquet-encoding-benchmark-delta branch from c2bde25 to 971e2c9 Compare September 15, 2023 11:58
@mapleFU mapleFU force-pushed the parquet-encoding-benchmark-delta branch from 971e2c9 to c567ed6 Compare September 15, 2023 12:30
@mapleFU
Member Author

mapleFU commented Sep 15, 2023

Compared with Plain and DELTA_LENGTH_BYTE_ARRAY decoding, DELTA_BYTE_ARRAY is slower, because Plain and delta-length decoding do not need to copy any data. Their speed is stable across different inputs.

BM_PlainDecodingByteArray/max-string-length:8/batch-size:512                                                                 1279 ns         1279 ns       548306 bytes_per_second=2.97226G/s items_per_second=400.298M/s
BM_PlainDecodingByteArray/max-string-length:64/batch-size:512                                                                1279 ns         1278 ns       543322 bytes_per_second=13.5389G/s items_per_second=400.49M/s
BM_PlainDecodingByteArray/max-string-length:1024/batch-size:512                                                              1278 ns         1278 ns       547482 bytes_per_second=198.153G/s items_per_second=400.56M/s
BM_DeltaLengthDecodingByteArray/max-string-length:8/batch-size:512                                                           1566 ns         1566 ns       449207 bytes_per_second=2.42814G/s items_per_second=327.017M/s
BM_DeltaLengthDecodingByteArray/max-string-length:64/batch-size:512                                                          1576 ns         1576 ns       450990 bytes_per_second=10.9846G/s items_per_second=324.93M/s
BM_DeltaLengthDecodingByteArray/max-string-length:1024/batch-size:512                                                        1554 ns         1554 ns       447002 bytes_per_second=162.973G/s items_per_second=329.444M/s

Compared with Dict: Dict requires setting the dictionary, and parsing the dictionary incurs a memcpy, so in the benchmark they're similar. However, in real use of Dict, a ColumnChunk only sets the dictionary once, so Dictionary is still faster than DELTA.

Currently DELTA has no speed advantage. However, if the data shares prefixes, like "abcd, abcf, ...", DELTA can balance space and encoding speed. And in real-world Parquet use, users usually combine compression with encoding: DELTA decoding might be a bit slower, but decompression would be fast because the data is short. I think users can choose it if they find the data pattern suitable, but it should not be used by default.

cc @rok

@rok
Member

rok commented Sep 15, 2023

@mapleFU Agreed! Users should be aware of why and when to use which encoding.
One of the use cases for DELTA is storing sorted arrays of file paths.
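
As a rough illustration of why sorted file paths suit this encoding, the sketch below splits each value into the length of the prefix it shares with the previous value plus the remaining suffix. It covers only that split; the subsequent encoding of the lengths and suffixes is omitted. The names `FrontCoded` and `SplitPrefixes` are hypothetical and not part of Arrow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical container for the two streams produced by the split.
struct FrontCoded {
  std::vector<int32_t> prefix_lengths;  // bytes shared with the previous value
  std::vector<std::string> suffixes;    // the bytes that are not shared
};

// Splits each value into (shared prefix length, suffix) relative to the
// previous value. The first value always has prefix length 0.
FrontCoded SplitPrefixes(const std::vector<std::string>& values) {
  FrontCoded out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    std::size_t shared = 0;
    if (i > 0) {
      const std::string& previous = values[i - 1];
      const std::string& current = values[i];
      while (shared < previous.size() && shared < current.size() &&
             previous[shared] == current[shared]) {
        ++shared;
      }
    }
    assert(shared <= values[i].size());
    out.prefix_lengths.push_back(static_cast<int32_t>(shared));
    out.suffixes.push_back(values[i].substr(shared));
  }
  return out;
}
```

For the paths `/data/a/part-0.parquet`, `/data/a/part-1.parquet`, `/data/b/part-0.parquet`, the second value keeps only the suffix `1.parquet` (13 shared bytes) and the third keeps `b/part-0.parquet` (6 shared bytes), which is where the space saving comes from.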

@mapleFU
Member Author

mapleFU commented Sep 15, 2023

Yeah, encoding/decoding speed alone cannot show the value of DELTA (though it's still important).

I've tested encoding together with compression, and DELTA does OK.

@mapleFU mapleFU requested review from pitrou and rok September 19, 2023 05:37
@mapleFU mapleFU requested a review from wgtmac September 19, 2023 05:37
@mapleFU
Member Author

mapleFU commented Sep 19, 2023

@pitrou I've updated the benchmark; would you mind taking a look?

// all the prefix lengths are buffered in buffered_prefix_length_.
PARQUET_THROW_NOT_OK(buffered_prefix_length_->Resize(num_prefix * sizeof(int32_t)));
PARQUET_THROW_NOT_OK(buffered_prefix_length_->Resize(num_prefix * sizeof(int32_t),
/*shrink_to_fit=*/false));
Member

Is this desirable? If decoding first a large page and then a small page, this means that memory wouldn't be released. Does page size always stay similar?

Member Author

Hmm, personally I don't think the page will be too large, but users might set a large page size. I'll remove that.

@mapleFU
Member Author

mapleFU commented Sep 26, 2023

      if (buffer[i].len == 0) {
        // If the buffer length is zero, the memcpy to data_ptr can be ignored.
        buffer[i].ptr = reinterpret_cast<const uint8_t*>(prefix.data());
        buffer[i].len = prefix_len_ptr[i];
      } else if (prefix_len_ptr[i] != 0) {
        // If the prefix length is zero, the prefix can be ignored.
        memcpy(data_ptr, prefix.data(), prefix_len_ptr[i]);
        // buffer[i] currently points to the string suffix
        memcpy(data_ptr + prefix_len_ptr[i], buffer[i].ptr, buffer[i].len);
        buffer[i].ptr = data_ptr;
        buffer[i].len += prefix_len_ptr[i];
        data_ptr += buffer[i].len;
      }
      prefix = std::string_view{buffer[i]};

I tend to prefer code like this, but let me leave it to a later patch with a more careful design...
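
For readers who want to experiment with the branching above outside of Arrow, here is a standalone sketch using only standard-library types. `Slice` and `DecodeFrontCoded` are hypothetical names; the real decoder operates on `ByteArray` values and Arrow buffers, so this shows only the idea of skipping the copy when either the prefix or the suffix is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical non-owning view of a decoded value.
struct Slice {
  const char* ptr;
  std::size_t len;
};

// Rebuilds values from (prefix length, suffix) pairs. Returns how many
// values had to be copied into `scratch`; the others alias existing bytes.
// `suffixes` and `scratch` must outlive the slices written to `out`.
std::size_t DecodeFrontCoded(const std::vector<int32_t>& prefix_lengths,
                             const std::vector<std::string>& suffixes,
                             std::vector<char>* scratch, std::vector<Slice>* out) {
  assert(prefix_lengths.size() == suffixes.size());
  // Size the scratch space for the worst case up front, so that pointers
  // into it stay valid while decoding.
  std::size_t total = 0;
  for (std::size_t i = 0; i < suffixes.size(); ++i) {
    assert(prefix_lengths[i] >= 0);
    total += static_cast<std::size_t>(prefix_lengths[i]) + suffixes[i].size();
  }
  scratch->assign(total, 0);
  char* cursor = scratch->data();

  Slice prefix = {nullptr, 0};
  std::size_t copies = 0;
  out->clear();
  for (std::size_t i = 0; i < suffixes.size(); ++i) {
    const std::size_t prefix_len = static_cast<std::size_t>(prefix_lengths[i]);
    assert(prefix_len <= prefix.len);
    const std::string& suffix = suffixes[i];
    Slice value;
    if (suffix.empty()) {
      // Empty suffix: the value is a leading part of the previous value.
      value.ptr = prefix.ptr;
      value.len = prefix_len;
    } else if (prefix_len == 0) {
      // Empty prefix: the value is exactly the suffix, no copy needed.
      value.ptr = suffix.data();
      value.len = suffix.size();
    } else {
      // General case: concatenate prefix and suffix in the scratch space.
      std::memcpy(cursor, prefix.ptr, prefix_len);
      std::memcpy(cursor + prefix_len, suffix.data(), suffix.size());
      value.ptr = cursor;
      value.len = prefix_len + suffix.size();
      cursor += value.len;
      ++copies;
    }
    out->push_back(value);
    prefix = value;
  }
  return copies;
}
```

With prefix lengths `{0, 3, 4, 0}` and suffixes `{"abcd", "f", "", "xyz"}`, only the second value goes through the memcpy path; the other three alias either a suffix or the previous value.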

@pitrou
Member

pitrou commented Sep 26, 2023

I improved and simplified the benchmark code. Here are the current results:

BM_DeltaEncodingByteArray/max-string-length:8/batch-size:512/prefixed-percent:10          11142 ns        11140 ns        62716 bytes_per_second=175.236M/s compression_ratio=0.700959 items_per_second=45.9596M/s
BM_DeltaEncodingByteArray/max-string-length:8/batch-size:512/prefixed-percent:90          11904 ns        11902 ns        58803 bytes_per_second=172.829M/s compression_ratio=0.765799 items_per_second=43.0166M/s
BM_DeltaEncodingByteArray/max-string-length:8/batch-size:512/prefixed-percent:99          11796 ns        11794 ns        58970 bytes_per_second=169.732M/s compression_ratio=0.759524 items_per_second=43.4132M/s
BM_DeltaEncodingByteArray/max-string-length:8/batch-size:2048/prefixed-percent:10         40895 ns        40885 ns        16988 bytes_per_second=186.932M/s compression_ratio=0.695357 items_per_second=50.0914M/s
BM_DeltaEncodingByteArray/max-string-length:8/batch-size:2048/prefixed-percent:90         45275 ns        45266 ns        15099 bytes_per_second=172.528M/s compression_ratio=0.753808 items_per_second=45.2437M/s
BM_DeltaEncodingByteArray/max-string-length:8/batch-size:2048/prefixed-percent:99         46609 ns        46598 ns        14949 bytes_per_second=168.19M/s compression_ratio=0.762051 items_per_second=43.9504M/s
BM_DeltaEncodingByteArray/max-string-length:64/batch-size:512/prefixed-percent:10         14330 ns        14326 ns        48859 bytes_per_second=1085.98M/s compression_ratio=0.933408 items_per_second=35.7382M/s
BM_DeltaEncodingByteArray/max-string-length:64/batch-size:512/prefixed-percent:90         14291 ns        14288 ns        49084 bytes_per_second=1104.8M/s compression_ratio=1.19315 items_per_second=35.8346M/s
BM_DeltaEncodingByteArray/max-string-length:64/batch-size:512/prefixed-percent:99         14001 ns        13998 ns        49811 bytes_per_second=1100.33M/s compression_ratio=1.21214 items_per_second=36.5759M/s
BM_DeltaEncodingByteArray/max-string-length:64/batch-size:2048/prefixed-percent:10        46585 ns        46575 ns        14949 bytes_per_second=1.29997G/s compression_ratio=0.930448 items_per_second=43.9721M/s
BM_DeltaEncodingByteArray/max-string-length:64/batch-size:2048/prefixed-percent:90        60950 ns        60938 ns        11346 bytes_per_second=1028.95M/s compression_ratio=1.18361 items_per_second=33.6078M/s
BM_DeltaEncodingByteArray/max-string-length:64/batch-size:2048/prefixed-percent:99        60273 ns        60260 ns        11491 bytes_per_second=1040.17M/s compression_ratio=1.22369 items_per_second=33.9862M/s
BM_DeltaEncodingByteArray/max-string-length:1024/batch-size:512/prefixed-percent:10       40006 ns        39997 ns        17513 bytes_per_second=5.95053G/s compression_ratio=1.0309 items_per_second=12.8011M/s
BM_DeltaEncodingByteArray/max-string-length:1024/batch-size:512/prefixed-percent:90       54380 ns        54368 ns        12895 bytes_per_second=4.32286G/s compression_ratio=1.43071 items_per_second=9.41729M/s
BM_DeltaEncodingByteArray/max-string-length:1024/batch-size:512/prefixed-percent:99       59008 ns        58992 ns        11894 bytes_per_second=4.28957G/s compression_ratio=1.54352 items_per_second=8.67907M/s
BM_DeltaEncodingByteArray/max-string-length:1024/batch-size:2048/prefixed-percent:10     150360 ns       150330 ns         4295 bytes_per_second=6.41765G/s compression_ratio=1.02877 items_per_second=13.6234M/s
BM_DeltaEncodingByteArray/max-string-length:1024/batch-size:2048/prefixed-percent:90     189980 ns       189936 ns         3689 bytes_per_second=5.15671G/s compression_ratio=1.41325 items_per_second=10.7826M/s
BM_DeltaEncodingByteArray/max-string-length:1024/batch-size:2048/prefixed-percent:99     203198 ns       203155 ns         3417 bytes_per_second=4.93974G/s compression_ratio=1.50832 items_per_second=10.081M/s

BM_DeltaDecodingByteArray/max-string-length:8/batch-size:512/prefixed-percent:10           5270 ns         5270 ns       132014 bytes_per_second=370.447M/s compression_ratio=0.700959 items_per_second=97.1578M/s
BM_DeltaDecodingByteArray/max-string-length:8/batch-size:512/prefixed-percent:90           5061 ns         5060 ns       126446 bytes_per_second=406.504M/s compression_ratio=0.765799 items_per_second=101.178M/s
BM_DeltaDecodingByteArray/max-string-length:8/batch-size:512/prefixed-percent:99           5542 ns         5542 ns       126490 bytes_per_second=361.218M/s compression_ratio=0.759524 items_per_second=92.3904M/s
BM_DeltaDecodingByteArray/max-string-length:8/batch-size:2048/prefixed-percent:10         23027 ns        23024 ns        30212 bytes_per_second=331.951M/s compression_ratio=0.695357 items_per_second=88.9516M/s
BM_DeltaDecodingByteArray/max-string-length:8/batch-size:2048/prefixed-percent:90         22239 ns        22235 ns        31329 bytes_per_second=351.231M/s compression_ratio=0.753808 items_per_second=92.1068M/s
BM_DeltaDecodingByteArray/max-string-length:8/batch-size:2048/prefixed-percent:99         22392 ns        22389 ns        31045 bytes_per_second=350.055M/s compression_ratio=0.762051 items_per_second=91.4744M/s
BM_DeltaDecodingByteArray/max-string-length:64/batch-size:512/prefixed-percent:10          5040 ns         5040 ns       137874 bytes_per_second=3.01475G/s compression_ratio=0.933408 items_per_second=101.592M/s
BM_DeltaDecodingByteArray/max-string-length:64/batch-size:512/prefixed-percent:90          5312 ns         5312 ns       133927 bytes_per_second=2.90216G/s compression_ratio=1.19315 items_per_second=96.3918M/s
BM_DeltaDecodingByteArray/max-string-length:64/batch-size:512/prefixed-percent:99          5238 ns         5238 ns       131598 bytes_per_second=2.87179G/s compression_ratio=1.21214 items_per_second=97.7513M/s
BM_DeltaDecodingByteArray/max-string-length:64/batch-size:2048/prefixed-percent:10        20561 ns        20558 ns        33335 bytes_per_second=2.9452G/s compression_ratio=0.930448 items_per_second=99.6227M/s
BM_DeltaDecodingByteArray/max-string-length:64/batch-size:2048/prefixed-percent:90        23385 ns        23381 ns        28702 bytes_per_second=2.61889G/s compression_ratio=1.18361 items_per_second=87.5921M/s
BM_DeltaDecodingByteArray/max-string-length:64/batch-size:2048/prefixed-percent:99        22930 ns        22925 ns        29705 bytes_per_second=2.67001G/s compression_ratio=1.22369 items_per_second=89.333M/s
BM_DeltaDecodingByteArray/max-string-length:1024/batch-size:512/prefixed-percent:10        8253 ns         8251 ns        85126 bytes_per_second=28.8434G/s compression_ratio=1.0309 items_per_second=62.0494M/s
BM_DeltaDecodingByteArray/max-string-length:1024/batch-size:512/prefixed-percent:90        9089 ns         9088 ns        76982 bytes_per_second=25.8625G/s compression_ratio=1.43071 items_per_second=56.341M/s
BM_DeltaDecodingByteArray/max-string-length:1024/batch-size:512/prefixed-percent:99        9418 ns         9416 ns        74093 bytes_per_second=26.874G/s compression_ratio=1.54352 items_per_second=54.374M/s
BM_DeltaDecodingByteArray/max-string-length:1024/batch-size:2048/prefixed-percent:10      35156 ns        35151 ns        19918 bytes_per_second=27.4465G/s compression_ratio=1.02877 items_per_second=58.2635M/s
BM_DeltaDecodingByteArray/max-string-length:1024/batch-size:2048/prefixed-percent:90      38266 ns        38257 ns        18211 bytes_per_second=25.6015G/s compression_ratio=1.41325 items_per_second=53.532M/s
BM_DeltaDecodingByteArray/max-string-length:1024/batch-size:2048/prefixed-percent:99      38955 ns        38948 ns        17954 bytes_per_second=25.7662G/s compression_ratio=1.50832 items_per_second=52.5835M/s

@mapleFU
Member Author

mapleFU commented Sep 26, 2023

Nice! So currently the memcpy causes high CPU usage; let me optimize it along the lines of #37641 (comment) after this is merged, thanks!

Member

@pitrou pitrou left a comment

+1, let's wait for CI

@mapleFU
Member Author

mapleFU commented Sep 26, 2023

Oops, I believe these CI failures are not caused by my change...

@pitrou
Member

pitrou commented Sep 26, 2023

Yes, the CI failures are unrelated.

@pitrou pitrou merged commit 5978729 into apache:main Sep 26, 2023
@pitrou pitrou removed the awaiting change review label Sep 26, 2023
@mapleFU mapleFU deleted the parquet-encoding-benchmark-delta branch September 26, 2023 14:23
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 5978729.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…_ARRAY (apache#37641)

### Rationale for this change

Add benchmark for DELTA_BYTE_ARRAY in parquet, and do tiny optimization.

### What changes are included in this PR?

Add benchmark for DELTA_BYTE_ARRAY in parquet, and do tiny optimization.

### Are these changes tested?

no

### Are there any user-facing changes?

no

* Closes: apache#37293

Lead-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…_ARRAY (apache#37641)


Development

Successfully merging this pull request may close these issues.

[C++][Parquet] Encoding: Add Benchmark for DELTA_BYTE_ARRAY

4 participants