### Description
On Linux, pyarrow.parquet.write_to_dataset shows a large performance regression in Arrow 12.0 versus 11.0.
The following results were collected on Ubuntu 22.04.2 LTS (kernel 5.15.0-71-generic) with an Intel Haswell 4-core CPU @ 3.6 GHz, 16 GB RAM, and a Samsung 840 Pro SSD. Each entry is the elapsed time in seconds to write a single int64 column containing the integers [0, ..., length-1], with no compression and no multi-threading:
| Array length | Arrow 11 (s) | Arrow 12 (s) |
|---|---|---|
| 1,000,000 | 0.1 | 0.1 |
| 2,000,000 | 0.2 | 0.4 |
| 4,000,000 | 0.3 | 1.6 |
| 8,000,000 | 0.8 | 6.2 |
| 16,000,000 | 2.3 | 24.4 |
| 32,000,000 | 6.5 | 94.1 |
| 64,000,000 | 13.5 | 371.7 |
The output directory was deleted before each run. Note that Arrow 11 scales roughly linearly with array length, while Arrow 12 scales superlinearly: each doubling of the length roughly quadruples the elapsed time.
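The exact cleanup step isn't shown in the script below; a minimal sketch, assuming a plain `shutil` call against the same path:

```python
import shutil

# Remove any previous dataset output so each run starts from an empty directory.
shutil.rmtree('/tmp/test.parquet', ignore_errors=True)
```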
"""check.py"""
import sys
import time
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq
def main():
path = '/tmp/test.parquet'
length = 10_000_000 if len(sys.argv) < 2 else int(sys.argv[1])
table = pa.Table.from_arrays([pa.array(np.arange(length))], names=['A'])
t0 = time.perf_counter()
pq.write_to_dataset(
table, path, schema=table.schema, use_legacy_dataset=False, use_threads=False, compression=None
)
duration = time.perf_counter() - t0
print(f'{duration:.2f}s')
if __name__ == '__main__':
main()
Running git bisect on local builds points to commit 660d259 ([C++] Add ordered/segmented aggregation Substrait extension, #34627).
Following that change, flame graphs show a large amount of additional time spent in arrow::util::EnsureAlignment calling glibc memcpy:

- Before (ddd0a33): ~1.3 s
- After (660d259): ~9.6 s
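That EnsureAlignment time suggests the new code path is copying buffers that fail an alignment check. A minimal sketch for inspecting the alignment of the numpy-backed data buffer from Python; the 64-byte boundary is my assumption about the required alignment, not confirmed:

```python
import numpy as np
import pyarrow as pa

arr = pa.array(np.arange(16_000_000))  # zero-copy view of the numpy buffer
buf = arr.buffers()[1]                 # buffers()[0] is the validity bitmap (None here)

# A nonzero remainder means the data buffer does not start on a 64-byte
# boundary, which would force an aligning copy (assumption: the new code
# path requires 64-byte alignment).
print(hex(buf.address), buf.address % 64)
```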
Reading and pyarrow.parquet.write_table appear unaffected.
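For reference, a minimal sketch of how the single-file write path can be timed on the same data for comparison (the file name and length here are illustrative):

```python
import time

import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.Table.from_arrays([pa.array(np.arange(64_000_000))], names=['A'])

# Single-file write, same uncompressed settings as the dataset benchmark.
t0 = time.perf_counter()
pq.write_table(table, '/tmp/test_single.parquet', compression=None)
print(f'write_table: {time.perf_counter() - t0:.2f}s')
```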
### Component(s)
C++, Parquet