Improve performance of extracting statistics from parquet files #10626

### Is your feature request related to a problem or challenge?

Part of https://github.com/apache/datafusion/issues/10453

@Lordworms added a benchmark for extracting statistics from parquet files in https://github.com/apache/datafusion/pull/10610

Since this code extracts statistics from parquet files, we would like to make sure it is efficient, especially if we are going to extract statistics for many files at once.

The goal of this issue is to speed up statistics extraction.



### Describe the solution you'd like

Make this benchmark go faster:

```shell
cargo bench --bench parquet_statistic
```



### Describe alternatives you've considered

I did some brief profiling:

![Screenshot 2024-05-22 at 3 37 30 PM](https://github.com/apache/datafusion/assets/490673/c53c5a1d-2d06-4d13-bd87-e5d6e51ccb49)

I think the key would be to change these loops so they build the required Arrow arrays directly from primitive values rather than from `ScalarValue`:

https://github.com/apache/datafusion/blob/1bf7112171fd820c101e325822dc4d44dd65b2ff/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs#L183-L189
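
To make the idea concrete, here is a minimal, std-only sketch of the two approaches. It is not DataFusion or arrow-rs code: `Stat`, `Scalar`, `Int32Column`, `min_via_scalars`, and `min_direct` are all hypothetical stand-ins (for parquet statistics, `ScalarValue`, and an Arrow primitive array, respectively), and it only assumes that the cost being avoided is the per-value wrap-and-unwrap through a dynamically typed scalar.

```rust
/// Hypothetical stand-in for one row group's parquet column statistics.
#[derive(Debug, Clone, PartialEq)]
pub enum Stat {
    Int32 { min: i32, max: i32 },
    Int64 { min: i64, max: i64 },
}

/// Hypothetical stand-in for `ScalarValue`: one dynamically typed value.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int32(Option<i32>),
    Int64(Option<i64>),
}

/// Simplified model of an Arrow primitive array: a values buffer plus a
/// validity mask (a real Arrow array uses a packed bitmap instead).
#[derive(Debug, Clone, PartialEq)]
pub struct Int32Column {
    pub values: Vec<i32>,
    pub valid: Vec<bool>,
}

/// Scalar path: wrap every minimum in a `Scalar`, collect the scalars,
/// then unwrap each one again to fill the column.
pub fn min_via_scalars<'a>(stats: impl Iterator<Item = Option<&'a Stat>>) -> Int32Column {
    let scalars: Vec<Scalar> = stats
        .map(|s| match s {
            Some(Stat::Int32 { min, .. }) => Scalar::Int32(Some(*min)),
            // Missing statistics or a different physical type become null.
            _ => Scalar::Int32(None),
        })
        .collect();

    let mut col = Int32Column { values: Vec::new(), valid: Vec::new() };
    for scalar in scalars {
        match scalar {
            Scalar::Int32(Some(v)) => {
                col.values.push(v);
                col.valid.push(true);
            }
            _ => {
                col.values.push(0);
                col.valid.push(false);
            }
        }
    }
    col
}

/// Direct path: a single pass that writes primitive values straight into
/// the typed buffers, with no intermediate scalar per row group.
pub fn min_direct<'a>(stats: impl Iterator<Item = Option<&'a Stat>>) -> Int32Column {
    let (lower, _) = stats.size_hint();
    let mut col = Int32Column {
        values: Vec::with_capacity(lower),
        valid: Vec::with_capacity(lower),
    };
    for s in stats {
        match s {
            Some(Stat::Int32 { min, .. }) => {
                col.values.push(*min);
                col.valid.push(true);
            }
            // Missing statistics or a different physical type become null.
            _ => {
                col.values.push(0);
                col.valid.push(false);
            }
        }
    }
    col
}
```

Both functions produce the same column; the direct version just skips the intermediate `Vec<Scalar>`. In the real code the equivalent change would presumably dispatch on `data_type` once and then collect into the matching typed Arrow array, but that is an assumption about the eventual fix rather than something this issue specifies.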


### Additional context

_No response_

	pub(crate) fn min_statistics<'a, I: Iterator<Item = Option<&'a ParquetStatistics>>>(
	    data_type: &DataType,
	    iterator: I,
	) -> Result<ArrayRef> {
	    let scalars = iterator
	        .map(|x| x.and_then(|s| get_statistic!(s, min, min_bytes, Some(data_type))));
	    collect_scalars(data_type, scalars)
	}
