
Conversation


@Weijun-H Weijun-H commented Jan 2, 2026

Which issue does this PR close?

  • Closes #NNN.

Rationale for this change

The previous benchmarks ran in only 2–7 microseconds, which is too fast to deterministically measure the performance improvement.
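A minimal sketch of why scaling the input helps: with enough rows, each iteration runs for hundreds of microseconds instead of single-digit microseconds, so criterion can resolve changes above its noise floor. The helper name `make_json_rows` and the single-`Int32`-column shape are illustrative assumptions, not code from this PR.

```rust
use std::fmt::Write;

/// Generate `n` newline-delimited JSON rows with a single Int32 column `c0`.
/// Illustrative helper; the benchmark's actual data generation may differ.
fn make_json_rows(n: usize) -> String {
    let mut out = String::with_capacity(n * 16);
    for i in 0..n {
        writeln!(out, "{{\"c0\": {}}}", i as i32).unwrap();
    }
    out
}

fn main() {
    // 256K rows (2^18): large enough that one parse takes hundreds of
    // microseconds rather than 2-7, as measured in the numbers below.
    let input = make_json_rows(1 << 18);
    println!("{} rows, {} bytes", input.lines().count(), input.len());
}
```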

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added the `arrow` (Changes to the arrow crate) label Jan 2, 2026

@Jefffrey Jefffrey left a comment


After these changes on an M4 mac

arrow-rs (pr_9088)$ cargo bench -p arrow-json
    Finished `bench` profile [optimized] target(s) in 0.05s
     Running benches/serde.rs (/Users/jeffrey/.cargo_target_cache/release/deps/serde-a1ab5d1498b8bdfe)
small_i32               time:   [323.29 µs 325.91 µs 328.65 µs]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild

Benchmarking large_i32: Collecting 100 samples in estimated 6.5412 s (20k iterations)^C

arrow-rs (pr_9088)$ cargo bench -p arrow-json --bench serde
    Finished `bench` profile [optimized] target(s) in 0.06s
     Running benches/serde.rs (/Users/jeffrey/.cargo_target_cache/release/deps/serde-a1ab5d1498b8bdfe)
small_i32               time:   [314.67 µs 315.97 µs 317.47 µs]
                        change: [−2.6760% −1.8315% −0.9768%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

large_i32               time:   [315.96 µs 317.31 µs 318.82 µs]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

small_i64               time:   [483.97 µs 485.09 µs 486.20 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

medium_i64              time:   [485.07 µs 487.39 µs 491.23 µs]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

large_i64               time:   [486.93 µs 491.01 µs 498.46 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

small_f32               time:   [575.10 µs 578.78 µs 585.47 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

large_f32               time:   [572.65 µs 573.71 µs 574.81 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

c.bench_function(name, |b| {
    b.iter(|| {
-       let builder = ReaderBuilder::new(schema.clone()).with_batch_size(64);
+       let builder = ReaderBuilder::new(schema.clone()).with_batch_size(batch_size);

I think a batch size of 256K (2**18) is also too big -- can we use 4K or 8K instead? I think that would be more realistic?

Parsing 256K rows does make sense to me
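For scale (my own arithmetic, not a measurement from this PR): keeping the 256K-row input while shrinking the batch size only changes how many RecordBatches one benchmark iteration produces.

```rust
fn main() {
    let rows: usize = 1 << 18; // 256K rows, as discussed above
    for batch_size in [4096usize, 8192] {
        // Number of RecordBatches the reader would emit per iteration.
        let batches = rows.div_ceil(batch_size);
        println!("batch_size={batch_size} -> {batches} batches");
    }
}
```

So a 4K batch size gives 64 batches per iteration and 8K gives 32, which exercises the per-batch overhead more realistically than a single 256K batch.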

