chore: Investigate impact of small batches on performance #495

@andygrove

What is the problem the feature request solves?

I added debug logging to CometNativeIterator to print the size of each batch processed while running TPC-H q14, and I see many small batches:

Creating batch with 97 rows
Creating batch with 86 rows
Creating batch with 87 rows
Creating batch with 72 rows
Creating batch with 80 rows
...

The query processes 73,456 batches with fewer than 1000 rows and 2,448 batches with at least 1000 rows.
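For anyone who wants to reproduce these counts from the log output, here is a minimal sketch. It assumes the debug lines have exactly the format quoted above; the function name `count_batches` and the `threshold` parameter are hypothetical and not part of Comet.

```python
import re
from typing import Iterable, Tuple

# Matches the debug lines quoted above, e.g. "Creating batch with 97 rows".
BATCH_LINE = re.compile(r"Creating batch with (\d+) rows")


def count_batches(lines: Iterable[str], threshold: int = 1000) -> Tuple[int, int]:
    """Return (batches below threshold, batches at or above threshold)."""
    small = large = 0
    for line in lines:
        match = BATCH_LINE.search(line)
        if match is None:
            continue  # ignore unrelated log output
        if int(match.group(1)) < threshold:
            small += 1
        else:
            large += 1
    return small, large
```

Passing an open log file (for example `count_batches(open(path))`, with `path` pointing at the captured output) would yield the two counts directly.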

I wonder if there would be a performance benefit in coalescing these small batches into larger batches (if that is even possible -- I do not have full information on the context yet).

My theory is that we have some overhead per batch and that we could reduce that overhead if we had larger batches. This issue is for analyzing this and writing up some findings.
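To make the coalescing idea concrete, here is a rough sketch in plain Python. It is not Comet code: batches are modeled as lists of rows, and `coalesce_batches` and `target_rows` are hypothetical names. Small batches are buffered until at least `target_rows` rows have accumulated, then emitted as one larger batch.

```python
from typing import Iterable, Iterator, List


def coalesce_batches(
    batches: Iterable[List[int]], target_rows: int
) -> Iterator[List[int]]:
    """Buffer small batches and emit them once target_rows is reached."""
    buffer: List[int] = []
    for batch in batches:
        buffer.extend(batch)  # copies rows; a real implementation pays this cost too
        if len(buffer) >= target_rows:
            yield buffer
            buffer = []
    if buffer:
        yield buffer  # flush whatever is left at end of input


# Batch sizes taken from the log excerpt above.
sizes = [97, 86, 87, 72, 80]
small_batches = [list(range(n)) for n in sizes]
coalesced = list(coalesce_batches(small_batches, target_rows=200))
print([len(b) for b in coalesced])
```

Note that coalescing is not free, since rows have to be copied into the larger batch, so any analysis would need to weigh that copy against the per-batch overhead it removes.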

Describe the potential solution

No response

Additional context

No response
