arrow: support morsel driven parallelism for arrow tables #238

Merged: adsharma merged 3 commits into master from arrow_morsel on Feb 25, 2026

@adsharma
Contributor

Related to: #183

Useful on multi-core machines: scans over Arrow tables can now be spread across threads.

aheev and others added 2 commits February 25, 2026 11:22
- Use ColumnarNodeTableSharedState::getNextBatch() for thread-safe batch assignment
- Add initArrowScanForBatch() helper method to fetch batches from shared state
- Modify initScanState() to initialize from shared state
- Update scanInternal() to fetch next batch when current is exhausted

Each parallel scan thread now gets assigned full RecordBatches via the shared state,
enabling parallel processing across multiple batches.
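
As a rough illustration of the batch assignment described in this commit, here is a minimal sketch of handing out batch indices from a shared counter. `SketchSharedState` and `claimAllBatches` are hypothetical names made up for this example; the real `ColumnarNodeTableSharedState::getNextBatch()` may have a different signature and use a different synchronization mechanism (a mutex, for instance).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared scan state; not the project's actual class.
class SketchSharedState {
public:
    explicit SketchSharedState(std::size_t numBatches) : numBatches{numBatches} {}

    // Hands out each batch index exactly once, even under concurrent calls.
    std::optional<std::size_t> getNextBatch() {
        std::size_t idx = nextBatch.fetch_add(1, std::memory_order_relaxed);
        if (idx >= numBatches) {
            return std::nullopt; // every batch has already been assigned
        }
        return idx;
    }

private:
    std::size_t numBatches;
    std::atomic<std::size_t> nextBatch{0};
};

// Runs numThreads workers and returns how many times each batch was claimed.
std::vector<int> claimAllBatches(std::size_t numBatches, std::size_t numThreads) {
    SketchSharedState shared{numBatches};
    std::vector<std::vector<std::size_t>> perThread(numThreads);
    std::vector<std::thread> workers;
    workers.reserve(numThreads);
    for (std::size_t t = 0; t < numThreads; ++t) {
        workers.emplace_back([&shared, &perThread, t] {
            // Each worker keeps pulling batches until the shared state runs dry.
            while (auto batch = shared.getNextBatch()) {
                perThread[t].push_back(*batch); // placeholder for scanning the batch
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    std::vector<int> claims(numBatches, 0);
    for (const auto& claimed : perThread) {
        for (std::size_t idx : claimed) {
            assert(idx < numBatches);
            ++claims[idx];
        }
    }
    return claims;
}
```

Calling `claimAllBatches(64, 4)` should report every batch claimed exactly once, which is the property the shared state has to guarantee for parallel scan threads.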

- Add morsel tracking fields to ArrowNodeTableScanState (morselSize, start/end offsets)
- Modify initArrowScanForBatch() to initialize morsel boundaries when assigning batches
- Update scanInternal() to process only morsel-sized chunks (default: 2048 rows)
- When morsels in a batch are exhausted, fetch next batch from shared state

This enables parallel processing of large RecordBatches by dividing them into smaller
morsels that can be processed by multiple threads concurrently.
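
The morsel bookkeeping in this commit could look roughly like the sketch below, which walks one batch in chunks of at most 2048 rows. `MorselCursor`, its field names, and `morselRanges` are hypothetical and only mirror the idea of the fields added to `ArrowNodeTableScanState`; the actual layout and naming in the PR may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-thread cursor over one RecordBatch; field names are illustrative.
struct MorselCursor {
    std::uint64_t batchLength = 0;
    std::uint64_t morselSize = 2048; // default quoted in the commit message
    std::uint64_t morselStart = 0;
    std::uint64_t morselEnd = 0;

    // Points the cursor at a newly assigned batch, with no morsel selected yet.
    void resetForBatch(std::uint64_t length) {
        batchLength = length;
        morselStart = 0;
        morselEnd = 0;
    }

    // Advances to the next morsel; returns false once the batch is exhausted,
    // which is the point where the caller would fetch the next batch.
    bool advance() {
        if (morselEnd >= batchLength) {
            return false;
        }
        morselStart = morselEnd;
        morselEnd = std::min(morselStart + morselSize, batchLength);
        return true;
    }
};

// Collects the [start, end) row ranges produced for a batch of the given length.
std::vector<std::pair<std::uint64_t, std::uint64_t>> morselRanges(std::uint64_t length) {
    MorselCursor cursor;
    cursor.resetForBatch(length);
    std::vector<std::pair<std::uint64_t, std::uint64_t>> ranges;
    while (cursor.advance()) {
        assert(cursor.morselStart < cursor.morselEnd);
        ranges.emplace_back(cursor.morselStart, cursor.morselEnd);
    }
    return ranges;
}
```

For a 5000-row batch this yields three ranges, with the last one shorter than `morselSize`; an empty batch yields none, so the caller moves straight on to the next batch.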
@adsharma merged commit bf5a292 into master on Feb 25, 2026
26 checks passed
@adsharma deleted the arrow_morsel branch on February 25, 2026 22:44
@aheev
Contributor

aheev commented Feb 26, 2026

@adsharma

I would like to propose some changes:

  • scanState.currentBatchOffset can be replaced by scanState.currentMorselStartOffset.
  • Why not leave the batch-level orchestration to scan_node_table? The inner while loop in getNextTuplesInternal runs once per batch. With this change it runs over the full table instead, and nextMorsel in scan_node_table is never used. This leads to inconsistent behaviour across different tables.
