
[GLUTEN-8453] [VL] Allow Heavy Batch to be Processed by ColumnarCachedBatchSerializer #8454

Merged
zhztheplayer merged 2 commits into apache:main from ArnavBalyan:arnavb/fix-column-serializer
Jan 8, 2025

Member

@ArnavBalyan ArnavBalyan commented Jan 7, 2025

  • Currently, ColumnarCachedBatchSerializer does not support Arrow heavy batches.
  • ColumnarCachedBatchSerializer expects a light batch, i.e. one already offloaded to native. In most cases it receives an already offloaded batch, but it fails when the input is a heavy batch.
  • Added a conversion that offloads the batch if the upstream operator produced an ArrowJavaBatch.
  • Also makes the light/heavy batch checks public, since they are useful utility functions and contain no critical logic.
  • Note: This fix makes it work, but ideally it should work with RAS and be compatible with the transitions that were added. To do this, we can wrap InMemoryTableScanExec and register it as a Gluten operator so that it is offloaded cleanly. I'll investigate this as part 2.
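
The conversion described above can be pictured with a minimal, self-contained sketch. Everything in it (`Batch`, `HeavyBatch`, `LightBatch`, `offload`, `ensureLight`, `serialize`) is a hypothetical stand-in and not Gluten's actual classes or signatures. Only the light/heavy distinction and the idea of offloading a heavy batch before serialization come from the description.

```scala
// Hypothetical, simplified model. These are NOT Gluten's real classes.
object HeavyBatchSketch {
  sealed trait Batch { def numRows: Int }

  // A "heavy" batch keeps its column data on the JVM side (e.g. Arrow vectors).
  final case class HeavyBatch(columns: Vector[Vector[Long]]) extends Batch {
    def numRows: Int = columns.headOption.map(_.size).getOrElse(0)
  }

  // A "light" batch only carries a handle to data owned by native code.
  final case class LightBatch(nativeHandle: Long, numRows: Int) extends Batch

  // Stand-ins for the public light/heavy checks mentioned in the description.
  def isHeavyBatch(b: Batch): Boolean = b.isInstanceOf[HeavyBatch]
  def isLightBatch(b: Batch): Boolean = b.isInstanceOf[LightBatch]

  // A map that plays the role of native memory in this sketch.
  private val nativeStore =
    scala.collection.mutable.Map.empty[Long, Vector[Vector[Long]]]
  private var nextHandle = 0L

  // Assumed offload step: copy the JVM-side data "to native" and return a handle.
  def offload(h: HeavyBatch): LightBatch = {
    nextHandle += 1
    nativeStore(nextHandle) = h.columns
    LightBatch(nextHandle, h.numRows)
  }

  // The conditional conversion: pass light batches through, offload heavy ones.
  def ensureLight(b: Batch): LightBatch = b match {
    case l: LightBatch => l
    case h: HeavyBatch => offload(h)
  }

  // A serializer that only understands light batches, so it normalizes first.
  def serialize(b: Batch): String = {
    val light = ensureLight(b)
    s"handle=${light.nativeHandle},rows=${light.numRows}"
  }
}
```

With this shape, a serializer that previously failed on heavy input accepts both kinds, at the cost of a runtime type check on every batch.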

@github-actions github-actions bot added the VELOX label Jan 7, 2025

github-actions bot commented Jan 7, 2025

#8453

@FelixYBW FelixYBW requested a review from zhztheplayer January 7, 2025 21:07

 import org.apache.gluten.backendsapi.BackendsApiManager
-import org.apache.gluten.columnarbatch.ColumnarBatches
+import org.apache.gluten.columnarbatch.{ColumnarBatches, VeloxColumnarBatches}
Contributor

@zhztheplayer Is it OK to add VeloxColumnarBatches here?

Member

@zhztheplayer zhztheplayer Jan 8, 2025

Yes, it's normal to call the utility from Velox backend code. However, it seems debatable whether we should rely on isLightBatch / isHeavyBatch to add conditional transitions.

@ArnavBalyan Would you like to help check whether we can somehow add explicit transition nodes (LoadArrowData / OffloadArrowData) into the query plan instead of this PR's change? Or is the last note in the PR description meant for something similar? Thanks!
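
As a rough illustration of what explicit transition nodes in the plan would mean, here is a toy plan rewrite. It is a sketch under stated assumptions: `Plan`, `Leaf`, `Unary`, `BatchType`, and `insertTransitions` are hypothetical, and the `LoadArrowData` / `OffloadArrowData` classes below only borrow the names from the comment above; they are not Gluten's planner classes.

```scala
// Hypothetical toy plan model. Not Gluten's or Spark's real planner classes.
object TransitionSketch {
  sealed trait BatchType
  case object ArrowJava extends BatchType   // heavy batches
  case object VeloxNative extends BatchType // light batches

  sealed trait Plan { def output: BatchType }

  // A source that produces batches of a fixed type.
  final case class Leaf(name: String, output: BatchType) extends Plan

  // A node that requires `required` from its child and emits the same type.
  final case class Unary(name: String, required: BatchType, child: Plan) extends Plan {
    def output: BatchType = required
  }

  // Explicit transition nodes (names borrowed from the review comment).
  final case class OffloadArrowData(child: Plan) extends Plan {
    def output: BatchType = VeloxNative
  }
  final case class LoadArrowData(child: Plan) extends Plan {
    def output: BatchType = ArrowJava
  }

  // Bottom-up rewrite: wherever a child's output type differs from what the
  // parent requires, insert the matching transition node between them.
  def insertTransitions(plan: Plan): Plan = plan match {
    case leaf: Leaf          => leaf
    case OffloadArrowData(c) => OffloadArrowData(insertTransitions(c))
    case LoadArrowData(c)    => LoadArrowData(insertTransitions(c))
    case Unary(name, required, child) =>
      val fixedChild = insertTransitions(child)
      val bridged: Plan = (fixedChild.output, required) match {
        case (ArrowJava, VeloxNative) => OffloadArrowData(fixedChild)
        case (VeloxNative, ArrowJava) => LoadArrowData(fixedChild)
        case _                        => fixedChild
      }
      Unary(name, required, bridged)
  }
}
```

In this model the conversion is decided once at planning time and is visible in the plan, instead of being checked per batch at runtime.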

Member

Could also refer to a previous effort #7313 if needed.

Member Author

Thanks for the review @FelixYBW @zhztheplayer!
Yes, the note was meant for that. Ideally, the transitions would have added the correct transition node before this point. However, the serializer is a special case since it's not an operator and does not extend GlutenPlan. I have some ideas to explore, which may require some design changes in the serializer to make it work with transitions.

Would it be possible to merge this for now, since the ColumnarRange operator depends on it? I'll work on the serializer's compatibility with transitions. Let me know what you think, thanks!

Member

> However the serializer is a special case since it's not an operator and does not extend the GlutenPlan

Agreed. The code path is different. Thanks for figuring this out.

Do you think we can add a UT for the change in this PR, if this can be considered an individual fix?

Member Author

Sure, let me add it in the ColumnarRangeExec, since it already has the failing UT. Thanks!

Member

Yes, please feel free to move forward with the Range PR. I am also testing the relevant code and will help add a test case here.
