[Rust][Arrow] Improve performance of filter kernel #26152

@asfimport

As @jorgecarleitao noted here:
#8303 (comment)

The performance of the filter kernel (and likely others) could be improved by avoiding the creation of intermediate copies. The code currently:

  1. creates a Vec<Option<T>> through an iteration

  2. copies the Vec<Option<T>> into the two buffers (when from_opt_vec is called)

    It may be more efficient to create the buffers during the iteration, so that we avoid the copy (Vec -> buffers). In other words, the code in from_opt_vec could be "injected" into the filter execution, where the MutableBuffer, offsets, and values buffers are created before the loop, and new elements are written directly to them.

    (As a side note, this is why he proposed ARROW-10030: [Rust] Add support for FromIter and IntoIter for primitive types #8211. IMO there is some boilerplate copy-pasting to

  • initialize buffers

  • iterate

  • create ArrayData from buffers

    which will continue to grow as we add more kernels, and whose pattern seems to be a FromIter of fixed size.)
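
    The difference between the two approaches can be sketched without the arrow crate. Everything below is a hypothetical stand-in, not the actual kernel code: SimpleArray, filter_two_step and filter_direct are made-up names, plain Vecs play the role of MutableBuffer, the mask is a &[bool] rather than a BooleanArray, and only i32 values are handled. The from_opt_vec here is a simplified imitation of the real function. The FromIterator impl at the end illustrates how the "initialize buffers, iterate, build" pattern might be written once and reused.

```rust
// Hypothetical stand-in for a primitive Arrow array: a values buffer plus a
// validity bitmap (bit i set means slot i is non-null). Not the arrow crate's type.
#[derive(Debug, PartialEq)]
struct SimpleArray {
    values: Vec<i32>,
    validity: Vec<u8>,
    len: usize,
}

impl SimpleArray {
    fn is_valid(&self, i: usize) -> bool {
        (self.validity[i / 8] & (1u8 << (i % 8))) != 0
    }
}

// Simplified imitation of from_opt_vec: copies a Vec<Option<i32>> into two buffers.
fn from_opt_vec(data: Vec<Option<i32>>) -> SimpleArray {
    let len = data.len();
    let mut values = Vec::with_capacity(len);
    let mut validity = vec![0u8; (len + 7) / 8];
    for (i, item) in data.into_iter().enumerate() {
        match item {
            Some(x) => {
                values.push(x);
                validity[i / 8] |= 1u8 << (i % 8);
            }
            None => values.push(0), // placeholder value for a null slot
        }
    }
    SimpleArray { values, validity, len }
}

// Current shape: build an intermediate Vec<Option<i32>>, then copy it into buffers.
// Assumes mask.len() == input.len.
fn filter_two_step(input: &SimpleArray, mask: &[bool]) -> SimpleArray {
    let tmp: Vec<Option<i32>> = (0..input.len)
        .filter(|&i| mask[i])
        .map(|i| if input.is_valid(i) { Some(input.values[i]) } else { None })
        .collect();
    from_opt_vec(tmp)
}

// Proposed shape: allocate the buffers before the loop and write into them directly.
// Assumes mask.len() == input.len.
fn filter_direct(input: &SimpleArray, mask: &[bool]) -> SimpleArray {
    let out_len = mask.iter().filter(|&&keep| keep).count();
    let mut values = Vec::with_capacity(out_len);
    let mut validity = vec![0u8; (out_len + 7) / 8];
    for i in 0..input.len {
        if !mask[i] {
            continue;
        }
        let j = values.len(); // output slot being written
        if input.is_valid(i) {
            values.push(input.values[i]);
            validity[j / 8] |= 1u8 << (j % 8);
        } else {
            values.push(0);
        }
    }
    SimpleArray { values, validity, len: out_len }
}

// The repeated pattern (initialize buffers, iterate, build) written once.
impl FromIterator<Option<i32>> for SimpleArray {
    fn from_iter<I: IntoIterator<Item = Option<i32>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        let (lower, _) = iter.size_hint();
        let mut values = Vec::with_capacity(lower);
        let mut validity = Vec::with_capacity((lower + 7) / 8);
        let mut len = 0usize;
        for item in iter {
            if len % 8 == 0 {
                validity.push(0u8); // start a new bitmap byte every 8 slots
            }
            match item {
                Some(x) => {
                    values.push(x);
                    validity[len / 8] |= 1u8 << (len % 8);
                }
                None => values.push(0),
            }
            len += 1;
        }
        SimpleArray { values, validity, len }
    }
}
```

    In this sketch both filter functions produce the same array. The direct version skips the temporary Vec<Option<i32>> at the cost of one extra pass over the mask to size the output. Whether that trade-off pays off in the real kernel would need benchmarking.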

Reporter: Andrew Lamb / @alamb

Note: This issue was originally created as ARROW-10141. Please see the migration documentation for further details.
