ARROW-18012: [R] Make map_batches .lazy = TRUE by default #14521

paleolimbot · 2022-10-26T15:06:49Z

This makes the default map_batches() behaviour lazy (i.e., the function is called once per batch as each batch arrives):

```r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.

source <- RecordBatchReader$create(
  record_batch(a = 1:10),
  record_batch(a = 11:20)
)

mapped <- map_batches(source, function(x) {
  message("Hi! I'm being evaluated!")
  x
}, .schema = source$schema)

as_arrow_table(mapped)
#> Hi! I'm being evaluated!
#> Hi! I'm being evaluated!
#> Table
#> 20 rows x 1 columns
#> $a <int32>
```

Created on 2022-10-26 with reprex v2.0.2

Lazy evaluation would previously have been a confusing default, because piping the resulting RecordBatchReader into an ExecPlan failed for some ExecPlans before ARROW-17178 (#13706). With that fixed, this PR commits to the lazy behaviour, which is both more efficient and closer to what users expect.
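
For anyone who relied on the old default, a minimal sketch of opting back into eager evaluation might look like the following. It assumes that `.lazy = FALSE` remains an accepted argument and keeps its previous meaning (all batches are evaluated inside `map_batches()` before it returns); the `calls` counter is purely illustrative and not part of arrow.

```r
library(arrow, warn.conflicts = FALSE)

source <- RecordBatchReader$create(
  record_batch(a = 1:10),
  record_batch(a = 11:20)
)

# Illustrative counter: how many times the mapping function has run so far
calls <- 0L

eager <- map_batches(source, function(x) {
  calls <<- calls + 1L
  x
}, .schema = source$schema, .lazy = FALSE)

# Under the assumption above, the function has already run for every batch
# here, before the reader is consumed
calls

# Materializing the reader afterwards should not need to call the function again
result <- as_arrow_table(eager)
```

Comparing `calls` right after `map_batches()` with and without `.lazy = FALSE` is a quick way to check which behaviour a given build of the package uses.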

github-actions · 2022-10-26T17:52:07Z

https://issues.apache.org/jira/browse/ARROW-18012

github-actions · 2022-10-26T17:52:09Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

ursabot · 2022-10-30T16:21:41Z

Benchmark runs are scheduled for baseline = 286c263 and contender = 9707630. 9707630 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.21% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 97076308 ec2-t3-xlarge-us-east-2
[Failed] 97076308 test-mac-arm
[Finished] 97076308 ursa-i9-9960x
[Finished] 97076308 ursa-thinkcentre-m75q
[Finished] 286c2634 ec2-t3-xlarge-us-east-2
[Failed] 286c2634 test-mac-arm
[Finished] 286c2634 ursa-i9-9960x
[Finished] 286c2634 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

make map_batches lazy by default

05afdfd

nealrichardson approved these changes Oct 26, 2022

github-actions bot added the Component: R label Oct 26, 2022

paleolimbot merged commit 9707630 into apache:master Oct 29, 2022

paleolimbot deleted the r-map-batches-make-lazy branch October 31, 2022 13:07

asfimport mentioned this pull request Oct 30, 2022

[R] Make map_batches .lazy = TRUE by default #20453

Closed
