
@paleolimbot (Member)

This makes the default map_batches() behaviour lazy (i.e., the function is called once per batch as each batch arrives):

```r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.

source <- RecordBatchReader$create(
  record_batch(a = 1:10),
  record_batch(a = 11:20)
)

mapped <- map_batches(source, function(x) {
  message("Hi! I'm being evaluated!")
  x
}, .schema = source$schema)

as_arrow_table(mapped)
#> Hi! I'm being evaluated!
#> Hi! I'm being evaluated!
#> Table
#> 20 rows x 1 columns
#> $a <int32>
```

Created on 2022-10-26 with reprex v2.0.2

Lazy evaluation was previously a confusing default, because piping the resulting RecordBatchReader into an ExecPlan would fail for some ExecPlans before ARROW-17178 (#13706). With that fixed, this PR commits to the (more optimal and more expected) lazy behaviour.
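A minimal sketch that makes the laziness observable, assuming a version of arrow that includes this change: it counts calls to the mapping function and pulls batches one at a time with `read_next_batch()` instead of collecting everything with `as_arrow_table()`. The counter `calls` and the `calls_after_*` variables are illustrative names, not part of the arrow API. Because `.schema` is supplied, no batch should need to be evaluated up front to infer the output schema.

```r
library(arrow, warn.conflicts = FALSE)

# Illustrative counter: how many times the mapping function has run
calls <- 0

source <- RecordBatchReader$create(
  record_batch(a = 1:10),
  record_batch(a = 11:20)
)

mapped <- map_batches(source, function(x) {
  calls <<- calls + 1  # record each evaluation in the global counter
  x
}, .schema = source$schema)

# Creating the mapped reader should not have evaluated anything yet
calls_after_create <- calls

# Each read pulls exactly one batch through the function
first <- mapped$read_next_batch()
calls_after_first <- calls

second <- mapped$read_next_batch()
calls_after_second <- calls

# Once the source is exhausted the reader returns NULL without calling the function
third <- mapped$read_next_batch()
```

If the function were evaluated eagerly, `calls_after_create` would already equal the number of batches in `source`.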

@github-actions

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@paleolimbot paleolimbot merged commit 9707630 into apache:master Oct 29, 2022
ursabot commented Oct 30, 2022

Benchmark runs are scheduled for baseline = 286c263 and contender = 9707630. 9707630 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.21% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 97076308 ec2-t3-xlarge-us-east-2
[Failed] 97076308 test-mac-arm
[Finished] 97076308 ursa-i9-9960x
[Finished] 97076308 ursa-thinkcentre-m75q
[Finished] 286c2634 ec2-t3-xlarge-us-east-2
[Failed] 286c2634 test-mac-arm
[Finished] 286c2634 ursa-i9-9960x
[Finished] 286c2634 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@paleolimbot paleolimbot deleted the r-map-batches-make-lazy branch October 31, 2022 13:07
3 participants