Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
I just ran the following with the dev version of arrow. It took 45 seconds on the first run and 85 seconds on the second:
library(arrow)
library(dplyr)
library(tictoc)
nyc_taxi <- open_dataset("data/nyc-taxi/")
tic()
nyc_taxi |>
  group_by(year) |>
  summarise(
    all_trips = n(),
    shared_trips = sum(passenger_count > 1, na.rm = TRUE)
  ) |>
  mutate(pct_shared = shared_trips / all_trips * 1) |>
  collect()
toc()
With arrow 16.1.0, the same code took only 5 seconds on both runs.
Component(s)
R