Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
Hi,
I noticed something strange today while using Arrow datasets. I cannot provide a reproducible example, but the code below should give the idea. ccaei is an Arrow dataset and outpatients is an R tibble. When I call right_join() on them before collect(), I get wrong numbers: the number of distinct ENROLID values in the result is lower than the number present in outpatients. I get the correct number when I call right_join() after collect(), but that is computationally inefficient. Could you help me with this?
This gives an unexpected number of distinct ENROLID values (join before collect()):
ccaei |>
  filter(ADMDATE >= as_date("2016-10-01")) |>
  filter(!is.na(ENROLID)) |>
  select(ENROLID, ADMDATE) |>
  right_join(outpatients) |>
  collect() |>
  count(ENROLID)
This gives the expected result (join after collect()):
ccaei |>
  filter(ADMDATE >= as_date("2016-10-01")) |>
  filter(!is.na(ENROLID)) |>
  select(ENROLID, ADMDATE) |>
  collect() |>
  right_join(outpatients)  |>
  count(ENROLID)
I am not sure where the discrepancy comes from.
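In case it helps to narrow this down, here is a rough sketch of how the two pipelines could be compared on toy data. Everything in it is an assumption made for illustration: ccaei_toy and outpatients_toy are made-up stand-ins for the real ccaei and outpatients, the join key is assumed to be ENROLID only, and the toy data may well not trigger the problem seen with the real dataset.

```r
library(arrow)
library(dplyr)

# Hypothetical toy stand-ins for the real data (not the actual ccaei/outpatients)
ccaei_toy <- arrow_table(
  ENROLID = c(1, 2, 2, 3),
  ADMDATE = as.Date(c("2016-10-05", "2016-11-01", "2017-01-15", "2016-09-01"))
)
outpatients_toy <- tibble(ENROLID = c(1, 2, 3, 4))

# Shared filtering step, still evaluated lazily by Arrow
filtered <- ccaei_toy |>
  filter(ADMDATE >= as.Date("2016-10-01"), !is.na(ENROLID)) |>
  select(ENROLID, ADMDATE)

# Join inside Arrow, then collect
joined_arrow <- filtered |>
  right_join(outpatients_toy, by = "ENROLID") |>
  collect()

# Collect first, then join in dplyr
joined_dplyr <- filtered |>
  collect() |>
  right_join(outpatients_toy, by = "ENROLID")

# A right join should keep every ENROLID from the right-hand table,
# so these three counts would be expected to agree
n_distinct(joined_arrow$ENROLID)
n_distinct(joined_dplyr$ENROLID)
n_distinct(outpatients_toy$ENROLID)
```

If the first count differs from the other two on a given arrow version, that would point at the join being evaluated differently inside Arrow rather than at the filtering steps. Passing by = explicitly also rules out the two versions picking different join columns.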
Component(s)
R