[R] right_join() function does not produce the expected outcome #14907

@abduazizR

Describe the bug, including details regarding any error messages, version, and platform.

Hi,

I noticed something strange today while using Arrow datasets. I cannot provide a reproducible example, but the code below should convey the idea. ccaei is an Arrow dataset and outpatients is an R tibble. When I call right_join() with the tibble before collect(), I get wrong numbers: the number of distinct ENROLID values in the result is smaller than the number present in outpatients. I get the correct number when I call right_join() after collect(), although this is computationally inefficient. Could you help me with this?

This gives the wrong number (right_join() before collect()):

ccaei |>  
  filter(ADMDATE >= as_date("2016-10-01")) |> 
  filter(!is.na(ENROLID)) |> 
  select(ENROLID, ADMDATE) |> 
  right_join(outpatients) |> 
  collect() |> 
  count(ENROLID)

This gives the expected number (right_join() after collect()):

ccaei |>  
  filter(ADMDATE >= as_date("2016-10-01")) |> 
  filter(!is.na(ENROLID)) |> 
  select(ENROLID, ADMDATE) |> 
  collect() |> 
  right_join(outpatients) |> 
  count(ENROLID)

I am not sure where the discrepancy comes from.
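
For anyone who wants to compare the two orderings without access to the real data, here is a minimal sketch on toy data. The objects ccaei_toy and outpatients_toy, their values, and the explicit by = "ENROLID" are all assumptions made for illustration; they are not the original data, and this may or may not reproduce the problem on a given arrow version.

```r
library(arrow)
library(dplyr)

# Hypothetical toy stand-ins (NOT the original ccaei / outpatients data)
ccaei_toy <- arrow_table(
  ENROLID = c(1L, 2L, 3L, NA),
  ADMDATE = as.Date(c("2016-10-05", "2016-11-01", "2017-01-15", "2016-12-01"))
)
outpatients_toy <- tibble(ENROLID = c(2L, 3L, 4L, 5L))

cutoff <- as.Date("2016-10-01")

# Shared preparation steps, still lazy on the Arrow side
prepared <- ccaei_toy |>
  filter(ADMDATE >= cutoff) |>
  filter(!is.na(ENROLID)) |>
  select(ENROLID, ADMDATE)

# Variant 1: join inside Arrow, then collect
joined_in_arrow <- prepared |>
  right_join(outpatients_toy, by = "ENROLID") |>
  collect()

# Variant 2: collect first, then join with dplyr in R
joined_in_r <- prepared |>
  collect() |>
  right_join(outpatients_toy, by = "ENROLID")

# Compare distinct key counts from both variants against the right-hand table
n_distinct(joined_in_arrow$ENROLID)
n_distinct(joined_in_r$ENROLID)
n_distinct(outpatients_toy$ENROLID)
```

If the three counts differ, the difference is in how the join is evaluated before collect(), since a right join should keep every ENROLID from the right-hand table.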

Component(s)

R
