Closed
Description
Here's the problem I detected while triaging tickets.
This was run locally after merging from apache/arrow at commit 8773b9d and rebuilding both the Arrow library and the Arrow R package.
library(arrow)
#> See arrow_info() for available features
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(testthat)
#>
#> Attaching package: 'testthat'
#> The following object is masked from 'package:dplyr':
#>
#> matches
#> The following object is masked from 'package:arrow':
#>
#> matches
tstring <- tibble(x = c("08-05-2008", NA))
tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
expect_equal(
  tstring %>%
    Table$create() %>%
    mutate(
      x = strptime(x, format = "%m-%d-%Y")
    ) %>%
    collect(),
  tstamp,
  check.tzone = FALSE
)
#> Error: `%>%`(...) not equal to `tstamp`.
#> Component "x": Mean absolute difference: 14400

Removing the expectation shows that the two dates differ by exactly 4 hours:
library(arrow)
#> See arrow_info() for available features
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(testthat)
#>
#> Attaching package: 'testthat'
#> The following object is masked from 'package:dplyr':
#>
#> matches
#> The following object is masked from 'package:arrow':
#>
#> matches
tstring <- tibble(x = c("08-05-2008", NA))
tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
tstring %>%
  Table$create() %>%
  mutate(
    x = strptime(x, format = "%m-%d-%Y")
  ) %>%
  collect()
#> # A tibble: 2 x 1
#>   x
#>   <dttm>
#> 1 2008-08-04 20:00:00
#> 2 NA
tstamp
#> # A tibble: 2 x 1
#>   x
#>   <dttm>
#> 1 2008-08-05 00:00:00
#> 2 NA

Created on 2021-06-07 by the reprex package (v2.0.0)
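One possible reading of the 14400-second gap, offered as a sketch and not a confirmed diagnosis: base R's strptime() parses the string as midnight in the local time zone, whereas the Arrow result looks like midnight UTC displayed in a zone four hours behind UTC. The snippet below reproduces that arithmetic with base R only. The zone "America/New_York" is a hypothetical stand-in, since the report does not state the machine's time zone; any zone at UTC-4 on that date would behave the same way.

```r
# Hypothetical stand-in for the reporter's (unstated) local time zone;
# America/New_York was at UTC-4 on 2008-08-05.
local_tz <- "America/New_York"

# Base R behaviour: the string is read as midnight in the given zone.
parsed_local <- as.POSIXct(
  strptime("08-05-2008", format = "%m-%d-%Y", tz = local_tz)
)

# The same string read as midnight UTC (assumed to be what the Arrow
# expression produced; this is an assumption, not confirmed by the report).
parsed_utc <- as.POSIXct(
  strptime("08-05-2008", format = "%m-%d-%Y", tz = "UTC")
)

# Gap between the two instants, in seconds.
gap_secs <- as.numeric(difftime(parsed_local, parsed_utc, units = "secs"))
gap_secs
#> [1] 14400

# Midnight UTC printed in the local zone matches the value Arrow returned.
utc_shown_locally <- format(parsed_utc, tz = local_tz, usetz = TRUE)
utc_shown_locally
#> [1] "2008-08-04 20:00:00 EDT"
```

If this reading is right, check.tzone = FALSE would not hide the mismatch, because the two values are different instants and not the same instant carrying different time zone attributes.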
Reporter: Mauricio 'Pachá' Vargas Sepúlveda / @pachadotdev
Assignee: Neal Richardson / @nealrichardson
Watchers: Rok Mihevc / @rok
PRs and other links:
Note: This issue was originally created as ARROW-12994. Please see the migration documentation for further details.