2 changes: 2 additions & 0 deletions r/NEWS.md
@@ -22,6 +22,8 @@
## Minor improvements and fixes

- Added bindings for atan, sinh, cosh, tanh, asinh, acosh, and tanh, and expm1 (#44953)
+- Expose an option `check_directory_existence_before_creation` in `S3FileSystem`
+to reduce I/O calls on cloud storage (@HaochengLIU, #41998)

# arrow 20.0.0

4 changes: 2 additions & 2 deletions r/R/arrowExports.R

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 4 additions & 2 deletions r/R/dplyr-funcs-datetime.R
@@ -856,7 +856,8 @@ register_bindings_hms <- function() {
Expression$create("multiply_checked", days, 86400)

return(numeric_to_time32(total_secs))
-}
+},
+notes = "subsecond times not supported"
)

register_binding(
@@ -880,6 +881,7 @@ register_bindings_hms <- function() {
)
return(datetime_to_time32(as_date_time))
}
-}
+},
+notes = "subsecond times not supported"
)
}
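
The two hunks above attach the note "subsecond times not supported" to the `hms` bindings. As a rough illustration in plain R (not arrow's implementation; `hms_total_seconds()` is a hypothetical helper, and it assumes the binding sums its components into a count of seconds, as the `days * 86400` term suggests):

```r
# Hypothetical helper, for illustration only: combine hms-style components
# into a total number of seconds, using the same 86400 seconds-per-day
# factor that appears in the hunk above.
hms_total_seconds <- function(seconds = 0, minutes = 0, hours = 0, days = 0) {
  seconds + minutes * 60 + hours * 3600 + days * 86400
}

# 1 hour, 2 minutes, 3 seconds
total <- hms_total_seconds(seconds = 3, minutes = 2, hours = 1)
print(total)

# A fractional second cannot be kept by a whole-second representation:
# converting to integer seconds drops the .5 here.
fractional <- hms_total_seconds(seconds = 1.5)
print(as.integer(fractional))
```

How arrow itself reacts to fractional input (an error or a truncation) is not visible in this diff, so the `as.integer()` call only shows why a whole-second type cannot hold such values.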
15 changes: 7 additions & 8 deletions r/R/dplyr-funcs-doc.R
@@ -21,7 +21,7 @@
#'
#' The `arrow` package contains methods for 37 `dplyr` table functions, many of
#' which are "verbs" that do transformations to one or more tables.
-#' The package also has mappings of 221 R functions to the corresponding
+#' The package also has mappings of 222 R functions to the corresponding
#' functions in the Arrow compute library. These allow you to write code inside
#' of `dplyr` methods that call R functions, including many in packages like
#' `stringr` and `lubridate`, and they will get translated to Arrow and run
@@ -83,7 +83,7 @@
#' Functions can be called either as `pkg::fun()` or just `fun()`, i.e. both
#' `str_sub()` and `stringr::str_sub()` work.
#'
-#' In addition to these functions, you can call any of Arrow's 271 compute
+#' In addition to these functions, you can call any of Arrow's 280 compute
#' functions directly. Arrow has many functions that don't map to an existing R
#' function. In other cases where there is an R function mapping, you can still
#' call the Arrow function directly if you don't want the adaptations that the R
@@ -96,7 +96,6 @@
#'
#' * [`add_filename()`][arrow::add_filename()]
#' * [`cast()`][arrow::cast()]
-#' * [`one()`][arrow::one()]
#'
#' ## base
#'
@@ -215,6 +214,11 @@
#' * [`n()`][dplyr::n()]
#' * [`n_distinct()`][dplyr::n_distinct()]
#'
+#' ## hms
+#'
+#' * [`as_hms()`][hms::as_hms()]
+#' * [`hms()`][hms::hms()]
+#'
#' ## lubridate
#'
#' * [`am()`][lubridate::am()]
@@ -297,11 +301,6 @@
#' * [`ymd_hms()`][lubridate::ymd_hms()]: `locale` argument not supported
#' * [`yq()`][lubridate::yq()]: `locale` argument not supported
#'
-#' ## hms
-#'
-#' * [`hms()`][hms::hms()]: subsecond times not supported
-#' * [`hms()`][hms::as_hms()]: subsecond times not supported
-#'
#' ## methods
#'
#' * [`is()`][methods::is()]
8 changes: 7 additions & 1 deletion r/R/filesystem.R
@@ -156,6 +156,10 @@ FileSelector$create <- function(base_dir, allow_not_found = FALSE, recursive = F
#' buckets if `$CreateDir()` is called on the bucket level (default `FALSE`).
#' - `allow_bucket_deletion`: logical, if TRUE, the filesystem will delete
#' buckets if`$DeleteDir()` is called on the bucket level (default `FALSE`).
+#' - `check_directory_existence_before_creation`: logical, check if directory
+#' already exists or not before creation. Helpful for cloud storage operations
+#' where object mutation operations are rate limited or existing directories
+#' are read-only. (default `FALSE`).
#' - `request_timeout`: Socket read time on Windows and macOS in seconds. If
#' negative, the AWS SDK default (typically 3 seconds).
#' - `connect_timeout`: Socket connection timeout in seconds. If negative, AWS
@@ -411,7 +415,8 @@ S3FileSystem$create <- function(anonymous = FALSE, ...) {
invalid_args <- intersect(
c(
"access_key", "secret_key", "session_token", "role_arn", "session_name",
-"external_id", "load_frequency", "allow_bucket_creation", "allow_bucket_deletion"
+"external_id", "load_frequency", "allow_bucket_creation", "allow_bucket_deletion",
+"check_directory_existence_before_creation"
),
names(args)
)
@@ -459,6 +464,7 @@ default_s3_options <- list(
background_writes = TRUE,
allow_bucket_creation = FALSE,
allow_bucket_deletion = FALSE,
+check_directory_existence_before_creation = FALSE,
connect_timeout = -1,
request_timeout = -1
)
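
The `S3FileSystem$create` hunk above adds the new option to a vector that is intersected with `names(args)`. The surrounding condition and the error handling are not visible in this diff, so the following is only a sketch in plain R of how such an `intersect()` check behaves. It assumes the check applies when `anonymous = TRUE`; `find_invalid_anonymous_args()` is a hypothetical helper, not an arrow function.

```r
# Option names copied from the hunk above, including the newly added one.
restricted_args <- c(
  "access_key", "secret_key", "session_token", "role_arn", "session_name",
  "external_id", "load_frequency", "allow_bucket_creation", "allow_bucket_deletion",
  "check_directory_existence_before_creation"
)

# Hypothetical helper: return the names of supplied arguments that appear
# in the restricted list. An empty result means nothing was flagged.
find_invalid_anonymous_args <- function(...) {
  args <- list(...)
  intersect(restricted_args, names(args))
}

# `region` is not in the restricted list, so only the new option is flagged.
invalid <- find_invalid_anonymous_args(
  region = "us-east-1",
  check_directory_existence_before_creation = TRUE
)
print(invalid)

# No restricted names supplied: the result is an empty character vector.
none <- find_invalid_anonymous_args(region = "us-east-1")
print(length(none))
```

Because `intersect()` keeps the order of its first argument, the reported names follow the order of the restricted list rather than the order in which the caller passed them.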
4 changes: 4 additions & 0 deletions r/data-raw/docgen.R
@@ -149,6 +149,10 @@ tidyselect <- grep("^tidyselect::", readLines("R/reexports-tidyselect.R"), value
# HACK: remove the _random_along UDF we're using (fix in ARROW-17974)
docs[["_random_along"]] <- NULL

+# TODO - update the script to add this back in - will fail CI as tries to link
+# to non-existent function as arrow::one only exists as registered binding
+docs[["arrow::one"]] <- NULL
+
Comment on lines +152 to +155
Member

Do we need to do this now? Or is this a thing for later? If later, do we have an issue for it? I'm not sure I fully get what the comment is saying (but will admit I haven't dug too much right now)

Member Author

So without it we were getting the warning shown here: https://github.com/apache/arrow/actions/runs/15282844766/job/42985877041#step:5:612

I don't think we need to do it now, I will open a ticket.
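
For readers puzzling over the line under discussion: in base R, assigning `NULL` with `[[<-` removes an element from a list, whereas the `setNames(rep(list(NULL), ...))` idiom on the following line of `docgen.R` adds named entries whose value is `NULL`. A small standalone sketch, using made-up list contents rather than the real `docs` object:

```r
# Made-up stand-in for the `docs` list built by the script.
docs <- list("arrow::one" = "some notes", "base::sum" = "other notes")

# Assigning NULL via [[<- deletes the entry entirely.
docs[["arrow::one"]] <- NULL
print(names(docs))

# Hypothetical function names standing in for the `tidyselect` vector.
extra <- c("tidyselect::all_of", "tidyselect::any_of")

# list(NULL) entries survive c(), so the names are kept with NULL values.
docs <- c(docs, setNames(rep(list(NULL), length(extra)), extra))
print(names(docs))
print(is.null(docs[["tidyselect::all_of"]]))
```

So the added line drops `arrow::one` from the generated function list, while the tidyselect functions stay listed without any notes attached.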

docs <- c(docs, setNames(rep(list(NULL), length(tidyselect)), tidyselect))

fun_df <- tibble::tibble(
4 changes: 4 additions & 0 deletions r/man/FileSystem.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

28 changes: 18 additions & 10 deletions r/man/acero.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 6 additions & 5 deletions r/src/arrowExports.cpp

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 5 additions & 1 deletion r/src/filesystem.cpp
@@ -289,7 +289,8 @@ std::shared_ptr<fs::S3FileSystem> fs___S3FileSystem__create(
std::string region = "", std::string endpoint_override = "", std::string scheme = "",
std::string proxy_options = "", bool background_writes = true,
bool allow_bucket_creation = false, bool allow_bucket_deletion = false,
-double connect_timeout = -1, double request_timeout = -1) {
+bool check_directory_existence_before_creation = false, double connect_timeout = -1,
+double request_timeout = -1) {
// We need to ensure that S3 is initialized before we start messing with the
// options
StopIfNotOk(fs::EnsureS3Initialized());
@@ -331,6 +332,9 @@ std::shared_ptr<fs::S3FileSystem> fs___S3FileSystem__create(
s3_opts.allow_bucket_creation = allow_bucket_creation;
s3_opts.allow_bucket_deletion = allow_bucket_deletion;

+s3_opts.check_directory_existence_before_creation =
+check_directory_existence_before_creation;
+
s3_opts.request_timeout = request_timeout;
s3_opts.connect_timeout = connect_timeout;

6 changes: 4 additions & 2 deletions r/tests/testthat/test-s3-minio.R
@@ -46,15 +46,17 @@ fs <- S3FileSystem$create(
scheme = "http",
endpoint_override = paste0("localhost:", minio_port),
allow_bucket_creation = TRUE,
-allow_bucket_deletion = TRUE
+allow_bucket_deletion = TRUE,
+check_directory_existence_before_creation = TRUE
)
limited_fs <- S3FileSystem$create(
access_key = minio_key,
secret_key = minio_secret,
scheme = "http",
endpoint_override = paste0("localhost:", minio_port),
allow_bucket_creation = FALSE,
-allow_bucket_deletion = FALSE
+allow_bucket_deletion = FALSE,
+check_directory_existence_before_creation = FALSE
)
now <- as.character(as.numeric(Sys.time()))
fs$CreateDir(now)
2 changes: 1 addition & 1 deletion r/vignettes/fs.Rmd
@@ -190,7 +190,7 @@ Also note that parameters in the URI need to be

For S3, only the following options can be included in the URI as query parameters
are `region`, `scheme`, `endpoint_override`, `access_key`, `secret_key`, `allow_bucket_creation`,
-and `allow_bucket_deletion`. For GCS, the supported parameters are `scheme`, `endpoint_override`,
+`allow_bucket_deletion` and `check_directory_existence_before_creation`. For GCS, the supported parameters are `scheme`, `endpoint_override`,
and `retry_limit_seconds`.
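
To make the S3 query-parameter form concrete, here is a minimal sketch in plain R that assembles such a URI as a string. `s3_uri()` is a hypothetical helper, not part of arrow; the bucket name, path, and region are placeholders; and the `"true"` spelling of the boolean value is an assumption about what the URI parser accepts.

```r
# Hypothetical helper: build an s3:// URI with query parameters.
# Values are percent-encoded with utils::URLencode() before being joined.
s3_uri <- function(bucket, path = "", ...) {
  params <- list(...)
  query <- ""
  if (length(params) > 0) {
    values <- vapply(
      params,
      function(v) utils::URLencode(as.character(v), reserved = TRUE),
      character(1)
    )
    query <- paste0("?", paste0(names(params), "=", values, collapse = "&"))
  }
  paste0("s3://", bucket, "/", path, query)
}

# Placeholder bucket, path, and region.
uri <- s3_uri(
  "my-bucket", "data/",
  region = "us-east-1",
  check_directory_existence_before_creation = "true"
)
print(uri)
```

A string built this way could then be handed to an arrow function that accepts S3 URIs, such as `FileSystem$from_uri()`; that step is left out here because it needs an arrow build with S3 support.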

In GCS, a useful option is `retry_limit_seconds`, which sets the number of seconds