diff --git a/r/vignettes/fs.Rmd b/r/vignettes/fs.Rmd
index 5d699c49df0..6990469af21 100644
--- a/r/vignettes/fs.Rmd
+++ b/r/vignettes/fs.Rmd
@@ -32,7 +32,7 @@ For example, one of the NYC taxi data files used in `vignette("dataset", package
 s3://ursa-labs-taxi-data/2019/06/data.parquet
 ```
 
-Given this URI, we can pass it to `read_parquet()` just as if it were a local file path:
+Given this URI, you can pass it to `read_parquet()` just as if it were a local file path:
 
 ```r
 df <- read_parquet("s3://ursa-labs-taxi-data/2019/06/data.parquet")
@@ -54,7 +54,7 @@
 This may be convenient when dealing with long URIs, and it's necessary for some
 options and authentication methods that aren't supported in the URI format.
 
-With a `FileSystem` object, we can point to specific files in it with the `$path()` method.
+With a `FileSystem` object, you can point to specific files in it with the `$path()` method.
 In the previous example, this would look like:
 
 ```r
@@ -62,13 +62,20 @@ bucket <- s3_bucket("ursa-labs-taxi-data")
 df <- read_parquet(bucket$path("2019/06/data.parquet"))
 ```
 
-See the help for `FileSystem` for a list of options that `s3_bucket()` and `S3FileSystem$create()`
+You can list the files and/or directories in an S3 bucket or subdirectory using
+the `$ls()` method:
+
+```r
+bucket$ls()
+```
+
+See `help(FileSystem)` for a list of options that `s3_bucket()` and `S3FileSystem$create()`
 can take. `region`, `scheme`, and `endpoint_override` can be encoded as query
 parameters in the URI (though `region` will be auto-detected in `s3_bucket()` or from the URI if omitted).
 `access_key` and `secret_key` can also be included,
 but other options are not supported in the URI.
 
-The object that `s3_bucket()` returns is technically a `SubTreeFileSystem`, which holds a path and a file system to which it corresponds. `SubTreeFileSystem`s can be useful for holding a reference to a subdirectory somewhere, on S3 or elsewhere.
+The object that `s3_bucket()` returns is technically a `SubTreeFileSystem`, which holds a path and a file system to which it corresponds. `SubTreeFileSystem`s can be useful for holding a reference to a subdirectory somewhere (on S3 or elsewhere).
 
 One way to get a subtree is to call the `$cd()` method on a `FileSystem`
 
@@ -86,21 +93,30 @@ june2019 <- SubTreeFileSystem$create("s3://ursa-labs-taxi-data/2019/06")
 ## Authentication
 
 To access private S3 buckets, you need typically need two secret parameters:
-a `access_key`, which is like a user id,
-and `secret_key`, like a token.
-There are a few options for passing these credentials:
+a `access_key`, which is like a user id, and `secret_key`, which is like a token
+or password. There are a few options for passing these credentials:
 
-1. Include them in the URI, like `s3://access_key:secret_key@bucket-name/path/to/file`. Be sure to [URL-encode](https://en.wikipedia.org/wiki/Percent-encoding) your secrets if they contain special characters like "/".
+- Include them in the URI, like `s3://access_key:secret_key@bucket-name/path/to/file`. Be sure to [URL-encode](https://en.wikipedia.org/wiki/Percent-encoding) your secrets if they contain special characters like "/" (e.g., `URLencode("123/456", reserved = TRUE)`).
 
-2. Pass them as `access_key` and `secret_key` to `S3FileSystem$create()` or `s3_bucket()`
+- Pass them as `access_key` and `secret_key` to `S3FileSystem$create()` or `s3_bucket()`
 
-3. Set them as environment variables named `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, respectively.
+- Set them as environment variables named `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, respectively.
 
-4. Define them in a `~/.aws/credentials` file, according to the [AWS documentation](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html).
+- Define them in a `~/.aws/credentials` file, according to the [AWS documentation](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html).
 
-You can also use an [AccessRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html)
+- Use an [AccessRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html)
 for temporary access by passing the `role_arn` identifier to `S3FileSystem$create()` or `s3_bucket()`.
 
+## Using a proxy server
+
+If you need to use a proxy server to connect to an S3 bucket, you can provide
+a URI in the form `http://user:password@host:port` to `proxy_options`. For
+example, a local proxy server running on port 1316 can be used like this:
+
+```r
+bucket <- s3_bucket("ursa-labs-taxi-data", proxy_options = "http://localhost:1316")
+```
+
 ## File systems that emulate S3
 
 The `S3FileSystem` machinery enables you to work with any file system that