40 changes: 28 additions & 12 deletions r/vignettes/fs.Rmd
For example, one of the NYC taxi data files used in `vignette("dataset", package = "arrow")` is stored at the URI:

```
s3://ursa-labs-taxi-data/2019/06/data.parquet
```

Given this URI, you can pass it to `read_parquet()` just as if it were a local file path:

```r
df <- read_parquet("s3://ursa-labs-taxi-data/2019/06/data.parquet")
This may be convenient when dealing with
long URIs, and it's necessary for some options and authentication methods
that aren't supported in the URI format.

With a `FileSystem` object, you can point to specific files in it with the `$path()` method.
In the previous example, this would look like:

```r
bucket <- s3_bucket("ursa-labs-taxi-data")
df <- read_parquet(bucket$path("2019/06/data.parquet"))
```

You can list the files and/or directories in an S3 bucket or subdirectory using
the `$ls()` method:

```r
bucket$ls()
```

See `help(FileSystem)` for a list of options that `s3_bucket()` and `S3FileSystem$create()`
can take. `region`, `scheme`, and `endpoint_override` can be encoded as query
parameters in the URI (though `region`, if omitted, will be auto-detected by `s3_bucket()` or inferred from the URI).
`access_key` and `secret_key` can also be included,
but other options are not supported in the URI.
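For example, here is a sketch of encoding one of those options as a query parameter (assuming the taxi bucket above lives in the `us-east-2` region):

```r
# region rides along as a URI query parameter rather than a function argument
df <- read_parquet("s3://ursa-labs-taxi-data/2019/06/data.parquet?region=us-east-2")
```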

The object that `s3_bucket()` returns is technically a `SubTreeFileSystem`, which holds a path and a file system to which it corresponds. `SubTreeFileSystem`s can be useful for holding a reference to a subdirectory somewhere (on S3 or elsewhere).

One way to get a subtree is to call the `$cd()` method on a `FileSystem`.
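For example, a minimal sketch assuming the `bucket` object from the earlier `s3_bucket()` call:

```r
# descend into a subdirectory of the bucket; paths are now relative to it
june2019 <- bucket$cd("2019/06")
df <- read_parquet(june2019$path("data.parquet"))
```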

A `SubTreeFileSystem` can also be created directly from a URI:

```r
june2019 <- SubTreeFileSystem$create("s3://ursa-labs-taxi-data/2019/06")
```
## Authentication

To access private S3 buckets, you typically need two secret parameters:
an `access_key`, which is like a user id, and a `secret_key`, which is like a token
or password. There are a few options for passing these credentials:

- Include them in the URI, like `s3://access_key:secret_key@bucket-name/path/to/file`. Be sure to [URL-encode](https://en.wikipedia.org/wiki/Percent-encoding) your secrets if they contain special characters like "/" (e.g., `URLencode("123/456", reserved = TRUE)`).

- Pass them as `access_key` and `secret_key` to `S3FileSystem$create()` or `s3_bucket()`

- Set them as environment variables named `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, respectively.

- Define them in a `~/.aws/credentials` file, according to the [AWS documentation](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html).

- Use an [AccessRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html)
for temporary access by passing the `role_arn` identifier to `S3FileSystem$create()` or `s3_bucket()`.
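For instance, passing credentials directly or via environment variables might look like the following sketch (the bucket name and key values are placeholders, not real credentials):

```r
# Pass the keys directly to s3_bucket() ...
bucket <- s3_bucket(
  "my-private-bucket",
  access_key = "AKIAXXXXXXXXXXXXXXXX",
  secret_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
)

# ... or set the standard AWS environment variables and omit them
Sys.setenv(
  AWS_ACCESS_KEY_ID = "AKIAXXXXXXXXXXXXXXXX",
  AWS_SECRET_ACCESS_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
)
bucket <- s3_bucket("my-private-bucket")
```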

## Using a proxy server

If you need to use a proxy server to connect to an S3 bucket, you can provide
a URI in the form `http://user:password@host:port` to `proxy_options`. For
example, a local proxy server running on port 1316 can be used like this:

```r
bucket <- s3_bucket("ursa-labs-taxi-data", proxy_options = "http://localhost:1316")
```

## File systems that emulate S3

The `S3FileSystem` machinery enables you to work with any file system that provides an S3-compatible interface.
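For example, connecting to a locally running [MinIO](https://min.io/) server might look like this sketch (the endpoint, credentials, and bucket name are assumptions based on MinIO's defaults):

```r
# point S3FileSystem at a non-AWS, S3-compatible endpoint
minio <- S3FileSystem$create(
  access_key = "minioadmin",          # MinIO's default demo credentials
  secret_key = "minioadmin",
  scheme = "http",                    # local server, no TLS
  endpoint_override = "localhost:9000"
)
df <- read_parquet(minio$path("my-bucket/data.parquet"))
```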