1 change: 1 addition & 0 deletions docs/source/python/api/filesystems.rst
@@ -40,6 +40,7 @@ Filesystem Implementations

LocalFileSystem
S3FileSystem
GcsFileSystem
HadoopFileSystem
SubTreeFileSystem

35 changes: 35 additions & 0 deletions docs/source/python/filesystems.rst
@@ -40,6 +40,7 @@ Pyarrow implements natively the following filesystem subclasses:

* :ref:`filesystem-localfs` (:class:`LocalFileSystem`)
* :ref:`filesystem-s3` (:class:`S3FileSystem`)
* :ref:`filesystem-gcs` (:class:`GcsFileSystem`)
* :ref:`filesystem-hdfs` (:class:`HadoopFileSystem`)

It is also possible to use your own fsspec-compliant filesystem with pyarrow functionalities as described in the section :ref:`filesystem-fsspec`.
@@ -183,6 +184,40 @@ Example how you can read contents from a S3 bucket::
for the different ways to configure the AWS credentials.


.. _filesystem-gcs:

Google Cloud Storage File System
--------------------------------

PyArrow natively implements a file system backed by
Google Cloud Storage (GCS).

If not running on Google Cloud Platform (GCP), this generally requires the
environment variable ``GOOGLE_APPLICATION_CREDENTIALS`` to point to a
JSON file containing credentials.
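
For example, outside GCP you might point the variable at a service-account
key file before starting Python. This is a minimal sketch; the path below is
purely illustrative, not a real key location::

```shell
# Point Google's default credential discovery at a service-account key file.
# /path/to/service-account.json is a placeholder path.
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```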

Example showing how you can read contents from a GCS bucket::

    >>> from datetime import timedelta
    >>> from pyarrow import fs
    >>> gcs = fs.GcsFileSystem(anonymous=True, retry_time_limit=timedelta(seconds=15))

    # List all contents in a bucket, recursively
    >>> uri = "gcp-public-data-landsat/LC08/01/001/003/"
    >>> file_list = gcs.get_file_info(fs.FileSelector(uri, recursive=True))

    # Open a file for reading and download its contents
    >>> f = gcs.open_input_stream(file_list[0].path)
    >>> f.read(64)
    b'GROUP = FILE_HEADER\n  LANDSAT_SCENE_ID = "LC80010032013082LGN03"\n  S'

.. seealso::

The :class:`GcsFileSystem` constructor by default uses the
process described in `GCS docs <https://google.aip.dev/auth/4110>`__
to resolve credentials.


.. _filesystem-hdfs:

Hadoop Distributed File System (HDFS)