[Python] Stop initializing s3 upon import #38364

@vyasr

Description

Describe the enhancement requested

Currently, pyarrow initializes the S3 filesystem as soon as pyarrow.fs is imported. As a result, the AWS SDK consumes resources at startup that are wasted if the user never uses S3 support. Ideally, S3 initialization would be deferred until first use, so that importing pyarrow does not cause the AWS SDK to spin up unnecessary threads or do other work.

This change would also sidestep a bug in newer versions of aws-sdk-cpp that occasionally causes segfaults merely from using the AWS APIs, at least for the majority of users, who do not use the S3 filesystem by default.
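
To make the request concrete, here is a minimal sketch of the "initialize on first use" pattern in plain Python. It is not pyarrow's code: `LazyInit`, `fake_s3_setup`, and `open_s3_filesystem` are hypothetical names invented for illustration, and the real initialization happens in Arrow's C++ layer.

```python
import threading


class LazyInit:
    """Run an expensive setup function at most once, on first use."""

    def __init__(self, setup):
        self._setup = setup
        self._lock = threading.Lock()
        self._done = False

    def ensure(self):
        # Fast path: already initialized, so skip the lock.
        if self._done:
            return
        with self._lock:
            # Re-check under the lock so concurrent callers run setup once.
            if not self._done:
                self._setup()
                self._done = True


calls = []


def fake_s3_setup():
    # Hypothetical stand-in for the real S3 / AWS SDK initialization.
    calls.append("initialized")


_s3_init = LazyInit(fake_s3_setup)


def open_s3_filesystem(region):
    # Hypothetical entry point: setup is triggered here, not at import time.
    _s3_init.ensure()
    return {"region": region}
```

With this shape, importing the module does no S3 work at all. The setup cost is paid only by the first call to `open_s3_filesystem`, and later calls reuse the initialized state.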

Component(s)

C++, Python