[WIP] Add remote file io using fsspec. #33549
| import fsspec | ||
| scheme = parse_url(filepath_or_buffer).scheme | ||
| filesystem = fsspec.filesystem(scheme) | ||
| file_obj = filesystem.open(filepath_or_buffer, mode=mode or "rb") |
Do we need to pass the encoding through to .open()? If you do, you will potentially also fix #26124 :)
If using filesystem.open directly, I would for now always open binary and use the existing encoding within pandas.
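A minimal sketch of the "always open binary, decode on the pandas side" idea, using only the standard library. `io.BytesIO` stands in for the handle a filesystem's `open(..., "rb")` would return, and `wrap_binary_handle` is a hypothetical helper name, not something from pandas or fsspec.

```python
import io


def wrap_binary_handle(binary_handle, encoding=None):
    """Return a text handle if an encoding is given, else the binary handle.

    Hypothetical helper: `binary_handle` stands in for whatever a
    filesystem's open(..., "rb") returned.
    """
    if encoding is None:
        return binary_handle
    # Decoding happens in the caller's layer, not in the filesystem layer.
    return io.TextIOWrapper(binary_handle, encoding=encoding)


# io.BytesIO simulates a remote file that was opened in binary mode.
remote_bytes = io.BytesIO("a,b\n1,é\n".encode("latin-1"))
text_handle = wrap_binary_handle(remote_bytes, encoding="latin-1")
first_line = text_handle.readline()
second_line = text_handle.readline()
```

With this split the filesystem layer never needs to know about the encoding, so the mode passed to `.open()` can stay `"rb"` in every case.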
TomAugspurger
left a comment
Have you been able to test this out manually? Do things seem OK?
| if is_s3_url(filepath_or_buffer): | ||
| from pandas.io import s3 | ||
| if is_fsspec_url(filepath_or_buffer): | ||
| import fsspec |
This can be import_optional_dependency("fsspec"). Make sure we have a nice error message on failure.
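To illustrate what a "nice error message" for a missing optional dependency could look like, here is a rough standard-library sketch. `import_optional` is a hypothetical stand-in written for this comment; it is not the pandas helper, and the message wording is an assumption.

```python
import importlib


def import_optional(name, extra=""):
    """Import an optional module, or raise ImportError with install guidance.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        message = (
            f"Missing optional dependency '{name}'. {extra} "
            f"Use pip or conda to install {name}."
        )
        # `from None` hides the original traceback so the user sees one clear message.
        raise ImportError(message) from None


# A module that exists is returned unchanged.
json_module = import_optional("json")

# A module that does not exist produces the friendlier message.
try:
    import_optional("surely_not_installed_pkg_xyz", extra="Needed for remote file IO.")
except ImportError as err:
    error_text = str(err)
```

The point is that the user sees which package is missing and how to get it, rather than a bare `ModuleNotFoundError` from deep inside the IO code.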
| return s3.get_filepath_or_buffer( | ||
| filepath_or_buffer, encoding=encoding, compression=compression, mode=mode | ||
| ) | ||
| # if is_s3_url(filepath_or_buffer): |
You can delete all these.
|
|
||
| try: | ||
| from fsspec.registry import known_implementations | ||
| scheme = parse_url(url).scheme |
Perhaps split_protocol, or even just "://" in url or "::" in url
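For reference, the plain substring check suggested here could be sketched as below. `looks_like_fsspec_url` is a hypothetical name, and this is only one possible reading of the suggestion: it assumes the caller passes a string, and it would also match `http://` and `https://` URLs, which may or may not be wanted.

```python
def looks_like_fsspec_url(url: str) -> bool:
    """Return True if `url` has a protocol separator or a chained-protocol marker.

    Hypothetical sketch: no registry lookup, just the two substring checks.
    """
    return "://" in url or "::" in url


plain_remote = looks_like_fsspec_url("s3://bucket/key.csv")
chained = looks_like_fsspec_url("simplecache::s3://bucket/key.csv")
local_path = looks_like_fsspec_url("data/local.csv")
http_url = looks_like_fsspec_url("https://example.com/x.csv")
```

This avoids importing `known_implementations` just to decide whether a path is a URL, at the cost of accepting schemes that no installed filesystem can handle.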
| from pandas.io import s3 | ||
| if is_fsspec_url(filepath_or_buffer): | ||
| import fsspec | ||
| scheme = parse_url(filepath_or_buffer).scheme |
These three lines can be done with fsspec.open (except for the garbage collection issue), which should take care of encoding and compression too.
@jrderuiter, is there any way in which I can help here?
ping?
jreback
left a comment
You need to add fsspec to several (but not all) of the CI files, and to environment.yaml, so that we have both testing and proper skipping.
| return urllib.request.urlopen(*args, **kwargs) | ||
|
|
||
|
|
||
| def is_fsspec_url(url) -> bool: |
Can you add a type annotation for url?
| fsspec filesystem. | ||
| """ | ||
|
|
||
| if not isinstance(url, str): |
You don't need this.
Closing in favor of #34266.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff