
Type error in naming a protected attribute in class PyArrowFile(OutputFile, InputFile), breaking readability from HDFS using PyArrow #654

@SebastianoMeneghin

Description


Apache Iceberg version

0.6.0 (latest release)

Please describe the bug 🐞

The same error is also present in 0.5.X and on main.

The class PyArrowFile is defined at line 183 of ../pyiceberg/io/pyarrow.py.
It inherits from OutputFile and InputFile.

Line 205 declares a protected attribute that is never used anywhere else in the class:

line 205 _fs: FileSystem

However, the rest of the class repeatedly accesses the protected attribute _filesystem, which is never declared, either in PyArrowFile or in its parent classes.
I think this is causing problems when using PyArrowFileIO to access files (I am trying it with HDFS as storage and SQLite as the catalog).

line 209
def __init__(self, location: str, path: str, fs: FileSystem, buffer_size: int = ONE_MEGABYTE):
    self._filesystem = fs
    self._path = path
    self._buffer_size = buffer_size
    super().__init__(location=location)
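
To make the mismatch concrete, here is a minimal sketch that reproduces it outside PyIceberg. The classes FileSystem and PyArrowFileSketch are hypothetical stand-ins, not the real pyarrow or pyiceberg classes; only the attribute names _fs and _filesystem are taken from the snippet above. In plain Python, an annotation without a value creates no attribute, so the declared name and the assigned name are unrelated, and the difference would only surface at runtime if some code read _fs.

```python
from typing import get_type_hints


class FileSystem:
    """Hypothetical stand-in for the FileSystem type used in the annotation."""


class PyArrowFileSketch:
    """Simplified stand-in that mirrors the declared/assigned name mismatch."""

    _fs: FileSystem  # annotation only: no attribute named _fs is created

    def __init__(self, path: str, fs: FileSystem) -> None:
        self._filesystem = fs  # the name that is actually assigned and used
        self._path = path


f = PyArrowFileSketch("/tmp/example", FileSystem())

# Names declared via class-level annotations vs. names set on the instance.
declared = set(get_type_hints(PyArrowFileSketch))
assigned = set(vars(f))

# Names that are annotated but never assigned.
print(declared - assigned)
print(hasattr(f, "_fs"), hasattr(f, "_filesystem"))
```

If this reading is right, renaming the annotation to _filesystem (or the assignments to _fs) would make the declaration and the usage agree; whether that alone explains the HDFS read failure is not something this sketch can confirm.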
