[Python] Error with errno 22 when loading 3.6 GB Parquet file

I saved a file with the pandas `to_parquet` method but can't read it back in. Here's the full stack trace:

 
```text
Traceback (most recent call last):
File "src/data/CLXP_pull.py", line 214, in <module>
 main()
 File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 722, in __call__
 return self.main(*args, **kwargs)
 File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 697, in main
 rv = self.invoke(ctx)
 File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 895, in invoke
 return ctx.invoke(self.callback, **ctx.params)
 File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 535, in invoke
 return callback(*args, **kwargs)
 File "src/data/CLXP_pull.py", line 188, in main
 results[fullname] = pd.read_parquet(os.path.join(project_dir, "data", "raw", fullname+".parquet"), engine="pyarrow")
 File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 257, in read_parquet
 return impl.read(path, columns=columns, **kwargs)
 File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 130, in read
 **kwargs).to_pandas()
 File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 939, in read_table
 pf = ParquetFile(source, metadata=metadata)
 File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 64, in __init__
 self.reader.open(source, metadata=metadata)
 File "_parquet.pyx", line 651, in pyarrow._parquet.ParquetReader.open
 File "error.pxi", line 79, in pyarrow.lib.check_status
 pyarrow.lib.ArrowIOError: Arrow error: IOError: [Errno 22] Invalid argument
```
Any ideas what could cause this? The file itself is 3.6 GB.

I'm running pandas==0.22.0.
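
Since the traceback fails while opening the file, one quick sanity check (independent of pyarrow) is to confirm that the file on disk is structurally complete. The sketch below assumes only the documented Parquet layout: the file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the footer length as a little-endian integer. The helper name `inspect_parquet_footer` is hypothetical, and a passing check does not explain the errno 22; it only helps rule out a truncated or partially written file.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # magic bytes at the start and end of a Parquet file


def inspect_parquet_footer(path):
    """Report size, magic bytes, and footer length without using pyarrow.

    Hypothetical diagnostic helper; it reads only 12 bytes, so it is cheap
    even for multi-gigabyte files.
    """
    size = os.path.getsize(path)
    # Smallest possible layout: 4 (magic) + 4 (footer length) + 4 (magic).
    if size < 12:
        return {"size_bytes": size, "head_ok": False, "tail_ok": False,
                "footer_len": None, "footer_fits": False}

    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)

    # First 4 bytes of the tail: footer length (little-endian).
    footer_len = struct.unpack("<I", tail[:4])[0]
    return {
        "size_bytes": size,
        "head_ok": head == PARQUET_MAGIC,
        "tail_ok": tail[4:] == PARQUET_MAGIC,
        "footer_len": footer_len,
        # The footer plus the 12 framing bytes must fit inside the file.
        "footer_fits": footer_len + 12 <= size,
    }
```

Calling `inspect_parquet_footer(os.path.join(project_dir, "data", "raw", fullname + ".parquet"))` on the failing file would show whether the writer finished. If all checks pass, the file is likely intact and the problem is more likely on the read path.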

**Reporter**: [Andy Reagan](https://issues.apache.org/jira/browse/ARROW-2654)
**Assignee**: [Wes McKinney](https://issues.apache.org/jira/browse/ARROW-2654) / @wesm
#### Related issues:
- [[C++] Parquet arrow::Table reads error when overflowing capacity of BinaryArray](https://github.com/apache/arrow/issues/20081) (duplicates)

<sub>**Note**: *This issue was originally created as [ARROW-2654](https://issues.apache.org/jira/browse/ARROW-2654). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>

[Python] Error with errno 22 when loading 3.6 GB Parquet file #19048