[Python] Cast all timestamp resolutions to INT96 use_deprecated_int96_timestamps=True #18007

@asfimport

Description

When writing a Parquet file with use_deprecated_int96_timestamps set to True, timestamps are written as 96-bit integers only if they have nanosecond resolution. This is a problem because Amazon Redshift timestamps have only microsecond resolution, yet Redshift requires them to be stored in the 96-bit format in Parquet files.

I'd expect the use_deprecated_int96_timestamps flag to cause all timestamps to be written as 96-bit integers, regardless of resolution. If the current behavior is a deliberate design decision, it would be immensely helpful to document it explicitly in the description of the argument.
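
For context, here is a minimal sketch of the 96-bit timestamp layout as it is commonly written by Impala and Hive: 8 bytes of nanoseconds since midnight followed by 4 bytes of Julian day number, both little-endian. The helper names encode_int96_timestamp and decode_int96_timestamp are hypothetical and are not part of pyarrow. The sketch only illustrates that a microsecond-resolution value fits in this layout without loss; it is not how the Parquet writer is implemented.

```python
import datetime
import struct

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
_EPOCH = datetime.datetime(1970, 1, 1)


def encode_int96_timestamp(value):
    """Pack a naive datetime into 12 bytes (hypothetical helper).

    Layout assumed here: int64 nanoseconds since midnight, then int32
    Julian day number, both little-endian.
    """
    delta = value - _EPOCH
    julian_day = JULIAN_DAY_OF_UNIX_EPOCH + delta.days
    # timedelta keeps seconds and microseconds non-negative within the day.
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1000
    return struct.pack('<qi', nanos_of_day, julian_day)


def decode_int96_timestamp(raw):
    """Inverse of encode_int96_timestamp, truncated to microseconds."""
    nanos_of_day, julian_day = struct.unpack('<qi', raw)
    return _EPOCH + datetime.timedelta(
        days=julian_day - JULIAN_DAY_OF_UNIX_EPOCH,
        microseconds=nanos_of_day // 1000,
    )


if __name__ == '__main__':
    stamp = datetime.datetime(2018, 1, 24, 12, 30, 15, 123456)
    raw = encode_int96_timestamp(stamp)
    print(len(raw), decode_int96_timestamp(raw) == stamp)
```

Because the time-of-day field counts nanoseconds, millisecond and microsecond values only need to be scaled up before being stored.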

 

To reproduce:

1. Create a table with a timestamp column that has microsecond or millisecond resolution, and save it to a Parquet file with use_deprecated_int96_timestamps set to True.


import datetime
import pyarrow
from pyarrow import parquet

# Note: this schema is declared but never passed to the table below.
schema = pyarrow.schema([
    pyarrow.field('last_updated', pyarrow.timestamp('us')),
])

# One column holding a single microsecond-resolution timestamp.
data = [
    pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('us')),
]

table = pyarrow.Table.from_arrays(data, ['last_updated'])

# Request INT96 timestamps when writing the file.
with open('test_file.parquet', 'wb') as fdesc:
    parquet.write_table(table, fdesc,
                        use_deprecated_int96_timestamps=True)

 

2. Inspect the file. I used parquet-tools:

 


dak@tux ~ $ parquet-tools meta test_file.parquet
file:         file:/Users/dak/test_file.parquet
creator:      parquet-cpp version 1.3.2-SNAPSHOT

file schema:  schema
--------------------------------------------------------------------------------
last_updated: OPTIONAL INT64 O:TIMESTAMP_MICROS R:0 D:1

row group 1:  RC:1 TS:76 OFFSET:4
--------------------------------------------------------------------------------
last_updated:  INT64 SNAPPY DO:4 FPO:28 SZ:76/72/0.95 VC:1 ENC:PLAIN,PLAIN_DICTIONARY,RLE

 

Environment: OS: Mac OS X 10.13.2
Python: 3.6.4
PyArrow: 0.8.0
Reporter: Diego Argueta / @dargueta
Assignee: Francois Saint-Jacques / @fsaintjacques

Note: This issue was originally created as ARROW-2026. Please see the migration documentation for further details.
