
Discussion: how to handle the new Int64 (nullable integer) dtype with pandas 0.24.0 #242

@tswast


Currently unreleased, but pandas 0.24.0 will add an extension dtype for nullable integers (http://pandas-docs.github.io/pandas-docs-travis/integer_na.html#integer-na). Unfortunately, our current logic, which defers to the DataFrame constructor for type inference, won't use it.

From the linked docs: "It [Int64, nullable integer] is not the default dtype for integers, and will not be inferred; you must explicitly pass the dtype into array() or Series."
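Here is a small sketch of that inference behavior, assuming pandas 0.24.0 or later is installed. A column of integers that contains a missing value is inferred as float64, and Int64 is only used when requested explicitly:

```python
import pandas as pd

values = [1, None, 3]

# Default inference: the missing value forces a float64 column (None becomes NaN).
inferred = pd.Series(values)
print(inferred.dtype)  # float64

# The same happens through the DataFrame constructor, which pandas-gbq relies on.
frame = pd.DataFrame({"x": values})
print(frame["x"].dtype)  # float64

# The nullable integer dtype must be passed explicitly, either as the "Int64" alias...
nullable = pd.Series(values, dtype="Int64")
print(nullable.dtype)  # Int64

# ...or as a dtype object, here via pd.array.
arr = pd.array(values, dtype=pd.Int64Dtype())
print(arr.dtype)  # Int64
```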

The question is how pandas-gbq should support this dtype. I see a few options.

  • Use pd.Int64Dtype() by default for nullable integer columns, similar to how pandas-gbq previously defaulted to string for integer columns.
    • Con: ties new versions of pandas-gbq to pandas 0.24.0+.
  • Use pd.Int64Dtype() for nullable integer columns when pandas 0.24.0+ is installed.
    • Con: inconsistent with pandas.
    • Con: unable to turn this feature off when float is desired (perhaps for performance reasons).
  • Add an argument to read_gbq that maps column names to dtypes, overriding the inferred dtype of any column it lists.
    • Con: float isn't the safest default for nullable integer columns, but at least it's consistent with pandas.
    • Con: will require reading rows into separate Series before constructing a DataFrame, as the DataFrame constructor only accepts a single dtype.
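A rough sketch of the third option, assuming rows arrive as tuples in schema order. The helper `rows_to_dataframe` and its `dtypes` parameter are hypothetical names for illustration, not existing pandas-gbq API. Because the DataFrame constructor takes a single dtype, each column is built as its own Series with its own dtype, and the Series are then combined:

```python
import pandas as pd


def rows_to_dataframe(rows, column_names, dtypes=None):
    """Build a DataFrame column by column so each column can get its own dtype.

    Hypothetical helper: `dtypes` mimics the proposed read_gbq argument, a
    mapping of column name -> dtype. Unlisted columns use pandas' inference.
    """
    dtypes = dtypes or {}
    columns = {}
    for position, name in enumerate(column_names):
        values = [row[position] for row in rows]
        # dtype=None lets pandas infer; a listed dtype (e.g. "Int64") overrides it.
        columns[name] = pd.Series(values, dtype=dtypes.get(name))
    # A dict of Series keeps each column's dtype; `columns=` fixes the order.
    return pd.DataFrame(columns, columns=list(column_names))


rows = [(1, "a"), (None, "b"), (3, "c")]

default_df = rows_to_dataframe(rows, ["id", "name"])
print(default_df["id"].dtype)  # float64

override_df = rows_to_dataframe(rows, ["id", "name"], dtypes={"id": "Int64"})
print(override_df["id"].dtype)  # Int64
```

Columns missing from `dtypes` keep pandas' default inference, so a nullable integer column without an override still comes back as float64, which is the consistency-with-pandas trade-off noted for this option.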
