
pandas-gbq 0.16 release broke dask-bigquery CI  #24

@ncclementi

Description


It looks like the most recent update to pandas-gbq might have broken our tests. When writing to BigQuery with

pd.DataFrame.to_gbq(
    df,
    destination_table=f"{dataset_id}.{table_id}",
    project_id=project_id,
    chunksize=5,
    if_exists="append",
)

using `pandas-gbq=0.15` and reading it back with `dask_bigquery.read_gbq`, we get 2 dask partitions, while if the writing is done with `pandas-gbq=0.16`, reading back with `dask_bigquery.read_gbq` returns only 1 dask partition.
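The 2-vs-1 split presumably traces back to how `chunksize` splits the frame before upload. A minimal local sketch (no BigQuery needed, and assuming a hypothetical 10-row frame, since the actual test data isn't shown here): with `chunksize=5`, ten rows yield two upload chunks, matching the 2 partitions seen under pandas-gbq 0.15.

```python
# Hypothetical 10-row stand-in for the test DataFrame.
rows = list(range(10))
chunksize = 5

# pandas-gbq uploads the frame in row chunks of size `chunksize`.
chunks = [rows[i:i + chunksize] for i in range(0, len(rows), chunksize)]
print(len(chunks))  # 2
```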

From the discussion on #11 we know that

> pandas-gbq 0.16 changed the default intermediate data serialization format to parquet instead of CSV.
> Likely this means the backend loader required fewer workers and wrote it to fewer files behind the scenes.
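If the parquet path is the cause, pandas-gbq 0.16's new `api_method` argument may offer a workaround: passing `"load_csv"` should select the pre-0.16 CSV intermediate format. A hedged sketch only; the dataset, table, and project names below are placeholders, and the call itself is left commented out since it needs live BigQuery credentials:

```python
# Keyword arguments for pandas_gbq.to_gbq; the ids below are hypothetical.
kwargs = dict(
    destination_table="dataset.table",  # placeholder dataset/table
    project_id="my-project",            # placeholder project id
    chunksize=5,
    if_exists="append",
    api_method="load_csv",              # request the pre-0.16 CSV serialization
)
# pandas_gbq.to_gbq(df, **kwargs)  # not run here: requires BigQuery credentials
```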

Labels: bug