@@ -3951,29 +3951,50 @@ The :mod:`pandas.io.gbq` module provides a wrapper for Google's BigQuery
 analytics web service to simplify retrieving results from BigQuery tables
 using SQL-like queries. Result sets are parsed into a pandas
 DataFrame with a shape and data types derived from the source table.
-Additionally, DataFrames can be appended to existing BigQuery tables if
-the destination table is the same shape as the DataFrame.
+Additionally, DataFrames can be inserted into new BigQuery tables or appended
+to existing tables.
 
-For specifics on the service itself, see `here <https://developers.google.com/bigquery/>`__
+.. warning::
+
+   To use this module, you will need a valid BigQuery account. Refer to the
+   `BigQuery Documentation <https://developers.google.com/bigquery/>`__ for details on the service itself.
+
+The key functions are:
 
-As an example, suppose you want to load all data from an existing BigQuery
-table: `test_dataset.test_table` into a DataFrame using the :func:`~pandas.io.read_gbq`
-function.
+.. currentmodule:: pandas.io.gbq
+
+.. autosummary::
+   :toctree: generated/
+
+   read_gbq
+   to_gbq
+   generate_bq_schema
+   create_table
+   delete_table
+   table_exists
+
+.. currentmodule:: pandas
+
+Querying
+''''''''
+
+Suppose you want to load all data from an existing BigQuery table: `test_dataset.test_table`
+into a DataFrame using the :func:`~pandas.io.gbq.read_gbq` function.
 
 .. code-block:: python
 
    # Insert your BigQuery Project ID Here
    # Can be found in the Google web console
    projectid = "xxxxxxxx"
 
-   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table', project_id = projectid)
+   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table', projectid)
 
 You will then be authenticated to the specified BigQuery account
 via Google's Oauth2 mechanism. In general, this is as simple as following the
 prompts in a browser window which will be opened for you. Should the browser not
 be available, or fail to launch, a code will be provided to complete the process
 manually. Additional information on the authentication mechanism can be found
-`here <https://developers.google.com/accounts/docs/OAuth2#clientside/>`__
+`here <https://developers.google.com/accounts/docs/OAuth2#clientside/>`__.
 
 You can define which column from BigQuery to use as an index in the
 destination DataFrame as well as a preferred column order as follows:
@@ -3982,56 +4003,167 @@ destination DataFrame as well as a preferred column order as follows:
 
-   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table',
-                            index_col='index_column_name',
-                            col_order=['col1', 'col2', 'col3'], project_id = projectid)
-
-Finally, you can append data to a BigQuery table from a pandas DataFrame
-using the :func:`~pandas.io.to_gbq` function. This function uses the
-Google streaming API which requires that your destination table exists in
-BigQuery. Given the BigQuery table already exists, your DataFrame should
-match the destination table in column order, structure, and data types.
-DataFrame indexes are not supported. By default, rows are streamed to
-BigQuery in chunks of 10,000 rows, but you can pass other chuck values
-via the ``chunksize`` argument. You can also see the progess of your
-post via the ``verbose`` flag which defaults to ``True``. The http
-response code of Google BigQuery can be successful (200) even if the
-append failed. For this reason, if there is a failure to append to the
-table, the complete error response from BigQuery is returned which
-can be quite long given it provides a status for each row. You may want
-to start with smaller chunks to test that the size and types of your
-dataframe match your destination table to make debugging simpler.
+   data_frame = pd.read_gbq('SELECT * FROM test_dataset.test_table',
+                            projectid,
+                            index_col='index_column_name',
+                            col_order=['col1', 'col2', 'col3'])
+
+.. note::
+
+   You can find your project id in the `BigQuery management console <https://code.google.com/apis/console/b/0/?noredirect>`__.
+
+
+.. note::
+
+   You can toggle the verbose output via the ``verbose`` flag which defaults to ``True``.
+
+Writing DataFrames
+''''''''''''''''''
+
+Assume we want to write a DataFrame ``df`` into a BigQuery table using :func:`~pandas.DataFrame.to_gbq`.
+
+.. ipython:: python
+
+   df = pd.DataFrame({'my_string': list('abc'),
+                      'my_int64': list(range(1, 4)),
+                      'my_float64': np.arange(4.0, 7.0),
+                      'my_bool1': [True, False, True],
+                      'my_bool2': [False, True, False],
+                      'my_dates': pd.date_range('now', periods=3)})
+
+   df
+   df.dtypes
 
 .. code-block:: python
 
-   df = pandas.DataFrame({'string_col_name': ['hello'],
-                          'integer_col_name': [1],
-                          'boolean_col_name': [True]})
-   df.to_gbq('my_dataset.my_table', project_id = projectid)
+   df.to_gbq('my_dataset.my_table', projectid)
+
+.. note::
+
+   If the destination table does not exist, a new table will be created. The
+   destination dataset id must already exist in order for a new table to be created.
+
+The ``if_exists`` argument can be used to dictate whether to ``'fail'``, ``'replace'``
+or ``'append'`` if the destination table already exists. The default value is ``'fail'``.
+
+For example, assume that ``if_exists`` is set to ``'fail'``. The following snippet will raise
+a ``TableCreationError`` if the destination table already exists.
+
+.. code-block:: python
 
-The BigQuery SQL query language has some oddities, see `here <https://developers.google.com/bigquery/query-reference>`__
+   df.to_gbq('my_dataset.my_table', projectid, if_exists='fail')
 
-While BigQuery uses SQL-like syntax, it has some important differences
-from traditional databases both in functionality, API limitations (size and
-quantity of queries or uploads), and how Google charges for use of the service.
-You should refer to Google documentation often as the service seems to
-be changing and evolving. BiqQuery is best for analyzing large sets of
-data quickly, but it is not a direct replacement for a transactional database.
+.. note::
 
-You can access the management console to determine project id's by:
-<https://code.google.com/apis/console/b/0/?noredirect>
+   If the ``if_exists`` argument is set to ``'append'``, the destination dataframe will
+   be written to the table using the defined table schema and column types. The
+   dataframe must match the destination table in column order, structure, and
+   data types.
+   If the ``if_exists`` argument is set to ``'replace'``, and the existing table has a
+   different schema, a delay of 2 minutes will be forced to ensure that the new schema
+   has propagated in the Google environment. See
+   `Google BigQuery issue 191 <https://code.google.com/p/google-bigquery/issues/detail?id=191>`__.
 
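The three ``if_exists`` modes can be sketched with plain Python, using a dict as a stand-in for the remote table store. ``tables`` and ``write_table`` are illustrative names only, not part of the pandas API; the real ``to_gbq`` raises ``TableCreationError`` where this sketch raises ``ValueError``.

```python
# Hypothetical sketch of the if_exists semantics, with a dict standing in
# for the remote BigQuery dataset.
tables = {'my_dataset.my_table': ['existing row']}

def write_table(name, rows, if_exists='fail'):
    if name in tables:
        if if_exists == 'fail':
            raise ValueError("table %s already exists" % name)
        elif if_exists == 'replace':
            tables[name] = list(rows)      # drop the old rows entirely
        elif if_exists == 'append':
            tables[name].extend(rows)      # schema/column types must match
    else:
        tables[name] = list(rows)          # table is created if missing

write_table('my_dataset.my_table', ['new row'], if_exists='append')
print(tables['my_dataset.my_table'])  # ['existing row', 'new row']
```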
-As of 0.15.2, the gbq module has a function ``generate_bq_schema`` which
-will produce the dictionary representation of the schema.
+Writing large DataFrames can result in errors due to size limitations being exceeded.
+This can be avoided by setting the ``chunksize`` argument when calling :func:`~pandas.DataFrame.to_gbq`.
+For example, the following writes ``df`` to a BigQuery table in batches of 10000 rows at a time:
 
 .. code-block:: python
 
-   df = pandas.DataFrame({'A': [1.0]})
-   gbq.generate_bq_schema(df, default_type='STRING')
+   df.to_gbq('my_dataset.my_table', projectid, chunksize=10000)
 
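The batching that ``chunksize`` controls can be sketched locally. ``chunksize`` mirrors the ``to_gbq`` argument; the slicing itself is only an illustration of how rows are split into batches before streaming, not the library's implementation.

```python
import numpy as np
import pandas as pd

# Split a 25-row DataFrame into batches of at most `chunksize` rows,
# mirroring how rows would be streamed in groups.
df = pd.DataFrame({'a': np.arange(25)})
chunksize = 10

chunks = [df.iloc[i:i + chunksize] for i in range(0, len(df), chunksize)]
print([len(c) for c in chunks])  # [10, 10, 5]
```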
-.. warning::
+You can also see the progress of your post via the ``verbose`` flag which defaults to ``True``.
+For example:
+
+.. code-block:: python
+
+   In [8]: df.to_gbq('my_dataset.my_table', projectid, chunksize=10000, verbose=True)
+
+           Streaming Insert is 10% Complete
+           Streaming Insert is 20% Complete
+           Streaming Insert is 30% Complete
+           Streaming Insert is 40% Complete
+           Streaming Insert is 50% Complete
+           Streaming Insert is 60% Complete
+           Streaming Insert is 70% Complete
+           Streaming Insert is 80% Complete
+           Streaming Insert is 90% Complete
+           Streaming Insert is 100% Complete
+
+.. note::
+
+   If an error occurs while streaming data to BigQuery, see
+   `Troubleshooting BigQuery Errors <https://cloud.google.com/bigquery/troubleshooting-errors>`__.
+
+.. note::
+
+   The BigQuery SQL query language has some oddities, see the
+   `BigQuery Query Reference Documentation <https://developers.google.com/bigquery/query-reference>`__.
+
+.. note::
+
+   While BigQuery uses SQL-like syntax, it has some important differences from traditional
+   databases both in functionality, API limitations (size and quantity of queries or uploads),
+   and how Google charges for use of the service. You should refer to `Google BigQuery documentation <https://developers.google.com/bigquery/>`__
+   often as the service seems to be changing and evolving. BigQuery is best for analyzing large
+   sets of data quickly, but it is not a direct replacement for a transactional database.
+
+
+Creating BigQuery Tables
+''''''''''''''''''''''''
+
+As of 0.17.0, the gbq module has a function :func:`~pandas.io.gbq.create_table` which allows users
+to create a table in BigQuery. The only requirement is that the dataset must already exist.
+The schema may be generated from a pandas DataFrame using the :func:`~pandas.io.gbq.generate_bq_schema` function below.
+
+For example:
+
+.. code-block:: python
+
+   gbq.create_table('my_dataset.my_table', schema, projectid)
+
+As of 0.15.2, the gbq module has a function :func:`~pandas.io.gbq.generate_bq_schema` which will
+produce the dictionary representation of the schema for the specified pandas DataFrame.
+
+.. code-block:: python
+
+   In [10]: gbq.generate_bq_schema(df, default_type='STRING')
+
+   Out[10]: {'fields': [{'name': 'my_bool1', 'type': 'BOOLEAN'},
+                        {'name': 'my_bool2', 'type': 'BOOLEAN'},
+                        {'name': 'my_dates', 'type': 'TIMESTAMP'},
+                        {'name': 'my_float64', 'type': 'FLOAT'},
+                        {'name': 'my_int64', 'type': 'INTEGER'},
+                        {'name': 'my_string', 'type': 'STRING'}]}
+
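The dtype-to-BigQuery-type mapping behind this kind of schema can be sketched locally. ``sketch_bq_schema`` and ``type_mapping`` are illustrative names, not the pandas implementation; the sketch only mirrors the documented output shape.

```python
import pandas as pd

# Map numpy dtype "kind" codes to BigQuery column types, falling back to
# a default (e.g. STRING) for anything unrecognised such as object columns.
type_mapping = {'i': 'INTEGER', 'b': 'BOOLEAN', 'f': 'FLOAT', 'M': 'TIMESTAMP'}

def sketch_bq_schema(df, default_type='STRING'):
    fields = [{'name': name, 'type': type_mapping.get(dtype.kind, default_type)}
              for name, dtype in df.dtypes.items()]
    return {'fields': fields}

df = pd.DataFrame({'my_int64': [1, 2, 3], 'my_string': list('abc')})
print(sketch_bq_schema(df))
# {'fields': [{'name': 'my_int64', 'type': 'INTEGER'},
#             {'name': 'my_string', 'type': 'STRING'}]}
```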
+Deleting BigQuery Tables
+''''''''''''''''''''''''
+
+As of 0.17.0, the gbq module has a function :func:`~pandas.io.gbq.delete_table` which allows users to delete a table
+in Google BigQuery.
+
+For example:
+
+.. code-block:: python
+
+   gbq.delete_table('my_dataset.my_table', projectid)
+
+The :func:`~pandas.io.gbq.table_exists` function can be used to check whether a table
+exists prior to calling ``delete_table``. The return value will be of type boolean.
+
+For example:
+
+.. code-block:: python
+
+   In [12]: gbq.table_exists('my_dataset.my_table', projectid)
+   Out[12]: True
+
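The check-before-delete pattern can be sketched without a BigQuery account by stubbing the two calls locally. These are hypothetical stand-ins with the same call shape, not the real ``gbq.table_exists`` / ``gbq.delete_table``, which query the remote service.

```python
# Local stand-ins so the guard pattern runs without credentials.
existing = {'my_dataset.my_table'}

def table_exists(table, project_id):
    return table in existing          # the real function queries BigQuery

def delete_table(table, project_id):
    existing.discard(table)

projectid = "xxxxxxxx"
if table_exists('my_dataset.my_table', projectid):
    delete_table('my_dataset.my_table', projectid)

print(table_exists('my_dataset.my_table', projectid))  # False
```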
+.. note::
 
-   To use this module, you will need a valid BigQuery account. See
-   <https://cloud.google.com/products/big-query> for details on the
-   service.
+   If you delete and re-create a BigQuery table with the same name, but different table schema,
+   you must wait 2 minutes before streaming data into the table. As a workaround, consider creating
+   the new table with a different name. Refer to
+   `Google BigQuery issue 191 <https://code.google.com/p/google-bigquery/issues/detail?id=191>`__.
 
 .. _io.stata:
 