Closed
Labels
- api: bigquery (Issues related to the BigQuery API.)
- priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
- type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
Description
As pointed out in #9457, creating a GAPIC client and not closing the client's transport's channel before letting the client get garbage collected means we leak sockets / file descriptors.
Steps to reproduce
- Start a Jupyter Notebook (or launch the code example with ipython).
- Load the magics with %load_ext google.cloud.bigquery
- Run a %%bigquery magic command.
- Observe with psutil that open connections are not closed.
Code example
Notebook as Markdown:
import psutil
from google.cloud import bigquery
current_process = psutil.Process()
num_conns = len(current_process.connections())
print("connections before loading magics: {}".format(num_conns))
Output: connections before loading magics: 12
%load_ext google.cloud.bigquery
num_conns = len(current_process.connections())
print("connections after loading magics: {}".format(num_conns))
Output: connections after loading magics: 12
%%bigquery --use_bqstorage_api
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15
| | year | birth_count |
|---|---|---|
| 0 | 2008 | 4255156 |
| 1 | 2007 | 4324008 |
| 2 | 2006 | 4273225 |
| 3 | 2005 | 4145619 |
| 4 | 2004 | 4118907 |
| 5 | 2003 | 4096092 |
| 6 | 2002 | 4027376 |
| 7 | 2001 | 4031531 |
| 8 | 2000 | 4063823 |
| 9 | 1999 | 3963465 |
| 10 | 1998 | 3945192 |
| 11 | 1997 | 3884329 |
| 12 | 1996 | 3894874 |
| 13 | 1995 | 3903012 |
| 14 | 1994 | 3956925 |
num_conns = len(current_process.connections())
print("connections after running magics: {}".format(num_conns))
Output: connections after running magics: 16
%%bigquery --use_bqstorage_api
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15
| | year | birth_count |
|---|---|---|
| 0 | 2008 | 4255156 |
| 1 | 2007 | 4324008 |
| 2 | 2006 | 4273225 |
| 3 | 2005 | 4145619 |
| 4 | 2004 | 4118907 |
| 5 | 2003 | 4096092 |
| 6 | 2002 | 4027376 |
| 7 | 2001 | 4031531 |
| 8 | 2000 | 4063823 |
| 9 | 1999 | 3963465 |
| 10 | 1998 | 3945192 |
| 11 | 1997 | 3884329 |
| 12 | 1996 | 3894874 |
| 13 | 1995 | 3903012 |
| 14 | 1994 | 3956925 |
num_conns = len(current_process.connections())
print("connections after running magics: {}".format(num_conns))
Output: connections after running magics: 20
Full example:
import psutil
from google.cloud import bigquery
current_process = psutil.Process()
num_conns = len(current_process.connections())
print("connections before loading magics: {}".format(num_conns))
get_ipython().run_line_magic('load_ext', 'google.cloud.bigquery')
num_conns = len(current_process.connections())
print("connections after loading magics: {}".format(num_conns))
get_ipython().run_cell_magic('bigquery', '--use_bqstorage_api', 'SELECT\n source_year AS year,\n COUNT(is_male) AS birth_count\nFROM `bigquery-public-data.samples.natality`\nGROUP BY year\nORDER BY year DESC\nLIMIT 15')
num_conns = len(current_process.connections())
print("connections after running magics: {}".format(num_conns))
get_ipython().run_cell_magic('bigquery', '--use_bqstorage_api', 'SELECT\n source_year AS year,\n COUNT(is_male) AS birth_count\nFROM `bigquery-public-data.samples.natality`\nGROUP BY year\nORDER BY year DESC\nLIMIT 15')
num_conns = len(current_process.connections())
print("connections after running magics: {}".format(num_conns))
Stack trace
N/A
Suggested fix
As identified in #9457, we need to close the bqstorage_client.transport.channel, since we create a new BQ Storage client each time.
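A minimal sketch of what closing the channel could look like, assuming the client exposes the `transport.channel` attribute path named above and that the channel has a `close()` method. The helper name `closing_transport_channel` and the `Fake*` classes are hypothetical stand-ins so the example runs without credentials or network access; this is not the library's actual fix.

```python
from contextlib import contextmanager


@contextmanager
def closing_transport_channel(client):
    """Yield ``client`` and close its transport's channel on exit.

    Hypothetical helper: the attribute path ``client.transport.channel``
    is taken from the issue text and may differ between client versions.
    """
    try:
        yield client
    finally:
        # Runs on success and on error, so the socket is always released.
        client.transport.channel.close()


# Hypothetical stand-ins for a BQ Storage client and its gRPC channel.
class FakeChannel:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeTransport:
    def __init__(self):
        self.channel = FakeChannel()


class FakeStorageClient:
    def __init__(self):
        self.transport = FakeTransport()


fake_client = FakeStorageClient()
with closing_transport_channel(fake_client) as client:
    pass  # the cell magic would download query results here

print(fake_client.transport.channel.closed)
```

Using `try`/`finally` means the channel is closed even when the query or download raises, which matters because the magic creates a new client on every invocation.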
I suggest we also add psutil as a test-only dependency and verify in a system test of google.cloud.bigquery.magics._cell_magic that there are no additional open connections after running the cell magic.
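The check could be sketched roughly as follows. The function names and the injectable `counter` parameter are hypothetical; the default counter uses the same `psutil.Process().connections()` call as the notebook above and assumes psutil is installed in the test environment. The demo at the bottom swaps in an in-memory counter so it runs without psutil or BigQuery access.

```python
def count_open_connections():
    """Count this process's connections the way the notebook above does."""
    import psutil  # imported lazily; assumed to be a test-only dependency

    return len(psutil.Process().connections())


def new_connections_after(action, counter=count_open_connections):
    """Run ``action`` and return how many more connections exist afterwards."""
    before = counter()
    action()
    return counter() - before


# Hypothetical in-memory stand-in for the process's connection table.
pool = {"open": 0}


def leaky_action():
    pool["open"] += 1  # opens a connection and never closes it


def tidy_action():
    pool["open"] += 1
    pool["open"] -= 1  # closes what it opened


leak = new_connections_after(leaky_action, counter=lambda: pool["open"])
clean = new_connections_after(tidy_action, counter=lambda: pool["open"])
print(leak, clean)
```

In a system test, `action` would run the cell magic and the assertion would be that the returned difference is zero. A real test may need to tolerate connections that the main BigQuery client legitimately keeps open between calls.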