Description
Apache Airflow version: 1.10.*
Environment:
- Cloud provider or hardware configuration: GCP
- OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
VERSION_CODENAME=stretch
ID=debian
What happened:
When using the BigQueryToGCSOperator from providers.google.cloud.transfers.bigquery_to_gcs with a task defined as below, the task runs successfully but no file appears in the GCS bucket, and no BigQuery job id appears to be created; the log contains nothing but deprecation warnings.
bqtogcs = BigQueryToGCSOperator(
    task_id='bqtest',
    gcp_conn_id=my_gcp_connid,
    source_project_dataset_table='my-project.my_dataset.my_table',
    destination_cloud_storage_uris=['gs://my_bucket/my_file.csv'],
    export_format='CSV',
    print_header=False
)
airflow@540d9822d645:~$ airflow test bqtogcs_test bqtest 2020-07-19
[2020-07-20 15:20:34,996] {{init.py:50}} INFO - Using executor LocalExecutor
[2020-07-20 15:20:34,999] {{dagbag.py:396}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2020-07-20 15:20:49,302] {{taskinstance.py:670}} INFO - Dependencies all met for <TaskInstance: bqtogcs_test.bqtest 2020-07-19T00:00:00+00:00 [None]>
[2020-07-20 15:20:49,314] {{taskinstance.py:880}} INFO -
[2020-07-20 15:20:49,314] {{taskinstance.py:881}} INFO - Starting attempt 1 of 4
[2020-07-20 15:20:49,314] {{taskinstance.py:882}} INFO -
[2020-07-20 15:20:49,317] {{taskinstance.py:901}} INFO - Executing <Task(BigQueryToGCSOperator): bqtest> on 2020-07-19T00:00:00+00:00
[2020-07-20 15:20:49,344] {{bigquery_to_gcs.py:112}} INFO - Executing extract of my-project.my_dataset.my_table into: ['gs://my_bucket/']
/usr/local/lib/python3.7/site-packages/airflow/providers/google/cloud/transfers/bigquery_to_gcs.py:115: DeprecationWarning: The bigquery_conn_id parameter has been deprecated. You should pass the gcp_conn_id parameter.
location=self.location)
/usr/local/lib/python3.7/site-packages/airflow/providers/google/cloud/hooks/bigquery.py:103: DeprecationWarning: This method will be deprecated. Please use `BigQueryHook.get_client` method
DeprecationWarning
/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py:984: DeprecationWarning: This method is deprecated. Please use `airflow.providers.google.cloud.hooks.bigquery.BigQueryHook.run_extract`
result = task_copy.execute(context=context)
/usr/local/lib/python3.7/site-packages/airflow/providers/google/cloud/hooks/bigquery.py:1890: DeprecationWarning: This method is deprecated. Please use `BigQueryHook.insert_job` method.
DeprecationWarning
[2020-07-20 15:20:49,739] {{taskinstance.py:1070}} INFO - Marking task as SUCCESS.dag_id=bqtogcs_test, task_id=bqtest, execution_date=20200719T000000, start_date=20200720T152049, end_date=20200720T152049
What you expected to happen:
I expected to see a CSV file in the desired GCS bucket.
I think it's because the extract job never actually gets run in hook.run_extract. Looking at the other methods in the hook, I think there should also be a job.result() call here, so that the task waits for the export job to complete (and surfaces any failure) instead of returning immediately.
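To illustrate the suspected missing step: assuming the hook ultimately drives something shaped like google.cloud.bigquery.Client (where extract_table() submits the job and the returned job's .result() blocks until it finishes), a minimal sketch of what run_extract would need to do might look like this. The function name and parameters are hypothetical, for illustration only:

```python
def run_extract_and_wait(client, source_table, destination_uris,
                         job_config=None):
    """Hypothetical sketch: submit a BigQuery extract job AND wait for it.

    `client` is assumed to behave like google.cloud.bigquery.Client:
    extract_table() submits a server-side extract job and returns a job
    object whose .result() blocks until the job completes, raising if
    the job failed.
    """
    extract_job = client.extract_table(
        source_table, destination_uris, job_config=job_config
    )
    # This is the call that appears to be missing in run_extract:
    # without it, the task returns before the export has actually run.
    extract_job.result()
    return extract_job
```

Without the .result() call, the operator can mark the task as SUCCESS while the export never materializes in GCS, which matches the behaviour in the log above.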
How to reproduce it:
Simply create a task based on this operator as defined above.