From f66c50e1dd3667d886d2f32089b799851e8f6900 Mon Sep 17 00:00:00 2001
From: akshayi1
Date: Wed, 26 Sep 2018 12:04:01 -0400
Subject: [PATCH 1/2] [AIRFLOW-XXX] Update GCS logging docs for latest code

Updated documentation for the steps to set up remote logging to Google
Cloud Storage.
---
 docs/howto/write-logs.rst | 64 ++++++++-------------------------------
 1 file changed, 12 insertions(+), 52 deletions(-)

diff --git a/docs/howto/write-logs.rst b/docs/howto/write-logs.rst
index ac30e84f39a05..e0f5b86f2142d 100644
--- a/docs/howto/write-logs.rst
+++ b/docs/howto/write-logs.rst
@@ -89,54 +89,21 @@ Writing Logs to Google Cloud Storage
 
 Follow the steps below to enable Google Cloud Storage logging.
 
-#. Airflow's logging system requires a custom .py file to be located in the ``PYTHONPATH``, so that it's importable from Airflow. Start by creating a directory to store the config file. ``$AIRFLOW_HOME/config`` is recommended.
-#. Create empty files called ``$AIRFLOW_HOME/config/log_config.py`` and ``$AIRFLOW_HOME/config/__init__.py``.
-#. Copy the contents of ``airflow/config_templates/airflow_local_settings.py`` into the ``log_config.py`` file that was just created in the step above.
-#. Customize the following portions of the template:
-
-    .. code-block:: bash
-
-        # Add this variable to the top of the file. Note the trailing slash.
-        GCS_LOG_FOLDER = 'gs:///'
-
-        # Rename DEFAULT_LOGGING_CONFIG to LOGGING CONFIG
-        LOGGING_CONFIG = ...
-
-        # Add a GCSTaskHandler to the 'handlers' block of the LOGGING_CONFIG variable
-        'gcs.task': {
-            'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
-            'formatter': 'airflow.task',
-            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
-            'gcs_log_folder': GCS_LOG_FOLDER,
-            'filename_template': FILENAME_TEMPLATE,
-        },
-
-        # Update the airflow.task and airflow.task_runner blocks to be 'gcs.task' instead of 'file.task'.
-        'loggers': {
-            'airflow.task': {
-                'handlers': ['gcs.task'],
-                ...
-            },
-            'airflow.task_runner': {
-                'handlers': ['gcs.task'],
-                ...
-            },
-            'airflow': {
-                'handlers': ['console'],
-                ...
-            },
-        }
-
-#. Make sure a Google Cloud Platform connection hook has been defined in Airflow. The hook should have read and write access to the Google Cloud Storage bucket defined above in ``GCS_LOG_FOLDER``.
-
-#. Update ``$AIRFLOW_HOME/airflow.cfg`` to contain:
+To enable this feature, ``airflow.cfg`` must be configured as in this
+example:
 
-    .. code-block:: bash
+.. code-block:: bash
 
-        task_log_reader = gcs.task
-        logging_config_class = log_config.LOGGING_CONFIG
-        remote_log_conn_id = 
+    [core]
+    # Airflow can store logs remotely in AWS S3. Users must supply a remote
+    # location URL (starting with either 's3://...') and an Airflow connection
+    # id that provides access to the storage location.
+    remote_logging_enabled = True
+    remote_base_log_folder = gs://my-bucket/path/to/logs
+    remote_log_conn_id = MyGCSConn
 
+#. Install the ``gcp_api`` package first, like so: ``pip install apache-airflow[gcp_api]``.
+#. Make sure a Google Cloud Platform connection hook has been defined in Airflow. The hook should have read and write access to the Google Cloud Storage bucket defined above in ``remote_base_log_folder``.
 #. Restart the Airflow webserver and scheduler, and trigger (or wait for) a new task execution.
 #. Verify that logs are showing up for newly executed tasks in the bucket you've defined.
 #. Verify that the Google Cloud Storage viewer is working in the UI. Pull up a newly executed task, and verify that you see something like:
@@ -150,10 +117,3 @@ Follow the steps below to enable Google Cloud Storage logging.
     [2017-10-03 21:57:51,306] {base_task_runner.py:98} INFO - Subtask: [2017-10-03 21:57:51,306] {models.py:186} INFO - Filling up the DagBag from /airflow/dags/example_dags/example_bash_operator.py
 
 Note the top line that says it's reading from the remote log file.
-
-Please be aware that if you were persisting logs to Google Cloud Storage
-using the old-style airflow.cfg configuration method, the old logs will no
-longer be visible in the Airflow UI, though they'll still exist in Google
-Cloud Storage. This is a backwards incompatbile change. If you are unhappy
-with it, you can change the ``FILENAME_TEMPLATE`` to reflect the old-style
-log filename format.

From 51b7ad7fa2f17127863549c33b6d4253e543cc25 Mon Sep 17 00:00:00 2001
From: akshayi1
Date: Thu, 27 Sep 2018 09:43:55 -0400
Subject: [PATCH 2/2] Updated write-logs.rst

Made changes as per Jacob Hayes' review.
---
 docs/howto/write-logs.rst | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/howto/write-logs.rst b/docs/howto/write-logs.rst
index e0f5b86f2142d..272985a7b119a 100644
--- a/docs/howto/write-logs.rst
+++ b/docs/howto/write-logs.rst
@@ -95,10 +95,11 @@ example:
 
 .. code-block:: bash
 
     [core]
-    # Airflow can store logs remotely in AWS S3. Users must supply a remote
-    # location URL (starting with either 's3://...') and an Airflow connection
-    # id that provides access to the storage location.
-    remote_logging_enabled = True
+    # Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
+    # Users must supply an Airflow connection id that provides access to the storage
+    # location. If remote_logging is set to true, see UPDATING.md for additional
+    # configuration requirements.
+    remote_logging = True
     remote_base_log_folder = gs://my-bucket/path/to/logs
     remote_log_conn_id = MyGCSConn
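For reference, once both patches in this series are applied, the GCS logging section of ``docs/howto/write-logs.rst`` tells users to configure ``airflow.cfg`` as shown below. The bucket path and connection id are the example values used in the diff, not required names:

.. code-block:: bash

    [core]
    # Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
    # Users must supply an Airflow connection id that provides access to the storage
    # location. If remote_logging is set to true, see UPDATING.md for additional
    # configuration requirements.
    remote_logging = True
    remote_base_log_folder = gs://my-bucket/path/to/logs
    remote_log_conn_id = MyGCSConn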
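The updated docs assume a Google Cloud Platform connection already exists but do not show how to create one. A minimal sketch, assuming the Airflow 1.10 ``airflow connections`` CLI and authentication via a service-account JSON key; the connection id matches the ``MyGCSConn`` example above, while the project id and key path are placeholders to substitute:

.. code-block:: bash

    # Install the GCP extras so the GCS task handler and hook are available.
    pip install apache-airflow[gcp_api]

    # Register a google_cloud_platform connection; the extra__... keys are how
    # the GCP hook locates the service-account key and sets its OAuth scope.
    airflow connections --add \
        --conn_id MyGCSConn \
        --conn_type google_cloud_platform \
        --conn_extra '{"extra__google_cloud_platform__project": "my-gcp-project", "extra__google_cloud_platform__key_path": "/path/to/service-account.json", "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/cloud-platform"}'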
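The "Verify that logs are showing up" step can also be checked outside the UI by listing the remote base folder. This assumes the Google Cloud SDK's ``gsutil`` tool is installed and authorized against the same bucket:

.. code-block:: bash

    # Task logs are written beneath the remote base folder; after a task instance
    # has run, a recursive listing should show per-attempt .log objects.
    gsutil ls -r gs://my-bucket/path/to/logs/ | head -n 20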