InvalidTimezone exception if DAG's start_date timezone is "+00:00" #16613

@ecerulm

Description

Apache Airflow version: 2.0.2

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:

import pendulum

from airflow.models import DAG
from airflow.serialization.serialized_objects import SerializedDAG

# start_date carries a fixed UTC offset (+00:00) rather than a named timezone
dag_start_date = pendulum.parse("2019-08-01T00:00:00.000+00:00")
dag = DAG(dag_id='simple_dag', start_date=dag_start_date)

serialized_dag = SerializedDAG.to_dict(dag)
serialized_dag['dag']['timezone']  # '+00:00'
dag = SerializedDAG.from_dict(serialized_dag)  # raises InvalidTimezone exception

Traceback (most recent call last):
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/reader.py", line 50, in read_for
    file_path = pytzdata.tz_path(timezone)
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pytzdata/__init__.py", line 74, in tz_path
    raise TimezoneNotFound('Timezone {} not found at {}'.format(name, filepath))
pytzdata.exceptions.TimezoneNotFound: Timezone +00:00 not found at /Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pytzdata/zoneinfo/+00:00

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-115-6c89c12cbb11>", line 1, in <module>
    dag = SerializedDAG.from_dict(serialized_dag)
  File "/Users/rubelagu/git/airflow/airflow/serialization/serialized_objects.py", line 795, in from_dict
    return cls.deserialize_dag(serialized_obj['dag'])
  File "/Users/rubelagu/git/airflow/airflow/serialization/serialized_objects.py", line 722, in deserialize_dag
    v = cls._deserialize_timezone(v)
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/__init__.py", line 37, in timezone
    tz = _Timezone(name, extended=extended)
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/timezone.py", line 40, in __init__
    tz = read(name, extend=extended)
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/__init__.py", line 9, in read
    return Reader(extend=extend).read_for(name)
  File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/reader.py", line 52, in read_for
    raise InvalidTimezone(timezone)
pendulum.tz.zoneinfo.exceptions.InvalidTimezone: Invalid timezone "+00:00"

What you expected to happen:

The DAG holds a reference to its start_date.tzinfo, which serializes as +00:00 (this can be checked with serialized_dag['dag']['timezone']). When the DAG is deserialized, Airflow calls pendulum.timezone('+00:00'), which raises an InvalidTimezone exception.

In principle I would expect to be able to provide any datetime as start_date, and +00:00 is a common offset. Serialization and deserialization are part of normal Airflow operation, so any DAG with this kind of start_date will raise exceptions.

Probably dag.timezone should not be serialized at all, and should instead be reconstructed from start_date at deserialization time.

Suggested fix: refactor this section of airflow/models/dag.py::DAG.__init__() into a method that can be called from both DAG.__init__ and SerializedDAG.from_dict. That would avoid having to serialize and deserialize a pendulum.timezone at all.
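
To illustrate the idea, here is a minimal standard-library sketch, not Airflow code. MiniDag and timezone_from_start_date are hypothetical names invented for this example, and datetime.timezone stands in for pendulum's timezone objects. The timezone is never written to the serialized form. Both the constructor and from_dict derive it from start_date through the same helper.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def timezone_from_start_date(
    start_date: Optional[datetime], default: tzinfo = timezone.utc
) -> tzinfo:
    """Hypothetical helper: derive the DAG timezone from start_date."""
    if start_date is not None and start_date.tzinfo is not None:
        return start_date.tzinfo
    return default


class MiniDag:
    """Toy stand-in for a DAG, used only to show the round trip."""

    def __init__(self, dag_id: str, start_date: Optional[datetime] = None):
        self.dag_id = dag_id
        self.start_date = start_date
        # Single place where the timezone is computed.
        self.timezone = timezone_from_start_date(start_date)

    def to_dict(self) -> dict:
        # The timezone is deliberately left out of the serialized form.
        return {
            "dag_id": self.dag_id,
            "start_date": self.start_date.isoformat() if self.start_date else None,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "MiniDag":
        raw = data["start_date"]
        start_date = datetime.fromisoformat(raw) if raw else None
        # Going through __init__ rebuilds the timezone from start_date.
        return cls(data["dag_id"], start_date)


dag = MiniDag("simple_dag", datetime.fromisoformat("2019-08-01T00:00:00+00:00"))
restored = MiniDag.from_dict(dag.to_dict())
```

In this sketch a fixed offset such as +00:00 survives the round trip because it travels inside the ISO-formatted start_date, so no timezone name ever needs to be looked up.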

How to reproduce it:

Anything else we need to know:

Related issue: #16551
Related PR: #16599
