Description
Apache Airflow version: 2.0.2
Kubernetes version (if you are using kubernetes) (use kubectl version):
Environment:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
What happened:
from airflow.models import DAG
from airflow.serialization.serialized_objects import SerializedDAG
import pendulum
dag_start_date = pendulum.parse("2019-08-01T00:00:00.000+00:00")
dag = DAG(dag_id='simple_dag', start_date=dag_start_date)
serialized_dag = SerializedDAG.to_dict(dag)
serialized_dag['dag']['timezone'] # '+00:00'
dag = SerializedDAG.from_dict(serialized_dag) # raises InvalidTimezone exception
Traceback (most recent call last):
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/reader.py", line 50, in read_for
file_path = pytzdata.tz_path(timezone)
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pytzdata/__init__.py", line 74, in tz_path
raise TimezoneNotFound('Timezone {} not found at {}'.format(name, filepath))
pytzdata.exceptions.TimezoneNotFound: Timezone +00:00 not found at /Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pytzdata/zoneinfo/+00:00
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-115-6c89c12cbb11>", line 1, in <module>
dag = SerializedDAG.from_dict(serialized_dag)
File "/Users/rubelagu/git/airflow/airflow/serialization/serialized_objects.py", line 795, in from_dict
return cls.deserialize_dag(serialized_obj['dag'])
File "/Users/rubelagu/git/airflow/airflow/serialization/serialized_objects.py", line 722, in deserialize_dag
v = cls._deserialize_timezone(v)
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/__init__.py", line 37, in timezone
tz = _Timezone(name, extended=extended)
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/timezone.py", line 40, in __init__
tz = read(name, extend=extended)
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/__init__.py", line 9, in read
return Reader(extend=extend).read_for(name)
File "/Users/rubelagu/.pyenv/versions/airflow-venv/lib/python3.8/site-packages/pendulum/tz/zoneinfo/reader.py", line 52, in read_for
raise InvalidTimezone(timezone)
pendulum.tz.zoneinfo.exceptions.InvalidTimezone: Invalid timezone "+00:00"
What you expected to happen:
The DAG holds a reference to its start_date.tzinfo, which serializes as +00:00 (this can be checked with serialized_dag['dag']['timezone']). On deserialization, Airflow calls pendulum.timezone('+00:00'), which raises an InvalidTimezone exception because +00:00 is a UTC offset, not a timezone name that pendulum can look up.
In principle, I would expect to be able to provide any datetime as start_date, and +00:00 is a common offset. Serialization and deserialization are part of normal Airflow operation, so any DAG with that kind of start_date will raise exceptions.
Probably the dag.timezone should not be serialized at all and it should be reconstructed at deserialization time from start_date.
One option is to refactor the section of airflow/models/dag.py::DAG.__init__() that sets the DAG's timezone into a method that can be called from both DAG.__init__ and SerializedDAG.from_dict. That way the problem of serializing and deserializing a pendulum.timezone would be avoided.
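A minimal sketch of that idea follows. It uses only the standard library and is not Airflow's actual code. The names resolve_timezone, to_dict, from_dict and DEFAULT_TIMEZONE are hypothetical, and UTC stands in for whatever default timezone Airflow is configured with. The point is that the offset already travels inside the serialized start_date, so no timezone name needs to be stored or looked up.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Any, Dict, Optional

# Assumption: UTC stands in for Airflow's configured default timezone.
DEFAULT_TIMEZONE: tzinfo = timezone.utc


def resolve_timezone(start_date: Optional[datetime]) -> tzinfo:
    """Hypothetical shared helper: derive the DAG timezone from start_date."""
    if start_date is not None and start_date.tzinfo is not None:
        return start_date.tzinfo
    return DEFAULT_TIMEZONE


def to_dict(dag_id: str, start_date: datetime) -> Dict[str, Any]:
    """Serialize without a 'timezone' key; the offset is part of start_date."""
    return {"dag_id": dag_id, "start_date": start_date.isoformat()}


def from_dict(data: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the timezone from the restored start_date, not from a stored name."""
    start_date = datetime.fromisoformat(data["start_date"])
    return {
        "dag_id": data["dag_id"],
        "start_date": start_date,
        "timezone": resolve_timezone(start_date),
    }


# Round trip with the same +00:00 start date as in the report above.
start = datetime(2019, 8, 1, tzinfo=timezone(timedelta(hours=0)))
restored = from_dict(to_dict("simple_dag", start))
```

In this sketch the same helper would serve both the constructor and the deserializer, so a fixed-offset start_date never has to pass through a lookup by timezone name.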
How to reproduce it:
Anything else we need to know: