-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Fix datetime deserialization by replacing timezone short code by its name in serialized values #34492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix datetime deserialization by replacing timezone short code by its name in serialized values #34492
Conversation
…name in serialized values
| o = convert_to_utc(o) | ||
|
|
||
| tz = o.tzname() | ||
| tz = o.tzinfo.name # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb solution which should work for different types of tzinfo
| tz = o.tzinfo.name # type: ignore | |
| from pendulum.tz.timezone import FixedTimezone, Timezone | |
| tzi = o.tzinfo | |
| if isinstance(tzi, FixedTimezone): | |
| tz = tzi.offset or "UTC" | |
| elif isinstance(tzi, Timezone): | |
| tz = tzi.name | |
| else: | |
| tz = int(o.utcoffset().total_seconds()) or "UTC" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And just for a record, I don't think that we should use local imports for datetime and pendulum in this module because we import airflow.utils.timezone which import entire datetime and some modules from pendulum (at least Timezone/Datetime)
|
We probably should re-use the code in the timezone serializer ( |
|
My suggestion:
|
|
are you up for that @hussein-awala ? |
|
Initially I thought that it would be also a good idea to use reuse serializer however there is one small but important thing. When you have timezone (subclass of However in case if we work with So if we would like to create generic timezone extraction we could create this method in |
I'm not sure if I follow. We already convert naive dates to UTC (pendulum) datetimes. And we explicitly do not support other libraries than Pendulum since the integration of timezones into Airflow. So I don't think your second case makes sense? Refactoring into |
|
@bolkedebruin Sorry for confuse. I meant datetime-aware, but for unknown for me reason write down naive 🤦 🤦 |
|
no worries! maybe we should see a first iteration and then figure out if we want to support non-pendulum timezone aware datetimes. I do not think we should, due to all the issues with the other implementations, but open to discussion. |
Yes, I will try to implement what you both suggested and add some unit tests |
I'm just worry, if we only would serialize
In general you could compare from dateutil.tz import tzutc
from datetime import datetime, timezone, timedelta
from pendulum.tz.timezone import Timezone
from zoneinfo import ZoneInfo
from pendulum.tz import timezone as ptimezone
from psycopg2.tz import FixedOffsetTimezone
from pytz import UTC
d1 = datetime(2021, 1, 1, tzinfo=timezone.utc)
d2 = datetime(2021, 1, 1, tzinfo=Timezone("UTC"))
d3 = datetime(2021, 1, 1, tzinfo=tzutc())
d4 = datetime(2021, 1, 1, tzinfo=ZoneInfo("UTC"))
d5 = datetime(2021, 1, 1, tzinfo=FixedOffsetTimezone(offset=0))
d6 = datetime(2021, 1, 1, tzinfo=UTC)
assert d1 == d2 == d3 == d4 == d5 == d6
assert d1.tzname() == d2.tzname() == d3.tzname() == d4.tzname() == d6.tzname()
assert d5.tzname() == "+00" # psycopg2 uses Python 2.3 implementation ;-)
assert d1.utcoffset() == d2.utcoffset()
assert d1.utcoffset() == d3.utcoffset()
assert d1.utcoffset() == d4.utcoffset()
assert d1.utcoffset() == d5.utcoffset()
assert d1.utcoffset() == d6.utcoffset()
dl1 = datetime(2021, 6, 1, tzinfo=Timezone("Europe/London"))
dl2 = datetime(2021, 6, 1, tzinfo=ZoneInfo("Europe/London"))
dl3 = datetime(2021, 6, 1, tzinfo=timezone(timedelta(hours=1)))
assert dl1 == dl2 == dl3
dl1_from_name = datetime(2021, 6, 1, tzinfo=ptimezone(dl1.tzinfo.name))
dl1_from_offset = datetime(2021, 6, 1, tzinfo=ptimezone(int(dl1.utcoffset().total_seconds())))
assert dl1 == dl1_from_name == dl1_from_offset
dl2_from_name = datetime(2021, 6, 1, tzinfo=ptimezone(dl2.tzinfo.key))
dl2_from_offset = datetime(2021, 6, 1, tzinfo=ptimezone(int(dl2.utcoffset().total_seconds())))
assert dl2 == dl2_from_name == dl2_from_offset
dl3_from_offset = datetime(2021, 6, 1, tzinfo=ptimezone(int(dl3.utcoffset().total_seconds())))
assert dl3 == dl3_from_offsetMaybe we should try to serialize In case of serialize just timezone I think it is fine to serialize only Pendulum one, and when we have min Python 3.9 we could discuss should we also serialize |
b4b1ee0 to
1bcb521
Compare
| s["__data__"]["tz"] = "EDT" | ||
| d = deserialize(s) | ||
| assert d.timestamp() == 1657505443.0 | ||
| assert d.tzinfo.name == "-04:00" | ||
| # assert that it's serializable with the new format | ||
| assert deserialize(serialize(d)) == d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test ensures that the current version of the datetime Serializer fixes the bug by deserialize the US unsupported timezones, and that the deserialized values are serializable with the new format (if the user read an xcom serialized in version 1 and return it or send it to a new XCom)
|
@Taragolis for your last comment; based on this:
I wonder if we should support these classes or not, and if yes, should we include it in this bug fix or add it in a separate PR as a new feature. |
|
I don’t think we should support all this classes:
Better what we could do it serialize offset in seconds and after it could be deserialized as Pendulum Timezone. |
|
But we only could retrieve offset from datetime object. |
|
And in general I think we should fix current issue only. And after we could discuss should we make further changes or not. |
bolkedebruin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems pretty good already! Some questions / nits.
|
If you can adjust the nit, I'm okay with merging @hussein-awala |
I removed |
Change naming to properly reflect the mapping
|
The DynamoDB to S3 system test has been failing with an "invalid timezone" exception since this got merged. I'm looking into it. |
|
I think that happen because |
|
Looks like maybe we just need to tweak how
|
tzname() does not return full timezones and returned short hand notations are not deterministic. This changes the serialization to be deterministic and adds some logic to deal with serialized short-hand US Timezones and CEST. --------- Co-authored-by: bolkedebruin <bolkedebruin@users.noreply.github.com> (cherry picked from commit a3c06c0)
closes: #34483