Lots of fun with Postgres and Python timezone shenanigans!
There are few things developers love more than having to handle timezones. One of them is having to handle timezones in different environments!
Lately I had to deal with some timezone operations across Python and Postgres and decided to document here the shenanigans and quirks of the two systems and how I try to avoid them.
TIMESTAMP WITH TIME ZONE does NOT store a timezone
This is something I knew already but it irks me every time I remember it exists.
In Postgres there are essentially two data types to represent timestamps:
TIMESTAMP WITH TIME ZONE (also shortened as
TIMESTAMPTZ, and if you want to be explicit the former can be written as
TIMESTAMP WITHOUT TIME ZONE).
They both store a timestamp with microsecond resolution, and from the name you would assume that one of them also keeps track of the original timezone. That’s what “with time zone” means right? Wrong.
In reality, both formats use the same 8-byte representation and no timezone data is stored. The difference is that when it’s time to represent them or perform timezone conversions Postgres assumes that
TIMESTAMPTZ is in UTC.
This is explained in the Postgres documentation on datetime, to be fair, but still I find the name misleading (maybe
TIMESTAMP IN UTC could make more sense? As an alias, at least) because if you store a timestamp providing a non-UTC timezone (e.g.
2023‐08‐07T16:43:18+07:00) it will be kindly converted to UTC and after that the information about the original timezone will be lost.
In general, it makes sense to use
TIMESTAMPTZ rather than
TIMESTAMP whenever possible because it’s explicit on how the timezone has to be handled and as you can see there’s no overhead in terms of storage.
Python pytz should NOT be used, and is an amazing footgun
tl;dr avoid pytz, use zoneinfo from the standard library instead
Using Python, you can juggle with timezones inside the application rather than in the database.
For that, you may find that there’s a library called
pytz. Even now, when I google for common tasks regarding timezones in Python the Stackoverflow replies suggest to use it.
Some time ago I found an extremely weird behavior of this library. Look at this snippet:
from datetime import datetime import pytz epoch = 1691397357 dt = datetime.utcfromtimestamp(epoch) print(dt.isoformat()) rome_dt = dt.replace(tzinfo=pytz.timezone('Europe/Rome')) print(rome_dt.isoformat()) print(rome_dt.astimezone(pytz.UTC).isoformat())
My intention with this code was to parse an UNIX epoch timestamp from a legacy system that is set to a non-UTC timezone.
Imagine my surprise when I got this output:
2023-08-07T08:35:57 2023-08-07T08:35:57+00:50 2023-08-07T07:45:57+00:00
yes, you are reading it correctly: it added an offset of
50 minutes. Rome has actually an offset from UTC of 1 or 2 hours (depending on DST) and at the moment I’m not aware of any timezone in the world with an offset that is not a multiple of 15 minutes (usually they are multiple of 1 hours, but for example Nepal has an offset of
+5:45 right now).
This is extremely bizarre, and it happens with other timezones too, adding an unusual amount of minutes.
There’s an explanation for this behavior, in short pytz tries to eagerly calculate a timezone offset instead of calculating it based on the actual timestamp, but this offset changes over time. That is, given a timezone like
Europe/Rome the timezone is a function that takes a naive timestamp as an input and produces an offset. Pytz however calculates this offset once and reuses it.
This behavior is due to compatibility reasons, old versions of Python (before 3.6, abundantly EOL at the time of writing this in 2023) didn’t provide a mechanism to interpret ambiguous timestamps for the time in which the clock is moved back an hour once a year due to the DST.
This variation keeps the change minimal:
from datetime import datetime, timezone from zoneinfo import ZoneInfo epoch = 1691397357 dt = datetime.utcfromtimestamp(epoch) print(dt.isoformat()) rome_dt = dt.replace(tzinfo=ZoneInfo('Europe/Rome')) print(rome_dt.isoformat()) print(rome_dt.astimezone(timezone.utc).isoformat())
and produces the intended result
2023-08-07T08:35:57 2023-08-07T08:35:57+02:00 2023-08-07T06:35:57+00:00
Where does Postgres get its timezones?
Another issue that I encountered handling timezones was the sudden appearance of new timezone, or changes to existing ones. I assume many applications do not really struggle with this, but I work for a bus company and bus stops are everywhere, and for my machine learning tasks I often need to know the local time as seen by the customer at a given stop.
Each stop has a timezone name in a standardized format that is the one used in the tz database and understood by most software including Postgres and the Python zoneinfo module.
Governments from all over the world enjoy changing the timezones on their territory, introducing or removing DST, merging and splitting them or skipping a whole day. Hopefully these changes are planned in advance.
A relatively recent example are the timezones
Mexico/Ciudad_Juarez. These timezones have been added in 2022 (specifically, in the database release
Turns out, my Postgres 14.5 instance from Amazon RDS does not recognize these new timezones. Similarly, a dockerized version running locally fails to detect them.
I reported the problem to the docker library, and after some experiments I found that Postgres incorporates a specific version of the TZ database, updated with the database releases. I like this approach because it means timezones will be the same across postgres 14.6 regardless of the operating system or provider (RDS, Digitalocean, Neon, a local deployment). However it has to be activated at compiled time, and the official Docker image does not.
The official docker images is based on Debian, and uses Debian tzdata package. To my surprise, Debian does not seem to update this package over time, so Debian 11 (bullseye) remains with 2021a-1+deb11u8. Debian 12 (bookworm) ships with a newer version of tzdata but it’s likely that this problem will occur later as new timezone names are introduced.
The Alpine flavor of the Postgres image is built with the incorporated tzdata mentioned above making the timezone conversion more predictable as it depends only on the Postgres version.
I could not find where Amazon RDS gets its tzdata, but I presume it uses the Postgres one because upgrading the engine to Postgres 14.6 fixed the issue.
What about Python timezones data?
The zoneinfo module in the standard library of Python does not incorporate any tz database but rather relies on the system one. If that’s missing then it tries to use the tzdata package.