The following is a write-up I did while working on the ScoutRed ETL pipeline and attempting to find the best way to serialize and store dates when the data source comes primarily from paper-form based data where we don’t necessarily know (or want to know) the timezone or locality.
A date is defined here as an entity which includes year, month, and day, but does not include any information about the time or timezone.
The need for the data to be accessible internationally and across time zones spurred some interesting dilemmas. Should we add time and timezone information to the dates even though they didn’t originally have any? If so then what should the time be? How should we handle the dates being viewed by people in different parts of the world?
After some deliberation we realized we would need to answer the following three questions:
- How should we store dates in standard postgres columns?
- How should we format date strings stored in postgres json fields and/or api contracts?
- How should we handle dates on the browser/client?
Research Findings and Proposition
The ISO-8601 international date format seems to be the preferred method for serializing a date-only string across the web:
I read the ISO-8601 standard itself to say that a date formatted this way should be interpreted as being in the Local timezone and not UTC timezone. While the official spec costs money, here is what the wikipedia page has to say:
Time zones in ISO 8601 are represented as local time (with the location unspecified), as UTC, or as an offset from UTC. If no UTC relation information is given with a time representation, the time is assumed to be in local time.
Note: parsing of date strings with the Date constructor (and Date.parse, they are equivalent) is strongly discouraged due to browser differences and inconsistencies. Support for RFC 2822 format strings is by convention only. Support for ISO 8601 formats differs in that date-only strings (e.g. “1970–01–01”) are treated as UTC, not local.
Based on these findings my answers to the original questions are as follows.
So how should we store dates in standard postgres columns?
It all comes down to the source data. If the source data includes only a date and no time then we should use the postgres
date type which is date-only. If the source data includes date and time then we should use the
timestamp type which accounts for everything.
When we do store timestamps, whether we store them in UTC or their Local timezone doesn’t really matter as long as the timezone information is there and so they can be converted to whatever we need.
So how should we format date strings stored in postgres json fields and/or api contracts and handle them on the client?
All json contracts should use the ISO-8601 date-only format of
2018-12-04 for date-only type fields, and the ISO-8601 date-time-zone format of
2018-12-04T09:57:20Z for timestamp fields. When we use date-only fields we should do so with the understanding that different tools could potentially parse it in different ways, however we make our intention of how it should be used clear through documentation. For example:
If the need arises that the user needs to know what timezone we assume the date to live in based on the locality of the data provider, we could add a locality or timezone as a separate property and display that in the UI accordingly, not use it to modify the date. If the user really wants to see what that date would be in a different timezone then they can do the math themselves.
This is the same way DOB or date of birth is typically handled, the date remains absolute regardless of where in the world it is being read because it is often used as a meaningful personal identifier in conjunction with names. However in use cases of something like astrological readings where the timezone is relevant then the location of birth is included as a separate piece of information along with the date.
I believe with this solution we stay aligned 100% with the international ISO-8601 spec, we don’t need to re-invent this wheel, and we also get the flexibility we want when using date-only fields without loosing the ability to indicate the originating timezone or locality of a date-only field if we ever decide we need that.