Discussion:
[tz] tzfiles contain Unix epoch for the first transition time
Eric Erhardt
2015-08-13 16:21:47 UTC
Permalink
I am working on enabling the .NET TimeZoneInfo class to read time zone information from tzfiles.

I've hit a snag with the latest tzdata 2015f. (I'm not sure when this change started, but the problem doesn't occur with the tzfiles that are shipped with an Ubuntu 14.04 distribution.)
The problem is that the 2015f version of the tzdata contains an initial "Transition Time" that is out of order. The beginning of the America/Chicago tzfile looks like the following:
Transition Time          Transition Offset
01/01/1970 00:00:00      -05:50:36
11/18/1883 18:00:00      -06:00:00
03/31/1918 08:00:00      -05:00:00 DST
10/27/1918 07:00:00      -06:00:00

Notice the first entry is for 1970, and then the next entry is for 1883. This contradicts the documentation in 'man tzfile':
The above header is followed by tzh_timecnt four-byte values of type long, sorted in ascending order. These values are written in "standard" byte order. Each is used as a transition time (as returned by time(2)) at which the rules for computing local time change.
This causes the TimeZoneInfo parsing code to throw an exception because it is assuming these transitions are sorted in ascending order.
Is this an intentional change in the tzfiles? If so, will the tzfile man page be updated for this change?
Eric Erhardt
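For reference, a minimal sketch of a reader for the 64-bit section, following the layout described in tzfile(5); the names here are mine, not taken from the .NET code, and error handling is deliberately minimal:

```python
import struct

def tzif_transitions_64(data: bytes):
    """Return the 64-bit transition times from a TZif version 2+ blob.

    Sketch only: parse the legacy header, skip the 32-bit data block,
    then read the 64-bit transition times and verify ascending order.
    """
    def header(off):
        if data[off:off + 4] != b"TZif":
            raise ValueError("not a TZif file")
        # isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt
        counts = struct.unpack(">6I", data[off + 20:off + 44])
        return counts, off + 44

    # Legacy 32-bit header and data block (4-byte times, 8-byte leap records).
    (isut, isstd, leap, timecnt, typecnt, charcnt), off = header(0)
    off += timecnt * 4 + timecnt + typecnt * 6 + charcnt + leap * 8 + isstd + isut

    # The 64-bit section repeats the header, then timecnt 8-byte signed times.
    (isut, isstd, leap, timecnt, typecnt, charcnt), off = header(off)
    times = struct.unpack(">%dq" % timecnt, data[off:off + timecnt * 8])
    if any(a >= b for a, b in zip(times, times[1:])):
        raise ValueError("transition times not in ascending order")
    return times
```

A reader built this way would raise on genuinely unsorted data rather than throwing later with a confusing date.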
Jon Skeet
2015-08-14 08:38:08 UTC
Permalink
Just as an aid to verifying this, could you tell us which copy of the data
you're using (as each file contains two or three copies of the information).

A hex dump of the relevant section would be really handy, too.

Jon
Post by Eric Erhardt
I am working on enabling the .NET TimeZoneInfo class to read time zone
information from tzfiles.
I’ve hit a snag with the latest tzdata 2015f. (I’m not sure when this
change started, but the problem doesn’t occur with the tzfiles that are
shipped with an Ubuntu 14.04 distribution.)
The problem is that the 2015f version of the tzdata contains an initial
"Transition Time" that is out of order. The beginning of the
America/Chicago tzfile looks like the following:

Transition Time          Transition Offset
01/01/1970 00:00:00      -05:50:36
11/18/1883 18:00:00      -06:00:00
03/31/1918 08:00:00      -05:00:00 DST
10/27/1918 07:00:00      -06:00:00
Notice the first entry is for 1970, and then the next entry is for 1883.
The above header is followed by tzh_timecnt four-byte values of type long, *sorted
in ascending order*. These values are written in "standard" byte order.
Each is used as a transition time (as returned by time(2)) at which the
rules for computing local time change.
This causes the TimeZoneInfo parsing code to throw an exception because it
is assuming these transitions are sorted in ascending order.
Is this an intentional change in the tzfiles? If so, will the tzfile man
page be updated for this change?
Eric Erhardt
Paul Eggert
2015-08-14 09:25:52 UTC
Permalink
Post by Eric Erhardt
I've hit a snag with the latest tzdata 2015f. (I'm not sure when this change started, but the problem doesn't occur with the tzfiles that are shipped with an Ubuntu 14.04 distribution.)
Transition Time          Transition Offset
01/01/1970 00:00:00      -05:50:36
11/18/1883 18:00:00      -06:00:00
That's not what I'm seeing. I assume you're talking about the 64-bit part of
the file, since the 1883 time stamp does not fit in 32 bits. The first
transition I see, at offset 1348 of the America/Chicago file, is for
-576460752303423488 (0xf800000000000000), which is the BIG_BANG time (see
zic.c). The second, at file offset 1356, is for -2717650800
(0xffffffff5e03f090), which is 1883-11-18 17:00:00 UTC. Neither of these
transition times agree with the times you're showing.
Post by Eric Erhardt
the problem doesn't occur with the tzfiles that are shipped with an Ubuntu 14.04 distribution.)
For what it's worth, the America/Chicago file that I generate by typing 'make
install' with the tz distribution is byte-for-byte identical to
/usr/share/zoneinfo/America/Chicago on my 64-bit Ubuntu 15.04 host.

If I had to guess, my guess is that your software is mishandling the BIG_BANG
time because the time stamp is so far in the past. Perhaps Ubuntu 14.04 didn't
do the Big Bang?
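For anyone checking these numbers, the decimal and hex forms quoted here do agree; a quick arithmetic sanity check, not part of any tz tooling:

```python
# zic's BIG_BANG constant as it appears on disk (big-endian signed 64-bit).
raw = 0xf800000000000000
signed = raw - 2**64          # two's-complement interpretation
assert signed == -576460752303423488 == -(2**59)

# The value at file offset 1356, as quoted above.
assert (-2717650800) % 2**64 == 0xffffffff5e03f090
```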
Paul Eggert
2015-08-14 09:37:34 UTC
Permalink
The second, at file offset 1356, is for -2717650800 (0xffffffff5e03f090), which
is 1883-11-18 17:00:00 UTC.
Sorry, I misinterpreted that one. The second one is actually for -2717647200
(0xffffffff5e03fea0), which is 1883-11-18 18:00:00 UTC, and this agrees with
your program. So the problem is only with the first transition; the second one
looks OK.
Robert Elz
2015-08-14 10:25:43 UTC
Permalink
Date: Thu, 13 Aug 2015 16:21:47 +0000
From: Eric Erhardt <***@microsoft.com>
Message-ID: <***@CY1PR0301MB1530.namprd03.prod.outlook.com>

| I've hit a snag with the latest tzdata 2015f.

Aside from what Jon Skeet asked, you should also indicate what you used
to generate the tz binary files (tzdata only has the source for the info,
not the binary versions you're obviously looking at - and quite properly
I think.) Was it the zic that is with the 2015f sources, or did you use
some other version, and if so what? What platform was that running on?

| This causes the TimeZoneInfo parsing code to throw an exception because it
| is assuming these transitions are sorted in ascending order.

That's reasonable, they should be.

| Is this an intentional change in the tzfiles?

No, what you're seeing is definitely a bug. The issue is how that
happened.

kre
Ian Abbott
2015-08-14 11:55:20 UTC
Permalink
Post by Robert Elz
Date: Thu, 13 Aug 2015 16:21:47 +0000
| I've hit a snag with the latest tzdata 2015f.
Aside from what Jon Skeet asked, you should also indicate what you used
to generate the tz binary files (tzdata only has the source for the info,
not the binary versions you're obviously looking at - and quite properly
I think.) Was it the zic that is with the 2015f sources, or did you use
some other version, and if so what? What platform was that running on?
For example, the Debian tzdata maintainer seems to be using an old
version of zic to generate their tzdata files, as they don't seem to
have the initial transitions in them (at least in their tzdata-2015f-1
packages for Debian stretch/sid).
Post by Robert Elz
| This causes the TimeZoneInfo parsing code to throw an exception because it
| is assuming these transitions are sorted in ascending order.
That's reasonable, they should be.
| Is this an intentional change in the tzfiles?
No, what you're seeing is definitely a bug. The issue is how that
happened.
If the initial transitions are also missing in Eric's tzfiles, perhaps
the bug is related to that.
--
-=( Ian Abbott @ MEV Ltd. E-mail: <***@mev.co.uk> )=-
-=( Web: http://www.mev.co.uk/ )=-
Robert Elz
2015-08-14 14:14:29 UTC
Permalink
Date: Fri, 14 Aug 2015 12:55:20 +0100
From: Ian Abbott <***@mev.co.uk>
Message-ID: <***@mev.co.uk>

| If the initial transitions are also missing in Eric's tzfiles, perhaps
| the bug is related to that.

Actually, given the values that Paul reported, I suspect the bug might be
in the code that's reading the files - the epoch result reported looks
like something is not using all 64 bits of the values - somehow dropping
the top 8 (or more) bits, which would make the big bang timestamp look like 0.

It must be using more than 32 bits though, otherwise it couldn't get the
1883 value - but it only needs to be using 33 bits for that one to work.

Perhaps some range check is happening, to keep the years within 32 bit
signed numbers?

Or maybe the top 32 bits are being used only to set the sign for an
unsigned bottom 32 bit value - that would produce the results indicated.
(for the big bang value, the sign would be negative, but the value is 0,
so it wouldn't affect anything.)

kre
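The first hypothesis above, dropping the top bits, is easy to reproduce arithmetically; this is illustrative only, using the transition values quoted earlier in the thread:

```python
BIG_BANG = -(2**59)      # 0xf800000000000000 on disk
T1883 = -2717647200      # 1883-11-18 18:00:00 UTC

# Keep only the low 32 bits: the Big Bang collapses to 0, i.e. the 1970 epoch.
assert (BIG_BANG % 2**64) & 0xFFFFFFFF == 0

# As noted, 33 bits suffice for the 1883 value to survive intact:
kept = (T1883 % 2**64) & (2**33 - 1)
recovered = kept - 2**33 if kept & 2**32 else kept   # sign-extend from bit 32
assert recovered == T1883
```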
Robert Elz
2015-08-14 14:44:28 UTC
Permalink
Date: Fri, 14 Aug 2015 12:55:20 +0100
From: Ian Abbott <***@mev.co.uk>
Message-ID: <***@mev.co.uk>

| If the initial transitions are also missing in Eric's tzfiles, perhaps
| the bug is related to that.

After I sent the last message, I wondered if perhaps the system Eric
used represented times in (fixed point) Java type notation, in milliseconds
since the epoch, rather than seconds - but even with that form, the big
bang timestamp doesn't overflow 64 bits (though it gets very close, just
2 non-zero bits remain, the sign, and the most significant bit). Of
course so many meaningful bits are lost that the value would be nonsense, but not 0.

Even moving the epoch back to 1900 (which some systems do) doesn't affect
anything (if my back of the envelope calculation is right, the epoch would
need to move back over 9 billion years - 2/3 of the way to the big bang,
for a millisecond counter to produce 0 for the unix style big bang timestamp).
[Do not rely upon, nor quote off this list, that value - I did not verify.]

Of course, if the internal representation is microseconds (or any more
precise unit) since the epoch (1970 or anything else plausible) then
the big bang would (if overflow protection isn't perfect) turn into 0.

In any of those representations, very recent times, like 1883, fit in
64 bits just fine.

kre
Paul Eggert
2015-08-14 16:26:59 UTC
Permalink
Post by Robert Elz
After I sent the last message, I wondered if perhaps the system Eric
used represented times in (fixed point) Java type notation, in milliseconds
since the epoch, rather than seconds - but even with that form, the big
bang timestamp doesn't overflow 64 bits
Microsoft file times are unsigned 64-bit quantities that count the number of
100ns intervals since 1601-01-01 00:00:00 universal time. If his system is
using this format, that'd explain why it overflows for the Big Bang -- though it
wouldn't explain why the result was dated 1970.
Robert Elz
2015-08-14 18:44:28 UTC
Permalink
Date: Fri, 14 Aug 2015 09:26:59 -0700
From: Paul Eggert <***@cs.ucla.edu>
Message-ID: <***@cs.ucla.edu>

| Microsoft file times are unsigned 64-bit quantities that count the number
| of 100ns intervals

that's 0.1 us. Amazing.

| since 1601-01-01 00:00:00 universal time.

That's "recent enough" that it is probably not material.

| If his system is using this format, that'd explain why it overflows
| for the Big Bang

Yes.

| though it wouldn't explain why the result was dated 1970.

It could, depending upon how the conversion is done - we know that in
tzdata files, 0 == 1970-01-01.

So, take the big bang 0xF800000000000000 and multiply by 10*1000*1000:
the result is 0x93D1CC0000000000000000. Truncate that to 64 bits
(leaving 0), then add the constant conversion factor to adjust
the unix epoch based time to the windows one.

Then print that, and you get 1970, just as you would have if you'd started
with a true 1970-01-01 timestamp (ie: 0).

This seems very likely to be the problem. The bug is that whatever is doing
the conversion isn't range-checking the input - if the unix time_t value
is smaller (or bigger) than their format can represent, they should be
either generating an error, or limiting it to the earliest (or latest) times
that the format can represent.

kre
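That arithmetic can be checked directly; this is a reconstruction of the suspected failure mode, not the actual .NET code:

```python
from datetime import datetime, timedelta

BIG_BANG = -(2**59)            # 0xf800000000000000 on disk
TICKS = 10**7                  # 100 ns ticks per second
DELTA = 11644473600 * TICKS    # 1601-01-01 .. 1970-01-01 in ticks

# Multiply first and truncate to 64 bits, as an unchecked conversion would:
ticks = (BIG_BANG * TICKS) % 2**64
assert ticks == 0              # every significant bit falls off the top

# The truncated value is then indistinguishable from a genuine 1970 epoch:
when = datetime(1601, 1, 1) + timedelta(microseconds=(ticks + DELTA) // 10)
assert when == datetime(1970, 1, 1)
```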
Robert Elz
2015-08-15 01:36:28 UTC
Permalink
Date: Fri, 14 Aug 2015 19:16:29 +0000
From: Eric Erhardt <***@microsoft.com>
Message-ID: <***@CY1PR0301MB1530.namprd03.prod.outlook.com>


| the "big bang" transition that didn't appear in the older tz files

No, it is relatively new.

| This time value isn't possible to represent in .NET

No, it is way too far back in time for that representation.

But ...
| (since DateTime.MinValue is 00:00:00.0000000 UTC, January 1, 0001,

That's not really the reason - again, a very rough calculation (and assuming
I did it correctly) means that the format you described should be able to
represent +/- (almost) 30,000 years from the epoch - that's something more
than 27000 BC.

The only reason I can see for picking that particular minimum value is
that it means avoiding the question of what year came before year 1
(some say it was 1 BC, and there was no year 0, others disagree - there
is of course no correct answer, as back then years weren't counted this
way, and even if they had been, no-one then would have considered the year
we now call year 1 as being in any way significant enough to warrant starting
counting from then.)

It also means avoiding the question of how to represent negative years.
I just tried it on my NetBSD system, and managed to get ...

Sat May 19 01:22:04 LMT -7537

(that was from "date -r -300000000000" - the -r option on NetBSD allows
providing the time_t value to use, rather than getting it from the clock,
a linux-like -d also exists, but the formats for that are just too weird).

I have no idea if that -7537 is 7537 BC or 7538 BC (ie: whether it is
assumed that there was a year 0 or not). I suspect that this all happens
just by accident, and no-one really ever considered the possibility of
negative years - it is only since time_t's became 64 bits (the last few
years) that it even became possible; before then the range was about
1901..2038.

Simply claiming that years before year 1 don't exist avoids both
problems, so it is kind of an elegant solution.

| in the Gregorian calendar).

We all do it, but of course, there was no Gregorian calendar then, Pope
Gregory didn't exist yet, nor did his great-great-great grandparents.
Nor were there even any popes, the job hadn't been invented yet...

| 1. I shouldn't be checking explicitly for this value (0xf800000000000000),
| right? I saw some code comments in zic.c that says it could potentially
| change in the future.

Just check for values too small (or large) to represent in the format
you're using. That one is a LONG way out of that range.

| 2. Will there ever be more than one transition time that is before
| January 1, 0001? Or will the "big bang" transition be the only one?

It is kind of unlikely - it's hard getting people to actually include
transitions before 1970 .. but back then there was no standard time
(no railways, planes, or computer networks that need consistent timekeeping)
so it is hard to imagine a reason for anything before about the 16th
century ever being meaningful enough to include.

kre
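In code, that check is just a pair of bounds; the Unix-second limits below are assumptions derived from the DateTime range discussed in this thread (0001-01-01 through 9999-12-31):

```python
# .NET-style DateTime range expressed as Unix seconds (assumed bounds):
MIN_UNIX = -62135596800    # 0001-01-01 00:00:00 UTC
MAX_UNIX = 253402300799    # 9999-12-31 23:59:59 UTC

def check_transition(t):
    """Clamp an out-of-range tzfile transition rather than overflowing."""
    return min(max(t, MIN_UNIX), MAX_UNIX)

check_transition(-(2**59))     # clamps to MIN_UNIX
check_transition(-2717647200)  # the 1883 transition passes through unchanged
```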
Paul Eggert
2015-08-15 03:16:52 UTC
Permalink
Post by Robert Elz
a very rough calculation (and assuming
I did it correctly) means that the format you described should be able to
represent +/- (almost) 30,000 years from the epoch - that's something more
that 27000 BC.
I think MS-Windows DateTime is unsigned internally, so it can't represent any
times before 0001-01-01 00:00:00 UTC. It's a bit confusing, as MS-Windows has
several time types each with their own epoch and tick size and range.
Post by Robert Elz
I have no idea if that -7537 is 7537 BC or 7538 BC (ie: whether it is
assumed that there was a year 0 or not). I suspect that this all happens
just by accident,
No accident. NetBSD assumes year 0. tzcode is the same, as is GNU/Linux and
Solaris. There is also year -1, etc. For example, the tzcode 'date' command
does this:

$ date -u -r -62135596800
Mon Jan 1 00:00:00 GMT 0001
$ date -u -r -62135596801
Sun Dec 31 23:59:59 GMT 0000
$ date -u -r -62167219200
Sat Jan 1 00:00:00 GMT 0000
$ date -u -r -62167219201
Fri Dec 31 23:59:59 GMT -001
$ date -u -r -67767978442512096
Tue Jan 1 02:38:24 GMT -2147479778

GNU/Linux 'date' is similar except it says 'UTC' rather than 'GMT' (of course
neither abbreviation is correct for these old time stamps).
Steve Allen
2015-08-15 04:13:10 UTC
Permalink
Post by Paul Eggert
$ date -u -r -67767978442512096
Tue Jan 1 02:38:24 GMT -2147479778
GNU/Linux 'date' is similar except it says 'UTC' rather than 'GMT'
(of course neither abbreviation is correct for these old time
stamps).
Of course.

Williams studied tidal rhythmites and found a nearly constant number
of about 410 solar days per year from 2 billion to 1 billion years
before present, which is about 77000 SI seconds in one day.
http://onlinelibrary.wiley.com/doi/10.1029/1999RG900016/abstract

That doesn't work with a calendar that supposes 365.25 days per year.
Any date before human record keeping should decide whether it is
counting seconds, days, or years (and which kind of each) because
using the modern relationships does not correspond to anything.

--
Steve Allen <***@ucolick.org> WGS-84 (GPS)
UCO/Lick Observatory--ISB Natural Sciences II, Room 165 Lat +36.99855
1156 High Street Voice: +1 831 459 3046 Lng -122.06015
Santa Cruz, CA 95064 http://www.ucolick.org/~sla/ Hgt +250 m
Paul Eggert
2015-08-15 17:17:50 UTC
Permalink
Post by Steve Allen
Williams studied tidal rhythmites and found a nearly constant number
of about 410 solar days per year from 2 billion to 1 billion years
before present, which is about 77000 SI seconds in one day.
http://onlinelibrary.wiley.com/doi/10.1029/1999RG900016/abstract
Thanks for mentioning that; I wasn't aware of this work. It appears, though,
that there's still considerable uncertainty about how long the day was way back
when. A recent review says that although tidal rhythmite analysis may help
estimate ancient lunar orbital periods in terms of lunar days/month, estimating
the length of the ancient Earth day remains uncertain because we don't know the
length of the ancient lunar sidereal month.

This is in contrast to something else I think you mentioned a while ago, namely
the length of the day going back to about 750 BC, for which Richard Stephenson
and coworkers have amassed historical eclipse records showing that our UTC-based
clocks would be off by about three hours if we naively took them back to the
year 0. See, for example, Sauter et al's reconstruction of the total solar
eclipse of 0319-05-06 which legend says converted Mirian III of Iberia to
Christianity.

Longhitano SG, Mellere D, Steel RJ, Ainsworth RB. Tidal depositional systems in
the rock record: A review and new insights. Sedimentary Geology 279, 2-22
(2012-11-20). http://dx.doi.org/10.1016/j.sedgeo.2012.03.024

Morrison L. The length of the day: Richard Stephenson’s contribution.
Astrophysics and Space Science Proceedings 43 (2015) 3-10.
http://dx.doi.org/10.1007/978-3-319-07614-0_1

Sauter J, Simonia I, Stephenson FR, Orchiston W. The legendary fourth-century
total solar eclipse in Georgia: Fact or fantasy? Astrophysics and Space Science
Proceedings Volume 43 (2015) 25-45. http://dx.doi.org/10.1007/978-3-319-07614-0_3
r***@fastmail.us
2015-08-15 22:14:59 UTC
Permalink
Post by Robert Elz
I have no idea if that -7537 is 7537 BC or 7538 BC (ie: whether it is
assumed that there was a year 0 or not).
It is 7538 BC. The same value works on OSX, and -62150000000 gives the
year 0000. This is specified by ISO 8601, but neither the C standard nor
(as far as I know) POSIX provides any guidance other than saying the
meaning of years less than 1 is unspecified.

I do notice that if I attempt to enter the actual big bang time, I get
the error message "date: localtime: Value too large to be stored in data
type" - the actual limit being run into seems to be a 32-bit value for
tm_year (the most negative year it can represent is -2147481748, i.e.
-2**31 + 1900). Interestingly, strftime apparently has no
trouble formatting a year of 2147485547 (2**31+1899) despite that being
beyond the 32-bit limit.
Post by Robert Elz
I suspect that this all happens
just by accident, and no-one really ever considered the possibility of
negative years - it is only since time_t's became 64 bits (the last few
years) that it even became possible, before then the range was about
1901..2038)
Simply claiming that years before year 1 don't exist avoids both
problems, so it is kind of an elegant solution.
Artificially limiting the year range also allows you to use a fixed-size
broken-down time format (python datetime uses a 16-bit year) or a
floating-point format (MS Excel and therefore COM use a floating-point
format measured in days) - both limit the year to 1 through 9999 - and
avoids having to contend with different formats having different real limits.
Paul Eggert
2015-08-15 23:58:38 UTC
Permalink
Post by r***@fastmail.us
strftime apparently has no
trouble formatting a year of 2147485547 (2**31+1899) despite that being
beyond the 32-bit limit.
tzcode strftime has special code to format years correctly even if tm_year +
1900 exceeds INT_MAX. See the _yconv code involving DIVISOR in strftime.c. As
I understand it POSIX requires this sort of thing. That is, although POSIX
doesn't require support for UTC years before 1970, POSIX does require that
localtime and strftime support UTC years through INT_MAX + 1900 if time_t is
wide enough (e.g., the common case of 64-bit time_t and 32-bit int).
Paul Eggert
2015-08-15 02:10:15 UTC
Permalink
2. Will there ever be more than one transition time that is before January 1, 0001? Or will the "big bang" transition be the only one?
It's unlikely that we'll see more than one transition before that cutoff in the
published data. That being said, the binary file format does allow more than
one such transition and it should be easy enough to ignore all but the last one.
We were toying with the idea of having zic put one transition at -2**63 and a
later transition at the Big Bang, for example, and I'd rather not rule that out
in future versions.
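For a reader that cannot represent such times, that advice amounts to folding every pre-cutoff transition onto the cutoff; a sketch with assumed names, using the 0001-01-01 Unix-seconds value quoted earlier in the thread:

```python
CUTOFF = -62135596800   # 0001-01-01 00:00:00 UTC as Unix seconds

def effective_transitions(pairs, cutoff=CUTOFF):
    """pairs: ascending (time, utoff) tuples from a tzfile.

    Drop every transition before the cutoff except the last,
    and clamp that one's time to the cutoff itself.
    """
    old = [p for p in pairs if p[0] < cutoff]
    new = [p for p in pairs if p[0] >= cutoff]
    return ([(cutoff, old[-1][1])] if old else []) + new
```

With the 2015f America/Chicago data, only the Big Bang transition falls before the cutoff, so the result starts at the cutoff with the LMT offset and continues with the 1883 transition unchanged.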
Lester Caine
2015-08-14 14:18:29 UTC
Permalink
Post by Eric Erhardt
Notice the first entry is for 1970, and then the next entry is for 1883.
The first entry was probably a 'null' timestamp? And the software has to
display a valid date for which the current 'default' is used. I see this
sort of problem often when looking at genealogical data where '0' has
been used as an unknown date. The results depend on just which date
processing software is being used ...
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk