Description
I'm trying to understand how to deal properly with timestamps. I've created a CSV file with some crucial timestamps (at least I believe they are):
2019-01-01 00:00:00.0000
2015-01-01 00:00:00.0001
2015-01-01 00:00:00.0000
2014-12-31 23:59:59.9999
1970-01-01 00:00:00.0001
1970-01-01 00:00:00.0000
1969-12-31 23:59:59.9999
1969-12-31 23:59:59.0001
1969-12-31 23:59:59.0000
1969-12-31 23:59:58.9999
I've created an ORC file using hive-1.1.0-cdh5.14.2. Hive is able to read this file back correctly. All timestamps seem to match. Reading the same file using orc-tools shows different results:
{"_col0":"2019-01-01 00:00:00.0"}
{"_col0":"2015-01-01 00:00:00.0001"}
{"_col0":"2015-01-01 00:00:00.0"}
{"_col0":"2014-12-31 23:59:59.9999"}
{"_col0":"1970-01-01 00:00:00.0001"}
{"_col0":"1970-01-01 00:00:00.0"}
{"_col0":"1969-12-31 23:59:58.9999"}
{"_col0":"1969-12-31 23:59:59.0001"}
{"_col0":"1969-12-31 23:59:59.0"}
{"_col0":"1969-12-31 23:59:57.9999"}
The differences are in the last row and the 4th row from the bottom: both are one second earlier than the input.
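To pin down exactly which rows differ and by how much, the input list and the orc-tools output above can be compared mechanically. This is a minimal Python sketch; the helper name `diff_rows` is made up for illustration, and the data is copied from the two lists above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps written to the CSV.
expected = [
    "2019-01-01 00:00:00.0000",
    "2015-01-01 00:00:00.0001",
    "2015-01-01 00:00:00.0000",
    "2014-12-31 23:59:59.9999",
    "1970-01-01 00:00:00.0001",
    "1970-01-01 00:00:00.0000",
    "1969-12-31 23:59:59.9999",
    "1969-12-31 23:59:59.0001",
    "1969-12-31 23:59:59.0000",
    "1969-12-31 23:59:58.9999",
]

# What orc-tools printed for the file written by Hive.
actual = [
    "2019-01-01 00:00:00.0",
    "2015-01-01 00:00:00.0001",
    "2015-01-01 00:00:00.0",
    "2014-12-31 23:59:59.9999",
    "1970-01-01 00:00:00.0001",
    "1970-01-01 00:00:00.0",
    "1969-12-31 23:59:58.9999",
    "1969-12-31 23:59:59.0001",
    "1969-12-31 23:59:59.0",
    "1969-12-31 23:59:57.9999",
]


def diff_rows(expected, actual):
    """Return (row_number, expected, actual, delta_seconds) for every mismatching row."""
    mismatches = []
    for row, (exp, act) in enumerate(zip(expected, actual), start=1):
        delta = datetime.strptime(act, FMT) - datetime.strptime(exp, FMT)
        if delta.total_seconds() != 0:
            mismatches.append((row, exp, act, delta.total_seconds()))
    return mismatches


mismatches = diff_rows(expected, actual)
for row in mismatches:
    print(row)
```

Both affected rows are before 1970-01-01 and have a non-zero fractional part; the pre-1970 rows with a zero or very small fraction come through unchanged.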
With some modifications I managed to have orc-tools generate a file with timestamps itself using convert (see ORC-526). Reading that file back in hive-1.1.0-cdh5.14.2 results in:
2019-01-01 00:00:00
2015-01-01 00:00:00.0001
2015-01-01 00:00:00
2014-12-31 23:59:59.9999
1970-01-01 00:00:00.0001
1970-01-01 00:00:00
1970-01-01 00:00:00.9999
1969-12-31 23:59:59.0001
1969-12-31 23:59:59
1969-12-31 23:59:59.9999
This is also wrong: the 4th row from the bottom and the last row are off by one second, but this time in the other direction. When I read the file with orc-tools itself, it shows the correct output (58) for the last row, but incorrect output for the 4th row from the bottom. I noticed that orc-tools-1.2.0 cannot read the file written by 1.6.0. orc-tools-1.3.4 can, but it also produces incorrect output.
orc-tools-1.6.0:
{"mytime":"2019-01-01 00:00:00.0"}
{"mytime":"2015-01-01 00:00:00.0001"}
{"mytime":"2015-01-01 00:00:00.0"}
{"mytime":"2014-12-31 23:59:59.9999"}
{"mytime":"1970-01-01 00:00:00.0001"}
{"mytime":"1970-01-01 00:00:00.0"}
{"mytime":"1970-01-01 00:00:00.9999"}
{"mytime":"1969-12-31 23:59:59.0001"}
{"mytime":"1969-12-31 23:59:59.0"}
{"mytime":"1969-12-31 23:59:58.9999"}
orc-tools-1.3.4:
{"mytime":"2019-01-01 00:00:00.0"}
{"mytime":"2015-01-01 00:00:00.0001"}
{"mytime":"2015-01-01 00:00:00.0"}
{"mytime":"2014-12-31 23:59:59.9999"}
{"mytime":"1970-01-01 00:00:00.0001"}
{"mytime":"1970-01-01 00:00:00.0"}
{"mytime":"1970-01-01 00:00:00.9999"}
{"mytime":"1969-12-31 23:59:58.0001"}
{"mytime":"1969-12-31 23:59:59.0"}
{"mytime":"1969-12-31 23:59:58.9999"}
I'm getting a bit lost at what's right and wrong, but I'm getting the feeling something doesn't add up here.
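One guess at what could produce this pattern: the whole seconds and the fractional part are stored separately, and the writer and reader disagree on how to round the seconds for timestamps before 1970. The Python sketch below is a toy model of that idea only. It is not taken from the ORC or Hive source, and `hypothetical_write` and `hypothetical_read` are invented names. The assumed writer floors to milliseconds, truncates the seconds toward zero, and keeps a non-negative fraction; the assumed reader adds the two back together with no correction. The sketch checks whether this model reproduces the Hive readback of the orc-tools file listed above.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def parse(text):
    """Parse 'YYYY-MM-DD HH:MM:SS[.fraction]' into a naive datetime."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt)


def to_nanos(dt):
    """Exact integer nanoseconds since 1970-01-01 (negative before the epoch)."""
    d = dt - EPOCH
    return (d.days * 86_400 + d.seconds) * 1_000_000_000 + d.microseconds * 1_000


def hypothetical_write(total_nanos):
    """ASSUMED writer: seconds truncated toward zero, fraction stored separately."""
    millis = total_nanos // 1_000_000  # floor to whole milliseconds
    # Truncate toward zero, as integer division does in Java or C.
    secs = -(-millis // 1000) if millis < 0 else millis // 1000
    nanos = total_nanos % 1_000_000_000  # non-negative fraction of the second
    return secs, nanos


def hypothetical_read(secs, nanos):
    """ASSUMED reader: adds the fraction to the stored seconds without correction."""
    return EPOCH + timedelta(microseconds=(secs * 1_000_000_000 + nanos) // 1000)


# Input timestamps from the CSV.
written = [
    "2019-01-01 00:00:00.0000",
    "2015-01-01 00:00:00.0001",
    "2015-01-01 00:00:00.0000",
    "2014-12-31 23:59:59.9999",
    "1970-01-01 00:00:00.0001",
    "1970-01-01 00:00:00.0000",
    "1969-12-31 23:59:59.9999",
    "1969-12-31 23:59:59.0001",
    "1969-12-31 23:59:59.0000",
    "1969-12-31 23:59:58.9999",
]

# What Hive printed for the file written by orc-tools.
hive_readback = [
    "2019-01-01 00:00:00",
    "2015-01-01 00:00:00.0001",
    "2015-01-01 00:00:00",
    "2014-12-31 23:59:59.9999",
    "1970-01-01 00:00:00.0001",
    "1970-01-01 00:00:00",
    "1970-01-01 00:00:00.9999",
    "1969-12-31 23:59:59.0001",
    "1969-12-31 23:59:59",
    "1969-12-31 23:59:59.9999",
]

simulated = [hypothetical_read(*hypothetical_write(to_nanos(parse(t)))) for t in written]
matches = [sim == parse(seen) for sim, seen in zip(simulated, hive_readback)]
print(all(matches))
```

Under this model, rows 7 and 10 come out one second late and the other rows round-trip unchanged, which is the same pattern as the Hive readback. That only shows the symptoms are consistent with a rounding disagreement for pre-1970 timestamps; it does not establish which side is at fault.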