Created on 03-13-2015 08:06 AM - edited 09-16-2022 02:24 AM
Is this a known issue with Impala?
unix_timestamp is apparently not working properly with dates beyond the year 2038. In contrast, Hive appears to not have this problem.
[cloudera@quickstart ~]$ impala-shell
Starting Impala Shell without Kerberos authentication
Connected to quickstart.cloudera:21000
Server version: impalad version 2.1.0-cdh5 RELEASE (build e48c2b48c53ea9601b8f47a39373aa83ff7ca6e2)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v2.1.0-cdh5 (e48c2b4) built on Tue Dec 16 19:00:35 PST 2014)
[quickstart.cloudera:21000] > select to_date( from_unixtime( unix_timestamp( '2036-06-14', 'yyyy-MM-dd' ))) as a,
> to_date( from_unixtime( unix_timestamp( '2037-06-14', 'yyyy-MM-dd' ))) as b,
> to_date( from_unixtime( unix_timestamp( '2038-06-14', 'yyyy-MM-dd' ))) as c;
Query: select to_date( from_unixtime( unix_timestamp( '2036-06-14', 'yyyy-MM-dd' ))) as a,
to_date( from_unixtime( unix_timestamp( '2037-06-14', 'yyyy-MM-dd' ))) as b,
to_date( from_unixtime( unix_timestamp( '2038-06-14', 'yyyy-MM-dd' ))) as c
+------------+------------+------------+
| a | b | c |
+------------+------------+------------+
| 2036-06-14 | 2037-06-14 | 1902-05-08 |
+------------+------------+------------+
Fetched 1 row(s) in 0.41s
[quickstart.cloudera:21000] >
Here is the same query in Hive:
[cloudera@quickstart ~]$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
hive> select to_date( from_unixtime( unix_timestamp( '2036-06-14', 'yyyy-MM-dd' ))) as a,
> to_date( from_unixtime( unix_timestamp( '2037-06-14', 'yyyy-MM-dd' ))) as b,
> to_date( from_unixtime( unix_timestamp( '2038-06-14', 'yyyy-MM-dd' ))) as c;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1425573450783_0058, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1425573450783_0058/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1425573450783_0058
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-03-13 11:03:43,809 Stage-1 map = 0%, reduce = 0%
2015-03-13 11:03:55,265 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.71 sec
MapReduce Total cumulative CPU time: 1 seconds 710 msec
Ended Job = job_1425573450783_0058
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.71 sec HDFS Read: 284 HDFS Write: 33 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 710 msec
OK
2036-06-14 2037-06-14 2038-06-14
Time taken: 29.0 seconds, Fetched: 1 row(s)
Thanks
Morgan
Created 03-13-2015 11:11 AM
Yes, it is a known issue. https://issues.cloudera.org/browse/IMPALA-1579 tracks it, and a fix will be in the next Impala release in CDH5.4.
Best,
Henry
Created 03-13-2015 11:06 AM
Isn't that more of a general Unix time problem?
Created 03-13-2015 11:08 AM
Perhaps it is, but as I showed in the post above, Hive works correctly, but Impala fails.
Thanks
Morgan
Created 03-13-2015 11:11 AM
Yes, it is a known issue. https://issues.cloudera.org/browse/IMPALA-1579 tracks it, and a fix will be in the next Impala release in CDH5.4.
Best,
Henry
Created 03-13-2015 11:27 AM
Henry,
Thanks for the info.
Morgan