Created on 10-20-2015 01:51 PM - edited 09-16-2022 02:44 AM
CDH 5.3.3
Impala 2.1.3-cdh5
Working on converting a large MySQL table into an Impala table formatted as parquet. The datetime field was causing issues so we converted it to unix time using Java's .getTime() which gave us '1397166481000' for '2014-04-10 21:48:01'.
Table created and everything appears good to go, but when we try to query dates things get odd...
Field: time_stamp BIGINT
Now when we query this field using Impala we get different results:
from_unixtime(time_stamp,'yyyy-MM-dd HH:mm:ss.sss') --> 2011-04-06 17:10:00.000
CAST(time_stamp AS timestamp) --> 2011-04-06 17:10:00
CAST(CAST(strleft(CAST(time_stamp AS STRING),10) AS bigint) AS timestamp) --> 2014-04-10 21:48:01
The casting BIGINT as a STRING, then lopping off the last 3 zeros, casting back to BIGINT, then casting as timestamp seems a bit excessive but I haven't been able to find the solution.
Do I need to go back and fix things at the .getTime() level?
Created 10-21-2015 08:59 AM
Turns out that dividing by 1000 solves the issue:
CAST(tp.time_stamp/1000 AS timestamp)
which outputs the correct date and time. I still find it odd that datetime from mysql gets stored as TIMESTAMP_MILLIS --> int64 in parquet and bigint for the field in impala which can't be converted back to datetime without an extra step, but, things work now...
Created 10-20-2015 02:31 PM
Impala automatically converts STRING literals of the correct format into TIMESTAMP values. I belive you don`t need to convert to unix time
Timestamp values are accepted in the format YYYY-MM-DD HH:MM:SS.sssssssss, and can consist of just the date, or just the time, with or without the fractional second portion. For example, you can specify TIMESTAMP values such as' 2011-04-10', '21:48:01', or '2011-04-10 21:48:01.000' or '2011-04-10 21:48:01'
Created 10-21-2015 08:59 AM
Turns out that dividing by 1000 solves the issue:
CAST(tp.time_stamp/1000 AS timestamp)
which outputs the correct date and time. I still find it odd that datetime from mysql gets stored as TIMESTAMP_MILLIS --> int64 in parquet and bigint for the field in impala which can't be converted back to datetime without an extra step, but, things work now...