We are using Sqoop to transfer tables with timestamp columns into Parquet files. Following the Cloudera recommendation, we read the timestamp columns of those external tables as BIGINT, divide the value by 1000, and convert the result to TIMESTAMP.
The problem is that this conversion produces imprecise timestamp values. Impala treats the result of the division as a floating-point (DOUBLE) value, so the conversion to TIMESTAMP yields a timestamp that is off by a few nanoseconds.
Here is an example:
select now(), cast( now() as double), cast( cast(now() as double) as timestamp);
In the result you can see that now() does NOT equal cast(cast(now() as double) as timestamp).
Casting the value to DECIMAL does not help either:
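The imprecision is inherent to binary floating point: a decimal fraction like .123 on top of a large epoch value has no exact DOUBLE representation, so the round trip through a floating-point type drifts by a sub-microsecond amount. A small Python sketch (outside Impala, but the same IEEE 754 arithmetic) illustrates the effect:

```python
from decimal import Decimal

# The decimal literal 1470312619.123 has no exact binary
# representation, so the nearest IEEE 754 double differs
# from it by a tiny amount.
as_double = 1470312619.123
exact = Decimal("1470312619.123")

# Decimal(float) reveals the value the double actually stores.
stored = Decimal(as_double)
error = stored - exact

print(stored)  # slightly off from ...619.123
print(error)   # the nanosecond-scale drift
```

This is the same drift that shows up in the Impala result as a timestamp a few nanoseconds away from the expected value.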
select cast( cast( 1470312619.123 as decimal(20,3)) as timestamp);
The result should have exactly 123 milliseconds, but it comes out slightly less.
Is there a way to reliably convert a BIGINT timestamp with milliseconds to an Impala TIMESTAMP data type without loss of precision?
I'm aware this is a very old question, but did you folks ever resolve this? When sqooping from a SQL Server instance to Parquet files and then going through the subsequent conversion steps, all of my times are off by about 5-6 hours.
For example, looking at two fields with the same ID in each data source:
SOURCE: | 56723157200011 | 2016-02-11 00:00:00 |
IMPALA: | 56723157200011 | 2016-02-11 06:00:00 |
The conversion process was CAST(BIGINT / 1000 AS TIMESTAMP).
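For what it's worth, a constant 6-hour shift is exactly what you get when an epoch value is interpreted as UTC on one side and rendered in a UTC-6 local zone on the other. A Python sketch, using America/Chicago (UTC-6 in winter) purely as an assumed example zone:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical epoch value for 2016-02-11 00:00:00 UTC
millis = 1455148800000

# One side interprets the epoch value as UTC ...
utc_dt = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

# ... the other renders the same instant in a UTC-6 local zone,
# producing wall-clock values 6 hours apart.
local_dt = utc_dt.astimezone(ZoneInfo("America/Chicago"))

print(utc_dt)    # 2016-02-11 00:00:00+00:00
print(local_dt)  # 2016-02-10 18:00:00-06:00
```

Both values denote the same instant; only the wall-clock rendering differs, which matches an off-by-timezone conversion rather than data corruption.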
@mtrepanier The 6-hour difference looks like a timezone conversion issue. My guess is that a timezone conversion was either missing or applied where it wasn't wanted somewhere in the pipeline:
By default, casting an integer to a TIMESTAMP gives you a timestamp in UTC.
@Tomas79 Sorry you didn't get a response to this question. It looks like the root cause of your problem could have been something like https://issues.apache.org/jira/browse/IMPALA-3437 . If you cast one of the input arguments to a DECIMAL data type, the output will be DECIMAL and you will get a precise result.
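The fix boils down to keeping the division in DECIMAL rather than DOUBLE arithmetic. Python's decimal module (standing in for Impala's DECIMAL type) shows why exact base-10 division preserves the milliseconds where float division cannot:

```python
from decimal import Decimal

# Epoch milliseconds from the question's example
millis = 1470312619123

# DOUBLE-style division: the quotient is rounded to the
# nearest binary double, so the .123 fraction is inexact.
as_float = millis / 1000

# DECIMAL-style division: 123/1000 is exact in base 10,
# so no precision is lost.
as_decimal = Decimal(millis) / Decimal(1000)

print(Decimal(as_float))  # not exactly ...619.123
print(as_decimal)         # exactly 1470312619.123
```

The same principle applies in Impala: once one operand of the division is DECIMAL, the result stays DECIMAL, and casting that to TIMESTAMP keeps the full millisecond precision.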