Created on 02-03-2016 11:36 AM - edited 09-16-2022 03:01 AM
Good afternoon,
We are making use of the unix_timestamp() function to take different dates in string format and getting them to a timestamp. Unfortunately the date format we are getting in via CSV files is coming in as something like 1/2/2007 where the month and date are not padded with a ‘0’ for single digit days or months.
For Hive, it looks like not having the padding works fine.
#hive
select unix_timestamp('01/02/2007', 'MM/dd/yyyy'); // returns 1167696000
select unix_timestamp('1/2/2007', 'MM/dd/yyyy'); // returns 1167696000
But in Impala if the format is not padded with ‘0’ then we get null.
#impala
select unix_timestamp('01/02/2007', 'MM/dd/yyyy'); // returns 1167696000
select unix_timestamp('1/2/2007', 'MM/dd/yyyy'); // returns null
And if we were to change the format to just single ‘d’ or ‘M’ it works for the non-padded sources, but obviously returns null for the two digit formats.
#impala
select unix_timestamp('1/2/2007', 'M/d/yyyy'); // returns 1167696000
select unix_timestamp('10/20/2007', 'M/d/yyyy'); // returns null
One work around we do in other areas is basically ingest the data and then use Hive to transform the data into a new table. However we have a use case currently where we would like to put Impala directly on top of the CSV file and not do a transformation.
We think what we’ll end up doing is see if the creater of the CSV can create the dates with padded ‘0’s, or just treat them as strings. Before going down this route, we wanted to see if the observations we are noticing in Impala is expected, or if we’re missing anything.
Thanks in advance,
Mac
Created 02-03-2016 12:01 PM
See https://issues.cloudera.org/browse/IMPALA-2732. In a recent build:
> select unix_timestamp('10/20/2007', 'M/d/yyyy'); Query: select unix_timestamp('10/20/2007', 'M/d/yyyy') +------------------------------------------+ | unix_timestamp('10/20/2007', 'm/d/yyyy') | +------------------------------------------+ | 1192838400 | +------------------------------------------+
Created 02-03-2016 12:10 PM
Thanks. Indeed that is exactly what we are looking for.
I realize this is maybe something you don't know, but any idea if Impala 2.5.0 might be put in a 5.5.* release, or if this might be back ported to a 2.3 release of Impala with CDH 5.5.*?
We're at 5.4.7 right now and I've been looking for opportunities to the team to buy off on 5.5. so just checking to see if this might be an arrow in my life-cycle management quiver.
If you don't know, no worries.
Created 02-03-2016 12:34 PM
Created 02-12-2016 01:03 PM
We plan to backport this one to impala 2.3, likely the cdh 5.5.4.
Created 02-12-2016 01:05 PM