Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Formats for unix_timestamp() function in Impala

Formats for unix_timestamp() function in Impala

Good afternoon,

 

We are making use of the unix_timestamp() function to take different dates in string format and getting them to a timestamp.  Unfortunately the date format we are getting in via CSV files is coming in as something like 1/2/2007 where the month and date are not padded with a ‘0’ for single digit days or months.

 

For Hive, it looks like not having the padding works fine.

 

#hive

select unix_timestamp('01/02/2007', 'MM/dd/yyyy');  // returns 1167696000

 

select unix_timestamp('1/2/2007', 'MM/dd/yyyy');  // returns 1167696000

 

But in Impala if the format is not padded with ‘0’ then we get null. 

 

#impala

select unix_timestamp('01/02/2007', 'MM/dd/yyyy');  // returns 1167696000

 

select unix_timestamp('1/2/2007', 'MM/dd/yyyy');  // returns null

 

And if we were to change the format to just single ‘d’ or ‘M’ it works for the non-padded sources, but obviously returns null for the two digit formats.

 

#impala

select unix_timestamp('1/2/2007', 'M/d/yyyy');  // returns 1167696000

 

select unix_timestamp('10/20/2007', 'M/d/yyyy');  // returns null

 

One work around we do in other areas is basically ingest the data and then use Hive to transform the data into a new table.  However we have a use case currently where we would like to put Impala directly on top of the CSV file and not do a transformation.

 

We think what we’ll end up doing is see if the creater of the CSV can create the dates with padded ‘0’s, or just treat them as strings.    Before going down this route, we wanted to see if the observations we are noticing in Impala is expected, or if we’re missing anything.

 

Thanks in advance,

 

Mac

 

5 REPLIES 5
Highlighted

Re: Formats for unix_timestamp() function in Impala

Contributor

See https://issues.cloudera.org/browse/IMPALA-2732. In a recent build:

 

> select unix_timestamp('10/20/2007', 'M/d/yyyy');
Query: select unix_timestamp('10/20/2007', 'M/d/yyyy')
+------------------------------------------+
| unix_timestamp('10/20/2007', 'm/d/yyyy') |
+------------------------------------------+
| 1192838400                               |
+------------------------------------------+

Re: Formats for unix_timestamp() function in Impala

Thanks.  Indeed that is exactly what we are looking for.

 

I realize this is maybe something you don't know, but any idea if Impala 2.5.0 might be put in a 5.5.* release, or if this might be back ported to a 2.3 release of Impala with CDH 5.5.*?

 

We're at 5.4.7 right now and I've been looking for opportunities to the team to buy off on 5.5. so just checking to see if this might be an arrow in my life-cycle management quiver.

 

If you don't know, no worries.

Re: Formats for unix_timestamp() function in Impala

Contributor
I do not know, but I have sent the person who will a link to this forum question and asked her to tell you if she knows.

Re: Formats for unix_timestamp() function in Impala

Cloudera Employee

We plan to backport this one to impala 2.3, likely the cdh 5.5.4.

Re: Formats for unix_timestamp() function in Impala

Thanks!