Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Formats for unix_timestamp() function in Impala

avatar
Contributor

Good afternoon,

 

We are making use of the unix_timestamp() function to take different dates in string format and getting them to a timestamp.  Unfortunately the date format we are getting in via CSV files is coming in as something like 1/2/2007 where the month and date are not padded with a ‘0’ for single digit days or months.

 

For Hive, it looks like not having the padding works fine.

 

#hive

select unix_timestamp('01/02/2007', 'MM/dd/yyyy');  // returns 1167696000

 

select unix_timestamp('1/2/2007', 'MM/dd/yyyy');  // returns 1167696000

 

But in Impala if the format is not padded with ‘0’ then we get null. 

 

#impala

select unix_timestamp('01/02/2007', 'MM/dd/yyyy');  // returns 1167696000

 

select unix_timestamp('1/2/2007', 'MM/dd/yyyy');  // returns null

 

And if we were to change the format to just single ‘d’ or ‘M’ it works for the non-padded sources, but obviously returns null for the two digit formats.

 

#impala

select unix_timestamp('1/2/2007', 'M/d/yyyy');  // returns 1167696000

 

select unix_timestamp('10/20/2007', 'M/d/yyyy');  // returns null

 

One work around we do in other areas is basically ingest the data and then use Hive to transform the data into a new table.  However we have a use case currently where we would like to put Impala directly on top of the CSV file and not do a transformation.

 

We think what we’ll end up doing is see if the creater of the CSV can create the dates with padded ‘0’s, or just treat them as strings.    Before going down this route, we wanted to see if the observations we are noticing in Impala is expected, or if we’re missing anything.

 

Thanks in advance,

 

Mac

 

5 REPLIES 5

avatar
Rising Star

See https://issues.cloudera.org/browse/IMPALA-2732. In a recent build:

 

> select unix_timestamp('10/20/2007', 'M/d/yyyy');
Query: select unix_timestamp('10/20/2007', 'M/d/yyyy')
+------------------------------------------+
| unix_timestamp('10/20/2007', 'm/d/yyyy') |
+------------------------------------------+
| 1192838400                               |
+------------------------------------------+

avatar
Contributor

Thanks.  Indeed that is exactly what we are looking for.

 

I realize this is maybe something you don't know, but any idea if Impala 2.5.0 might be put in a 5.5.* release, or if this might be back ported to a 2.3 release of Impala with CDH 5.5.*?

 

We're at 5.4.7 right now and I've been looking for opportunities to the team to buy off on 5.5. so just checking to see if this might be an arrow in my life-cycle management quiver.

 

If you don't know, no worries.

avatar
Rising Star
I do not know, but I have sent the person who will a link to this forum question and asked her to tell you if she knows.

avatar
Cloudera Employee

We plan to backport this one to impala 2.3, likely the cdh 5.5.4.

avatar
Contributor
Thanks!