Member since 12-10-2015 | 10 Posts | 3 Kudos Received | 0 Solutions
03-17-2018 03:31 PM
@Sooraj Antony Since the data in your join column is skewed, enable skew join optimization:
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=5000;
Here hive.skewjoin.key is the row-count threshold: a join key with more rows than this is treated as skewed. You can tune it further by adjusting the number of map tasks and the minimum split size for the follow-up map join with the hive.skewjoin.mapjoin.map.tasks and hive.skewjoin.mapjoin.min.split properties.
08-28-2016 02:22 PM | 1 Kudo
That's the interesting part - it actually wasn't working correctly. The issue was just hard to see without doing the corresponding from_unixtime call. Your UNIX_TIMESTAMP call never read the month, because the format string used 'mm' (minutes) where it needed 'MM' (months). For whatever reason, UNIX_TIMESTAMP still returns a timestamp value when the format string leaves it with invalid data like this. In my opinion it should fail instead of returning a wrong value.

Check out this query. The first two columns use the correct date format string, while the last two use the invalid one (minutes instead of months):

select
  from_unixtime(1440201632, 'yyyy-MM-dd HH:mm:ss') as `good_date1`,
  from_unixtime(1421884832, 'yyyy-MM-dd HH:mm:ss') as `good_date2`,
  from_unixtime(1440201632, 'yyyy-mm-dd HH:mm:ss') as `bad_date1`,
  from_unixtime(1421884832, 'yyyy-mm-dd HH:mm:ss') as `bad_date2`
from sample_07 limit 1;

And the results:

good_date1           good_date2           bad_date1            bad_date2
2015-08-22 00:00:32  2015-01-22 00:00:32  2015-00-22 00:00:32  2015-00-22 00:00:32

Notice that the bad dates show 00 in the month position instead of 08 and 01 respectively: that value is the minutes field, not the month. Hope this helps.
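If you want to reproduce this outside of Hive, here is a minimal sketch in plain Java. It assumes the Hive functions behave like java.text.SimpleDateFormat, whose pattern letters these format strings follow; the class name DateFormatDemo and its two helper methods are hypothetical, and the time zone is pinned to UTC only so the output is reproducible.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical demo class; not part of Hive.
public class DateFormatDemo {

    // Rough stand-in for from_unixtime: format epoch seconds with a pattern (UTC).
    static String format(long epochSeconds, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochSeconds * 1000L));
    }

    // Rough stand-in for unix_timestamp: parse text with a pattern, return epoch seconds (UTC).
    static long parse(String text, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return fmt.parse(text).getTime() / 1000L;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        String good = "yyyy-MM-dd HH:mm:ss"; // MM = month
        String bad = "yyyy-mm-dd HH:mm:ss";  // mm = minutes, used twice

        // Formatting: the bad pattern prints the minutes where the month should be.
        System.out.println(format(1440201632L, good)); // 2015-08-22 00:00:32
        System.out.println(format(1440201632L, bad));  // 2015-00-22 00:00:32

        // Parsing: with the bad pattern the "08" is read as minutes (and then
        // overwritten by the second mm), so the month is never set and the
        // lenient parser falls back to January without raising an error.
        System.out.println(parse("2015-08-22 00:00:32", good)); // 1440201632
        System.out.println(parse("2015-08-22 00:00:32", bad));  // 1421884832
    }
}
```

In this sketch the wrongly parsed August date comes out as 1421884832, which is 2015-01-22 00:00:32, so the mistake stays invisible until you format the value back with the correct pattern.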
02-03-2016 02:41 AM
@Sooraj Antony Has this been resolved? If so, can you post your solution or accept the best answer?