Member since: 02-27-2014
Posts: 8
Kudos Received: 5
Solutions: 3
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 6973 | 10-21-2015 08:59 AM |
 | 4967 | 11-14-2014 07:55 AM |
 | 2700 | 11-04-2014 09:35 AM |
10-21-2015 08:59 AM
Turns out that dividing by 1000 solves the issue: CAST(tp.time_stamp/1000 AS timestamp) outputs the correct date and time. I still find it odd that a datetime from MySQL gets stored as TIMESTAMP_MILLIS --> int64 in Parquet and as a bigint field in Impala, which can't be converted back to a datetime without an extra step, but things work now...
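For anyone checking the arithmetic outside Impala, here is a minimal Python sketch of the same divide-by-1000 step. It assumes the stored value is epoch milliseconds in UTC, as Java's .getTime() returns; the helper name millis_to_utc is made up for this illustration and is not part of Impala or any library.

```python
from datetime import datetime, timezone


def millis_to_utc(epoch_millis: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime.

    Hypothetical helper for illustration only.
    """
    # Split into whole seconds and the leftover milliseconds.
    seconds, millis = divmod(epoch_millis, 1000)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Re-attach the sub-second part so nothing is silently dropped.
    return stamp.replace(microsecond=millis * 1000)


stamp = millis_to_utc(1397166481000)
print(stamp.strftime("%Y-%m-%d %H:%M:%S"))  # 2014-04-10 21:48:01
```

The printed value matches the original MySQL datetime from the question, which suggests the only mismatch was the unit (milliseconds versus seconds), not the data itself.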
10-20-2015 01:51 PM
CDH 5.3.3, Impala 2.1.3-cdh5

We're converting a large MySQL table into an Impala table stored as Parquet. The datetime field was causing issues, so we converted it to Unix time using Java's .getTime(), which gave us '1397166481000' for '2014-04-10 21:48:01'. The table was created and everything appeared good to go, but when we try to query dates things get odd...

Field: time_stamp BIGINT

Querying this field in Impala gives different results depending on the conversion:

from_unixtime(time_stamp,'yyyy-MM-dd HH:mm:ss.sss') --> 2011-04-06 17:10:00.000
CAST(time_stamp AS timestamp) --> 2011-04-06 17:10:00
CAST(CAST(strleft(CAST(time_stamp AS STRING),10) AS bigint) AS timestamp) --> 2014-04-10 21:48:01

Casting the BIGINT to a STRING, lopping off the last 3 zeros, casting back to BIGINT, and then casting to timestamp seems a bit excessive, but I haven't been able to find a better solution. Do I need to go back and fix things at the .getTime() level?
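A side note on the strleft workaround: keeping the first 10 characters only equals "drop the last three digits" when the millisecond value has exactly 13 digits. The Python sketch below mimics both approaches to show where they diverge. The function names are hypothetical and this is not Impala code, just the same arithmetic.

```python
def seconds_by_truncation(epoch_millis: int) -> int:
    """Mimic strleft(CAST(x AS STRING), 10): keep the first 10 characters."""
    return int(str(epoch_millis)[:10])


def seconds_by_division(epoch_millis: int) -> int:
    """Integer-divide by 1000, which works regardless of digit count."""
    return epoch_millis // 1000


recent = 1397166481000  # 13 digits: the value from the post
early = 946684800000    # 12 digits: 2000-01-01 00:00:00 UTC in milliseconds

# Both approaches agree on the 13-digit value.
print(seconds_by_truncation(recent), seconds_by_division(recent))

# On the 12-digit value, truncation keeps one digit too many.
print(seconds_by_truncation(early), seconds_by_division(early))
```

Millisecond values have 12 digits for dates before 2001-09-09, so if the table holds older rows the string approach would silently return the wrong second count, while division keeps working.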
11-14-2014 07:55 AM
Got a Hive query that took 3-4 min on 5.1 and 20 min after upgrading to 5.2 down to 1 min on 5.2 with some tweaks. On 4.6 we were running MR1. When we upgraded to 5.1 we kept MR1 but also had YARN/MR2 deployed. We tried running jobs with YARN/MR2, but they took forever, so we stayed with MR1, which was running them quickly. When we upgraded to 5.2 the MR1 jobs slowed to a crawl. Uninstalling YARN/MR2 and running with just MR1 made no difference (apparently it isn't recommended to run with both...). We re-installed YARN/MR2, uninstalled MR1, and got a HUGE improvement in runtimes (back to where we were with 5.1). Throughout this process we worked with a performance engineer at Cloudera (Boris FTW) who helped us make further adjustments and tune the system, resulting in even better performance. We still have a couple of adjustments to make that he recommended (cleanup/best practices), but we're back in good shape again and can't say enough about Cloudera's enterprise support team.
11-04-2014 09:54 AM
2 Kudos
Upgraded to 5.2 last week and have been noticing a SIGNIFICANT performance reduction when running the same queries pre- and post-upgrade. For example, each morning a series of queries runs (~200), and the day before the upgrade this entire process took a little over 2h (as it had for the preceding 6 months). Currently the same series takes 12+ h to complete, and the only change is the upgrade (no sudden increase in data, node setup/usage, query changes, etc.). I've been looking into the performance improvement section (http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_admin_performance.html) but so far nothing has made a difference. I'm not seeing anything in memory, CPU, disk latency, garbage collection, failed tasks, or anything that really points to "this is why you're waiting so long". I'm also not sure if it is a Hive issue, a MapReduce issue, both, or something else entirely... If the time to produce an expected query result goes from 30s to 5min overnight after an upgrade, where would you start to look? P.S. Queries run in Hue/Beeswax and queries submitted directly to the cluster are both experiencing the same issue.
Labels: Apache Hive, Cloudera Hue, MapReduce
11-04-2014 09:35 AM
Was a bad metastore server - ended up stopping the metastore server and using a different one.
10-30-2014 07:33 AM
Upgraded to CDH 5.2 yesterday, and although there was a hiccup getting the Hive metastore schema upgraded, that was resolved eventually. Now in Hue, I can view/create databases & tables in the Metastore Manager, but in the query editors (Hive and/or Impala) there are no databases listed. Any suggestions on where to start troubleshooting?
Labels: Apache Hive, Apache Impala, Cloudera Hue