About Tully

Tully · 10-21-2015

Turns out that dividing by 1000 solves the issue: CAST(tp.time_stamp/1000 AS timestamp) outputs the correct date and time. I still find it odd that a MySQL datetime gets stored as TIMESTAMP_MILLIS (an int64) in Parquet and as a BIGINT column in Impala, which can't be converted back to a timestamp without an extra step, but things work now...
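Why dividing by 1000 works: Java's Date.getTime() returns milliseconds since the Unix epoch, while an integer cast to a timestamp (as in the CAST above) is read as seconds. Below is a minimal Python sketch of the same conversion, assuming the stored value is epoch milliseconds in UTC; millis_to_datetime is a hypothetical helper for illustration, not part of Impala or any library.

```python
from datetime import datetime, timezone


def millis_to_datetime(epoch_millis: int) -> datetime:
    """Hypothetical helper: convert milliseconds since the Unix epoch
    (what Java's Date.getTime() returns) into a UTC datetime.

    Assumes the input is UTC epoch milliseconds. The value is split into
    whole seconds and leftover milliseconds before conversion, which is
    the same "divide by 1000" step used in the Impala query.
    """
    seconds, millis = divmod(epoch_millis, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Carry the leftover milliseconds over as microseconds.
    return base.replace(microsecond=millis * 1000)


# The value from the original question.
ts = millis_to_datetime(1397166481000)
print(ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2014-04-10 21:48:01
```

Using divmod keeps the leftover milliseconds as microseconds instead of silently dropping them.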

Tully · 10-20-2015

CDH 5.3.3, Impala 2.1.3-cdh5.

Working on converting a large MySQL table into an Impala table stored as Parquet. The datetime field was causing issues, so we converted it to Unix time using Java's .getTime(), which gave us '1397166481000' for '2014-04-10 21:48:01'. The table was created and everything appeared good to go, but when we query the dates things get odd...

Field: time_stamp BIGINT

Querying this field in Impala gives different results depending on the expression:

from_unixtime(time_stamp,'yyyy-MM-dd HH:mm:ss.sss') --> 2011-04-06 17:10:00.000
CAST(time_stamp AS timestamp) --> 2011-04-06 17:10:00
CAST(CAST(strleft(CAST(time_stamp AS STRING),10) AS bigint) AS timestamp) --> 2014-04-10 21:48:01

Casting the BIGINT as a STRING, lopping off the last 3 zeros, casting back to BIGINT, and then casting as a timestamp seems a bit excessive, but I haven't been able to find a better solution. Do I need to go back and fix things at the .getTime() level?
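The string-truncation query only gives the right answer because the value happens to have exactly 13 digits. A small Python sketch (both function names are hypothetical, mirroring the two SQL approaches) shows where they diverge: any millisecond value below 10^12, i.e. before 2001-09-09 01:46:40 UTC, has only 12 digits, so keeping the first 10 characters divides by 100 instead of 1000.

```python
def strleft_approach(epoch_millis: int) -> int:
    """Mirror of CAST(strleft(CAST(time_stamp AS STRING), 10) AS bigint):
    keep only the first 10 characters of the decimal string."""
    return int(str(epoch_millis)[:10])


def div_approach(epoch_millis: int) -> int:
    """Integer division by 1000: milliseconds -> whole seconds."""
    return epoch_millis // 1000


# 13-digit value from the question, then a 12-digit value (just before 2001-09-09).
for ms in (1397166481000, 999999999999):
    print(ms, strleft_approach(ms), div_approach(ms))
# 1397166481000 1397166481 1397166481
# 999999999999 9999999999 999999999
```

Dividing by 1000 works for any epoch-millisecond value, so it is the safer of the two.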

Tully · 11-14-2014

Got a Hive query that took 3-4 min on 5.1, 20 min after upgrading to 5.2, and is now down to 1 min on 5.2 with some tweaks. On 4.6 we were running MR1. When we upgraded to 5.1 we kept MR1 but also had YARN/MR2 deployed. We tried running jobs with YARN/MR2, but they took forever, so we stayed with MR1, which was running them quickly. When we upgraded to 5.2, the MR1 jobs slowed to a crawl. Uninstalling YARN/MR2 and running with just MR1 made no difference (apparently running both isn't recommended...). Reinstalling YARN/MR2 and uninstalling MR1 gave a HUGE improvement in runtimes (back to where we were with 5.1). Throughout this process we worked with a performance engineer at Cloudera (Boris FTW) who helped us make further adjustments and tune the system, resulting in even better performance. There are still a couple of cleanup/best-practice adjustments he recommended, but we're back in good shape and can't say enough about Cloudera's enterprise support team.

Tully · 11-04-2014

Upgraded to 5.2 last week and have been noticing a SIGNIFICANT performance drop when running the same queries pre- and post-upgrade. For example, a series of ~200 queries runs each morning; the day before the upgrade the entire process took a little over 2h (as it had for the preceding 6 months). The same queries now take 12+ hours to complete, and the only change is the upgrade (no sudden increase in data, no changes to node setup/usage, queries, etc.). I've been going through the performance improvement section (http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_admin_performance.html), but so far nothing has made a difference. I'm not seeing anything in memory, CPU, disk latency, garbage collection, failed tasks, or anything else that really points to "this is why you're waiting so long". I'm also not sure whether it is a Hive issue, a MapReduce issue, both, or something else entirely... If the time to produce an expected query result goes from 30s to 5min overnight after an upgrade, where would you start to look?

P.S. Queries run in Hue/Beeswax and queries submitted directly to the cluster are both experiencing the same issue.
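One low-tech way to start answering "where would you start to look" is to time each of the morning queries before and after the upgrade and rank them by slowdown: if most queries slowed by a similar factor, that may suggest a shared cause rather than individual queries. A minimal Python sketch; rank_slowdowns and its input format are hypothetical, and the only figures used are the approximate ones from the post (2h to 12h for the batch, 30s to 5min for one query).

```python
def rank_slowdowns(before: dict[str, float], after: dict[str, float]) -> list[tuple[str, float]]:
    """Hypothetical helper: return (name, after/before) ratios, worst first.

    Times are in seconds. Only names present in both runs (and with a
    non-zero baseline) are compared.
    """
    ratios = [
        (name, after[name] / before[name])
        for name in before
        if name in after and before[name] > 0
    ]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


# Approximate figures from the post.
before = {"morning batch": 2 * 3600, "example query": 30}
after = {"morning batch": 12 * 3600, "example query": 5 * 60}
for name, ratio in rank_slowdowns(before, after):
    print(f"{name}: {ratio:.1f}x slower")
```

With all ~200 queries timed, the top of this list shows whether the regression is concentrated in a few queries or spread evenly across the batch.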

Tully · 11-04-2014

It was a bad metastore server; ended up stopping that metastore server and using a different one.

Tully · 10-30-2014

Upgraded to CDH 5.2 yesterday, and although there was a hiccup getting the Hive metastore schema upgraded, that was eventually resolved. Now in Hue I can view/create databases and tables in the Metastore Manager, but the query editors (Hive and/or Impala) list no databases. Any suggestions on where to start troubleshooting?

Last Visited	01-31-2017 09:04 AM

Member Since	02-27-2014 10:37 AM
Posts	8
Kudos received	5

Cloudera Community