11-04-2014 09:54 AM
Upgraded to 5.2 last week and have been noticing a SIGNIFICANT performance reduction when running the same queries pre- and post-upgrade. For example, each morning a series of queries runs (~200), and the day before the upgrade this entire process took a little over 2h (as it had for the preceding 6 months). Currently the same process takes 12+ hours to complete, and the only change is the upgrade (no sudden increase in data, no changes to node setup/usage, no query changes, etc.).
I've been looking into the performance improvement section (http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_admin_performance.h...) but so far nothing has made a difference. I'm not seeing anything in memory, CPU, disk latency, garbage collection, failed tasks, or anything that really points to "this is why you're waiting so long" - also not sure if it is a Hive issue, a MapReduce issue, both, or something else entirely...
If the time to produce an expected query result goes from 30s to 5min overnight after an upgrade - where would you start to look?
P.S. Queries run in Hue/Beeswax and queries submitted directly to the cluster are both experiencing the same issue.
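One way to narrow down where to look is to compare per-query runtimes from before and after the upgrade and rank them by slowdown, to see whether every query regressed or only a few. A minimal sketch, assuming the runtimes (in seconds) have already been collected into two dicts keyed by query name; the function name, query names, and numbers here are hypothetical and not taken from this cluster:

```python
def rank_slowdowns(before, after):
    """Return (query, before_s, after_s, ratio) tuples, largest slowdown first."""
    rows = []
    for name, old in before.items():
        new = after.get(name)
        if new is None or old <= 0:
            continue  # skip queries missing from one run or with unusable timings
        rows.append((name, old, new, new / old))
    return sorted(rows, key=lambda r: r[3], reverse=True)


# Hypothetical runtimes in seconds, for illustration only.
before = {"daily_sales": 30.0, "user_counts": 45.0}
after = {"daily_sales": 300.0, "user_counts": 50.0}

for name, old, new, ratio in rank_slowdowns(before, after):
    print(f"{name}: {old:.0f}s -> {new:.0f}s ({ratio:.1f}x)")
```

If the slowdown ratio is roughly uniform across all queries, that would hint at something cluster-wide rather than a problem with individual queries.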
11-11-2014 02:57 PM
Noticed the same thing - for both Hive and Impala. Don't see anything obviously wrong.
From what I can tell so far, it appears to have something to do with the Hive Metastore. It occasionally fails the Canary tests, and trying to drop tables with large partitions throws errors like:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.datanucleus.exceptions.NucleusDataStoreException: Update of object "org.apache.hadoop.hive.metastore.model.MStorageDescriptor@2e8e6344" using statement "UPDATE `SDS` SET `CD_ID`=? WHERE `SD_ID`=?" failed : java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
From Impala we see a lot of tables not having their Metadata loaded:
Missing tables were not received in 120000ms. Load request will be retried.
Requesting prioritized load of table(s):
.... repeats ... repeats ... and then usually eventually loads.
The net result is a query that used to take X amount of time now takes significantly longer.
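To get a rough idea of how often these metadata load retries happen, the retry messages can be counted in a log. A minimal sketch, assuming the log lines contain the message text exactly as quoted above; the function name and the sample lines are hypothetical, and in practice the lines would be read from your own Impala log file:

```python
import re
from collections import Counter

# Matches the retry message quoted above and captures the timeout in ms.
RETRY_RE = re.compile(r"Missing tables were not received in (\d+)ms")


def count_load_retries(lines):
    """Count retry messages, grouped by the timeout value they report."""
    counts = Counter()
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Sample lines for illustration; replace with lines read from a real log.
sample = [
    "Missing tables were not received in 120000ms. Load request will be retried.",
    "Requesting prioritized load of table(s):",
    "Missing tables were not received in 120000ms. Load request will be retried.",
]
print(count_load_retries(sample))
```

Comparing the count per hour against when the slow queries ran may show whether the retries line up with the slowdowns.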
11-13-2014 08:39 AM
Not sure if it helps your situation, but I noticed a few bug reports about Hive direct SQL causing issues with table statistics, etc. Per the workaround described in those tickets, I disabled:
And in parallel also disabled the Sentry service and went back to using the static file.
Although the bug reports aren't directly related to slowness, most of our Hive issues are looking a lot better. Queries that were failing are now working, those that were slow are fast again, etc. Still having some weird issues with Impala.
Overall pretty disappointed with the 5.2 upgrade. Looks like a lot of bugs, and what was pitched as a big performance increase appears to be anything but. Hopefully 5.2.1 is an improvement...
11-14-2014 07:55 AM
Got a Hive query that took 3-4min on 5.1, 20min after upgrading to 5.2, down to 1min on 5.2 with some tweaks.
On 4.6 we were running MR1. When we upgraded to 5.1 we kept MR1 but also had YARN/MR2 deployed. We tried running jobs with YARN/MR2, but they took forever, so we stayed with MR1, which was running them quickly. When we upgraded to 5.2, the MR1 jobs slowed to a crawl. Uninstalling YARN/MR2 and running with just MR1 made no difference (apparently it isn't recommended to run with both...). Re-installed YARN/MR2, uninstalled MR1, and got a HUGE improvement in runtimes (back to where we were with 5.1).
Throughout this process we worked with a performance engineer at Cloudera (Boris FTW) who helped us make further adjustments and tune the system, resulting in even better performance.
Still have a couple of adjustments to make that he recommended (cleanup/best practices), but we're back in good shape again and can't say enough about Cloudera's enterprise support team.