Created on 11-04-2014 09:54 AM - edited 09-16-2022 02:11 AM
Upgraded to 5.2 last week and have been noticing a SIGNIFICANT performance reduction when running the same queries pre and post-upgrade. For example, each morning a series of queries run (~200) and the day before the upgrade this entire process would take a little over 2h (and has taken that long for the preceeding 6 months). Currently running the same query takes +12h to complete and the only change being the upgrade (no sudden increase in data, node setup/usage, query changes, etc.).
I've been looking into the performance improvment section (http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_admin_performance.h... but so far nothing has made a difference. I'm not seeing anything in memory, cpu, disk latency, garbage collection, failed tasks, or anything that really points to "this is why you're waiting so long" - also not sure if it is a hive, or mapreduce issue, or both, or something else entirely...
If the time to produce an expected query result goes from 30s to 5min overnight after an upgrade - where would you start to look?
P.s. both queries run in Hue/Beeswax and submitted directly to the cluster are experiencing the same issue.