10-16-2017 07:01 AM - edited 10-16-2017 07:04 AM
We seem to have exactly the same problem. We added 20 nodes to our existing 60-node cluster, which makes it 80 nodes. The new nodes have the same configuration and capacity as the old ones. We run heavy, concurrent jobs (Hive queries) that can easily drive the servers to 100% utilization, so the cluster is not under-utilized. We rebalanced the data and verified that it is evenly distributed across the data nodes. We don't see any improvement at all after the upgrade; the job timings are the same as before.
Do we need to update stats, the metastore, or any other configuration for the new nodes to take effect in terms of performance? Any insights on this are much appreciated.
10-19-2017 07:15 PM
There are a couple of places that need tuning at the query level:
1. Table stats are a must for good performance.
2. When joining two tables, make sure the smaller table comes first and the larger table comes last.
3. You can also use hints to improve query performance.
4. The Hive table's file format is a big factor.
5. Choose carefully when to use partitioning vs. bucketing.
6. Allocate enough memory to HiveServer2 and the metastore.
8. Use a load balancer on the host.
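To make the query-level tips above more concrete, here is a rough HiveQL sketch. The table and column names (sales, dim_store, store_id, sale_date and so on) are made up for illustration, and the bucket count is arbitrary, so treat this as a starting point under those assumptions rather than a recommended setup for your cluster.

```sql
-- Hypothetical tables: "sales" is the large fact table,
-- "dim_store" is a small lookup table.

-- Tips 4 and 5: a columnar format (ORC here), partitioned by a column
-- that queries commonly filter on and bucketed by the join key.
-- The bucket count of 32 is an arbitrary example value.
CREATE TABLE sales (
  order_id BIGINT,
  store_id INT,
  amount   DECIMAL(10,2)
)
PARTITIONED BY (sale_date STRING)
CLUSTERED BY (store_id) INTO 32 BUCKETS
STORED AS ORC;

-- Tip 1: gather basic stats for every partition of the fact table,
-- and column-level stats for the small table.
ANALYZE TABLE sales PARTITION (sale_date) COMPUTE STATISTICS;
ANALYZE TABLE dim_store COMPUTE STATISTICS FOR COLUMNS;

-- Tips 2 and 3: list the small table first and the large table last,
-- and hint that the small table can be loaded into memory (map join).
SELECT /*+ MAPJOIN(d) */
       d.store_name,
       SUM(s.amount) AS total_amount
FROM dim_store d
JOIN sales s
  ON d.store_id = s.store_id
WHERE s.sale_date = '2017-10-01'
GROUP BY d.store_name;

-- Alternatively, let Hive convert joins to map joins on its own
-- when the stats show that one side is small enough.
SET hive.auto.convert.join=true;
```

The join order matters because Hive streams the last table in a join through the reducers and buffers the earlier ones, which is why the large table goes last unless a hint says otherwise.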