New Contributor
Posts: 1
Registered: ‎10-16-2017

Re: Adding nodes will improve performance ?

We seem to have exactly the same problem. We added 20 nodes to our existing 60-node cluster, for a total of 80 nodes. The new nodes have the same configuration and capacity as the old ones. We run heavy, concurrent jobs (Hive queries) that can easily drive the servers to 100% utilization, so the cluster is not under-utilized. We rebalanced the data and verified that it is evenly distributed across the data nodes. We don't see any improvement at all after the upgrade; the job timings are the same as before.

Do we need to update stats, the metastore, or any other configuration for the new nodes to take effect in terms of performance? Any insights on this are much appreciated.

Champion
Posts: 563
Registered: ‎05-16-2016

There are a couple of places that need tuning at the query level:

1. Table statistics are a must for good performance.

2. When joining two tables, make sure the smaller table comes first and the largest table comes last.

3. You can also use hints to improve query performance.

4. The Hive table's file format is a big factor.

5. Choose carefully when to use partitioning vs. bucketing.

6. Allocate enough memory to HiveServer2 and the metastore.

7. Heap size.

8. Load balancer on the host.

 

https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_cm_ha_hosts.html#concept_qkr_bf...
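
To make points 1 to 3 a bit more concrete, here is a rough sketch in Python that just builds the HiveQL text you would then run through beeline or your scheduler. The table names (`sales`, `stores`), the join key and the row counts are made up for illustration, and the helper functions are hypothetical, not part of any Hive or Cloudera tool. Depending on your Hive version, partitioned tables may need a PARTITION clause in ANALYZE, and Hive may ignore the MAPJOIN hint and pick map joins automatically, so treat this as a starting point only.

```python
def analyze_statements(table, columns=True):
    """Build HiveQL that refreshes table (and optionally column) statistics."""
    stmts = [f"ANALYZE TABLE {table} COMPUTE STATISTICS"]
    if columns:
        # Column statistics help the optimizer estimate join and filter sizes.
        stmts.append(f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS")
    return stmts


def join_small_first(tables_with_rows, key):
    """Order tables smallest to largest so the largest table is joined last.

    tables_with_rows: list of (table_name, approximate_row_count) pairs.
    key: join column assumed to exist in every table (an assumption here).
    """
    ordered = sorted(tables_with_rows, key=lambda t: t[1])
    first = ordered[0][0]
    sql = f"SELECT * FROM {first}"
    for name, _ in ordered[1:]:
        sql += f" JOIN {name} ON {first}.{key} = {name}.{key}"
    return sql


def mapjoin_hint(small_table, query_body):
    """Prefix a query body with a MAPJOIN hint naming the small table."""
    return f"SELECT /*+ MAPJOIN({small_table}) */ {query_body}"


if __name__ == "__main__":
    for stmt in analyze_statements("sales"):
        print(stmt + ";")
    print(join_small_first([("sales", 10**9), ("stores", 500)], "store_id") + ";")
    print(mapjoin_hint("stores", "sales.amount FROM stores JOIN sales "
                                 "ON stores.store_id = sales.store_id") + ";")
```

Generating the statements this way is only a convenience when there are many tables; running the same ANALYZE and SELECT statements by hand works just as well.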
