Will adding nodes improve performance?

Hi.

 

Would adding 3 nodes to my 3-node cluster improve performance by at least 2x, or are there other parameters to consider to reach a 2x improvement?

 

Thanks

11 REPLIES

New Contributor

We seem to have exactly the same problem. We added 20 nodes to our existing 60-node cluster, bringing it to 80 nodes. The new nodes have the same configuration and capacity as the old ones. We run heavy, concurrent jobs (Hive queries) that can easily push the servers to 100% utilization, so the cluster is not under-utilized. We rebalanced the data and verified that it is evenly distributed across the data nodes. We don't see any improvement at all after the upgrade; the job timings are the same as before.

Do we need to update stats, the metastore, or any other configuration for the new nodes to take effect in terms of performance? Any insights on this would be much appreciated.
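
As a rough sanity check on what to expect in both cases above (3 to 6 nodes, and 60 to 80 nodes), here is a minimal sketch using Amdahl's law. This model is not something measured on either cluster; it is a simplification that assumes a fixed share of each job's runtime scales with the node count while the rest (planning, single-reducer stages, skewed tasks, and so on) does not. The function name and the sample parallel fractions are hypothetical and chosen only for illustration.

```python
def amdahl_speedup(node_ratio: float, parallel_fraction: float) -> float:
    """Estimate per-job speedup when the node count grows by node_ratio.

    parallel_fraction is the ASSUMED share of runtime that scales with nodes;
    the remainder is treated as not benefiting from extra nodes at all.
    """
    if node_ratio <= 0:
        raise ValueError("node_ratio must be positive")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / node_ratio)


if __name__ == "__main__":
    # Cluster sizes taken from the two posts; parallel fractions are guesses.
    for label, old_nodes, new_nodes in (("3 -> 6 nodes", 3, 6),
                                        ("60 -> 80 nodes", 60, 80)):
        ratio = new_nodes / old_nodes
        for fraction in (1.0, 0.9, 0.5):
            speedup = amdahl_speedup(ratio, fraction)
            print(f"{label}: {fraction:.0%} parallel -> about {speedup:.2f}x")
```

Under this model, doubling from 3 to 6 nodes gives 2x only if the work is perfectly parallel, and going from 60 to 80 nodes gives at most about 1.33x. If queries are limited by something other than node count, the gain can be much smaller, which is why query-level and service-level tuning still matter.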

Champion

There are a couple of places that need tuning at the query level:

1. Table statistics are a must for good performance.

2. When joining two tables, make sure the smaller table comes first and the larger table comes last.

3. You can also use hints to improve query performance.

4. The Hive table's file format is a big factor.

5. Choose carefully when to use partitioning vs. bucketing.

6. Allocate enough memory to HiveServer2 and the metastore.

7. Heap size.

8. Load balancer on the host.

 

https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_cm_ha_hosts.html#concept_qkr_bf...