Created on 02-21-2017 01:18 PM - edited 09-16-2022 04:07 AM
Hi.
Would adding 3 nodes to my 3-node cluster obviously increase performance by at least 2x, or are there more parameters to consider to reach 2x?
Thanks
Created 02-21-2017 02:16 PM
That is a very hypothetical "one line" question.
I don't think just adding a few extra nodes will double the performance. A few of the additional parameters you need to consider:
1. How the services are configured in the cluster is also very important. For example, say you have 3 nodes now and 10 services configured across those 3 nodes. After 3 more nodes are added, you need to properly distribute the services to the new nodes as well.
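One simple way to think about redistributing services is greedy load balancing: place the heaviest services first, each on whichever node currently carries the least load. The sketch below assumes you can assign each service a rough relative "weight" (the service names and weights are hypothetical); it is not how Cloudera Manager assigns roles, just an illustration of spreading load onto the new nodes.

```python
import heapq


def place_services(services, nodes):
    """Greedy sketch: assign each service (heaviest first) to the least-loaded node.

    services: dict of service name -> relative load weight (hypothetical numbers)
    nodes: list of node hostnames
    Returns a dict of node -> list of assigned service names.
    """
    # Min-heap of (current load, node name): the least-loaded node pops first.
    heap = [(0.0, node) for node in nodes]
    heapq.heapify(heap)
    placement = {node: [] for node in nodes}
    # Heaviest services first (the "longest processing time first" heuristic).
    for name, weight in sorted(services.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)
        placement[node].append(name)
        heapq.heappush(heap, (load + weight, node))
    return placement


if __name__ == "__main__":
    # Hypothetical weights for a few services across 6 nodes.
    weights = {"hive": 5, "impala": 5, "hbase": 4, "spark": 3, "oozie": 1, "hue": 1}
    nodes = [f"node{i}" for i in range(1, 7)]
    for node, assigned in place_services(weights, nodes).items():
        print(node, assigned)
```

With 6 services and 6 nodes, each node ends up with one service; with fewer nodes, the heavy services are spread out before the light ones fill the gaps.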
On the existing cluster, without adding new nodes:
1. If possible, add RAM to the existing nodes.
2. Identify which particular services need better performance, such as Hive, Impala, etc. You can tune the configuration for those services, e.g. increase the Java heap size.
3. Prioritize the jobs.
etc.
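When increasing the Java heap for MapReduce tasks (as used by Hive on MapReduce), the heap set in `mapreduce.map.java.opts` has to fit inside the container size in `mapreduce.map.memory.mb`, or YARN may kill the container. A commonly cited rule of thumb is a heap of about 80% of the container; the 0.8 fraction below is that assumption, not a fixed requirement.

```python
def map_task_memory_settings(container_mb, heap_fraction=0.8):
    """Sketch: derive a map-task -Xmx value that fits inside its YARN container.

    heap_fraction (default 0.8) is a rule-of-thumb assumption that leaves
    headroom for non-heap JVM memory.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(container_mb * heap_fraction)
    return {
        "mapreduce.map.memory.mb": str(container_mb),
        "mapreduce.map.java.opts": f"-Xmx{heap_mb}m",
    }


if __name__ == "__main__":
    print(map_task_memory_settings(4096))
```

The same idea applies to reducers (`mapreduce.reduce.memory.mb` / `mapreduce.reduce.java.opts`).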
Created 02-22-2017 02:43 PM
Created 04-10-2017 01:57 PM
I've added 4 nodes to my 4-node cluster and I don't see any benefit. Queries against the 8-node cluster perform the same as against the 4-node cluster. All DataNodes have the same specifications.
Created 04-10-2017 02:10 PM
Created 04-10-2017 02:15 PM
Created 04-10-2017 02:17 PM
Yes, we have also rebalanced the data.
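To double-check a rebalance, you can compare each DataNode's disk utilization with the cluster-wide average, which is how the HDFS balancer's threshold (10% by default) is defined. The sketch below assumes you have already collected per-node used/capacity figures, for example from `hdfs dfsadmin -report`; the host names and numbers are hypothetical.

```python
def classify_datanodes(nodes, threshold_pct=10.0):
    """Label each DataNode as over-, under-utilized or balanced.

    nodes: dict hostname -> (used_bytes, capacity_bytes)
    A node is balanced if its utilization is within threshold_pct
    percentage points of the cluster-wide utilization.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    avg_pct = 100.0 * total_used / total_cap
    result = {}
    for host, (used, cap) in nodes.items():
        pct = 100.0 * used / cap
        if pct > avg_pct + threshold_pct:
            result[host] = "over"
        elif pct < avg_pct - threshold_pct:
            result[host] = "under"
        else:
            result[host] = "balanced"
    return result


if __name__ == "__main__":
    # Hypothetical: 4 old nodes at 80% full, 4 new nodes still empty.
    before = {f"old{i}": (80, 100) for i in range(4)}
    before.update({f"new{i}": (0, 100) for i in range(4)})
    print(classify_datanodes(before))
```

Note that balanced storage only means the data blocks are spread out; it does not by itself mean a given query will use more nodes.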
Created on 04-10-2017 11:23 PM - edited 04-10-2017 11:24 PM
I would say that just adding nodes will not always result in better performance; as you said, there are other parameters that need to be taken care of. I would consider a few things: optimizing joins (placing the largest table in the query last when performing a join, or using a hint like STREAMTABLE), enabling local mode, tuning the number of mappers and reducers, JVM reuse, and finally the good old index, which sometimes helps speed up GROUP BY queries in Hive. I also agree with @saranvisa and @mbigelow on their thoughts.
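The reason for putting the largest table last is that, in a Hive reduce-side join, rows from every table except the last one are buffered in the reducer, while the last table is streamed through. The `STREAMTABLE` hint (e.g. `SELECT /*+ STREAMTABLE(a) */ ...`) lets you choose the streamed table explicitly. The following is a simplified single-process simulation of that buffering behaviour, not Hive's actual implementation; the table contents are hypothetical.

```python
from collections import defaultdict


def reduce_side_join(tables, key):
    """Inner-join a list of tables (lists of dict rows) on `key`, Hive-style.

    For each join key, rows from every table except the LAST are buffered
    in memory; rows from the last table are streamed one at a time.
    Returns (joined_rows, peak_buffered_rows).
    """
    # Group rows by join key, keeping them separated per source table.
    groups = defaultdict(lambda: [[] for _ in tables])
    for i, table in enumerate(tables):
        for row in table:
            groups[row[key]][i].append(row)

    joined, peak = [], 0
    for per_table in groups.values():
        buffered = per_table[:-1]
        peak = max(peak, sum(len(rows) for rows in buffered))
        # Combine the buffered tables first (empty list if any side is missing).
        partial = [{}]
        for rows in buffered:
            partial = [{**p, **r} for p in partial for r in rows]
        # Stream the last table, emitting one output row per match.
        for streamed in per_table[-1]:
            for p in partial:
                joined.append({**p, **streamed})
    return joined, peak


if __name__ == "__main__":
    small = [{"k": 1, "s": i} for i in range(2)]
    big = [{"k": 1, "b": i} for i in range(100)]
    for order, name in (([small, big], "big last"), ([big, small], "big first")):
        rows, peak = reduce_side_join(order, "k")
        print(name, len(rows), "rows, peak buffered:", peak)
```

Both orders produce the same 200 rows, but buffering drops from 100 rows to 2 when the large table is streamed last.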
Created 04-12-2017 12:42 PM
Even a simple SELECT without any joins gets no benefit from doubling the number of workers. Performance remains the same on 4 nodes or 8 nodes.
Created 04-14-2017 01:03 AM
That seems rather normal. Low-complexity queries tend to use a small number of YARN containers.
Adding containers when you don't have a container shortage will not speed things up.
But you will be able to handle more concurrent queries without slowing down.
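A rough way to see this is to think of tasks running in "waves": a query with N tasks on a cluster with S free container slots needs about ceil(N / S) waves. If N is already smaller than S, doubling the nodes leaves it at one wave, so wall time stays the same, but twice as many such queries fit side by side. The model below is a deliberate simplification (uniform task duration, a fixed number of containers per node, no scheduling overhead); all numbers are hypothetical.

```python
import math


def estimated_wall_time(num_tasks, nodes, containers_per_node, task_seconds):
    """Rough model: tasks run in waves of at most nodes * containers_per_node."""
    slots = nodes * containers_per_node
    waves = math.ceil(num_tasks / slots)
    return waves * task_seconds


def concurrent_queries(tasks_per_query, nodes, containers_per_node):
    """How many identical queries can run at once without queuing tasks."""
    return (nodes * containers_per_node) // tasks_per_query


if __name__ == "__main__":
    # Hypothetical: 8 containers per node, 30 s per task.
    for n in (4, 8):
        print(
            f"{n} nodes: small query {estimated_wall_time(6, n, 8, 30)} s, "
            f"large query {estimated_wall_time(200, n, 8, 30)} s, "
            f"concurrent small queries {concurrent_queries(6, n, 8)}"
        )
```

In this model the 6-task query takes 30 s on both clusters, while the 200-task query drops from 7 waves to 4 (210 s to 120 s), which is also less than a 2x gain because of the rounding to whole waves.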