Support Questions

cluster tuning

Master Collaborator

I have a 5-node cluster with one NameNode and three DataNodes. Each server has 8 GB of RAM.

With this configuration I can't even start three Hive sessions. Is this normal, or does my cluster need tuning?

I need to put 10 users on this cluster to use Hive. Please advise if this is possible.


Re: cluster tuning


@Sami Ahmad

Planning a Hadoop cluster remains a complex task that requires at least a minimum knowledge of the Hadoop architecture. Your current setup of 8 GB per node cannot handle the workload you are subjecting it to. The cluster sizing should match your use case: if you are going to run memory-intensive operations, or Sqoop jobs exporting hundreds of GB of data, then you need to rethink the architecture and sizing.

8 GB is the recommended minimum for the HDP sandbox, and from experience that is still slow for a single user :-) Remember that under the hood Hive runs on an execution engine (MR or Tez, optionally with LLAP), which spawns mappers and reducers that each need memory; Tez and LLAP are even more memory intensive.
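As a rough illustration of where that memory goes, these are the standard YARN/Hive-on-Tez properties that bound per-query memory. The property names are real, but the values below are only example numbers for a small node, not recommendations:

```xml
<!-- Illustrative values only; size these to your own nodes. -->

<!-- yarn-site.xml: total memory YARN may hand out on each worker -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>

<!-- hive-site.xml: memory requested for each Tez container -->
<property>
  <name>hive.tez.container.size</name>
  <value>2048</value>
</property>

<!-- tez-site.xml: memory for each Tez ApplicationMaster -->
<property>
  <name>tez.am.resource.memory.mb</name>
  <value>1024</value>
</property>
```

Every concurrent Hive session needs at least one Tez ApplicationMaster plus its containers, so with numbers like these a handful of sessions can exhaust a node.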

With all your master processes running on a node with 8 GB, that node is already overloaded.
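A back-of-the-envelope calculation makes the overload concrete. The daemon footprint and container size below are assumptions for illustration, not measured values:

```python
# Rough memory budget for a single 8 GB worker node.
node_ram_mb = 8192
os_and_daemons_mb = 4096     # OS + DataNode + NodeManager + agents (assumed footprint)
yarn_pool_mb = node_ram_mb - os_and_daemons_mb   # what YARN can actually hand out

container_mb = 2048          # assumed Tez container size
containers_per_node = yarn_pool_mb // container_mb

print(containers_per_node)   # -> 2 concurrent Tez containers per node
```

Two containers per worker across three DataNodes is nowhere near enough headroom for ten concurrent Hive users.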

Hadoop’s performance depends on multiple factors: well-configured software layers and well-dimensioned hardware that uses its CPU, memory, hard drives (storage I/O), and network bandwidth efficiently.

Small clusters with fewer than 10 worker nodes do not require much for master nodes in terms of hardware. A solid baseline hardware profile for a cluster of this size is:

  • Dual quad-core 2.6 GHz CPU,
  • 24 GB of DDR3 RAM,
  • Dual 1 Gb Ethernet NICs,
  • A SAS drive controller, and
  • At least two SATA II drives in a JBOD configuration, in addition to the host OS device.

Configuring your cluster correctly

To get maximum performance out of Hadoop, it needs to be configured correctly. The question is how to do that. Based on experience, there is no single answer: the Hadoop framework should be adapted to the cluster it is running on, and sometimes to the job as well.

To configure your cluster correctly, I recommend first running your Hadoop job(s) with the default configuration to get a baseline. Then check for resource weaknesses (if any exist) by analyzing the job history log files, and record the results (the measured time it took to run the jobs). After that, iteratively tune your Hadoop configuration and re-run the job until you reach a configuration that fits your business needs.
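One way to get such a baseline is the stock TeraGen/TeraSort benchmark that ships with the MapReduce examples jar. The jar path below is the usual HDP location and the data size is illustrative; adjust both for your distribution and cluster:

```shell
# Generate ~1 GB of test data (10M rows x 100 bytes); HDFS path is illustrative.
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    teragen 10000000 /tmp/bench/teragen

# Sort it and time the run; repeat after each configuration change
# and compare the elapsed times against your baseline.
time hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    terasort /tmp/bench/teragen /tmp/bench/terasort
```

The job history UI for each run then shows where the time went (mappers, reducers, shuffle), which is where the resource weaknesses mentioned above show up.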

Here is an article on benchmarking Hadoop: