
Best cluster configuration

Rising Star

Hi all,

I'm new to Hadoop and I'm currently working on a project using HDP. I have an OVH server with the following configuration:

- CPU: 4 cores, Intel(R) Xeon(R) E3-1231 v3 @ 3.40GHz
- RAM: 32 GB
- Storage: 2 TB
- ESXi installed

My question is about the best partitioning scheme and the number of nodes. Would 4 nodes, each with 1 CPU, 8 GB of RAM, and a 500 GB HDD (1 NameNode and 3 DataNodes), be good enough, at least for development? I'm working with data from a mid-sized retailer.

Thanks.

1 ACCEPTED SOLUTION

Master Mentor

@Zaher Mahdhi

** Assumption **

Lab environment; no performance testing will be done.

You have one server and are planning to carve out 4 machines. I am sure you are not expecting high performance from this setup.

You can have 1 master and 3 workers/DataNodes. (In your case, you can pick one of the DataNodes and install the remaining master services on it, so you can mix one DataNode with master components --> only applicable to sandbox/lab environments.)

Don't install everything. HDFS, MapReduce, YARN, ZooKeeper, Hive --> start with these components, then add others later as you go.
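
To make that concrete, here is one possible service layout for this 4-VM setup (an illustrative sketch only; the hostnames are hypothetical and the master components can be rearranged):

node1 (master): Ambari Server, NameNode, ResourceManager, ZooKeeper
node2 (worker + master components): DataNode, NodeManager, SecondaryNameNode, Hive Metastore, HiveServer2, ZooKeeper
node3 (worker): DataNode, NodeManager, ZooKeeper
node4 (worker): DataNode, NodeManager

ZooKeeper goes on three nodes because its quorum should be an odd number.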

Make sure that you give enough space to the logs. For each VM (an example fstab sketch follows the list):

- / (root): 20 GB is generally good
- /usr/hdp: 20 GB
- /var/log: 50 to 100 GB (small setup)
- /hadoop: the rest of the space
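
As a rough illustration, the mount table on each 500 GB VM could end up looking like this (a sketch only; the /dev/sdaX device names are placeholders that depend on how you provision the virtual disks):

# /etc/fstab (sketch; device names are placeholders)
/dev/sda1  /         ext4  defaults          0 1  # root, ~20 GB
/dev/sda2  /usr/hdp  ext4  defaults          0 2  # HDP binaries, ~20 GB
/dev/sda3  /var/log  ext4  defaults          0 2  # logs, 50-100 GB
/dev/sda4  /hadoop   ext4  defaults,noatime  0 2  # HDFS data, remaining ~360 GB

Mounting the data partition with noatime is a common recommendation for HDFS disks, since it avoids a metadata write on every read.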


8 REPLIES


Rising Star

@Neeraj Sabharwal

Thank you for your answer. This is a development and test environment 🙂

Master Mentor

@Zaher Mahdhi Perfect! You can go through the docs, but many times you will have to find workarounds based on your hardware limitations.

This is your starting point for the next step: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Installing_HDP_AMB/content/_meet_minimum_...

Good luck!!!! 🙂

Rising Star
@Neeraj Sabharwal

Do you recommend using Ambari or a manual installation? In the case of Ambari, I went through the documentation and didn't find where I can allocate space for the logs. Thanks.

Master Mentor

@Zaher Mahdhi The Ambari installation method is the BEST way to go.

Make sure that you pay attention to the following setting while installing the cluster if you want to customize the log locations. You cannot change this once the installation is done.

This needs to be set for each component.

[screenshot: 2122-screen-shot-2016-02-14-at-42142-pm.png]
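
For reference, the log locations appear as *_log_dir properties under each service's configuration tab during the Customize Services step of the install wizard. A few examples as found in the HDP 2.x Ambari stacks (verify the exact names and defaults in your Ambari version):

hdfs_log_dir_prefix   = /var/log/hadoop            (HDFS --> Advanced hadoop-env)
yarn_log_dir_prefix   = /var/log/hadoop-yarn       (YARN --> Advanced yarn-env)
mapred_log_dir_prefix = /var/log/hadoop-mapreduce  (MapReduce2 --> Advanced mapred-env)
hive_log_dir          = /var/log/hive              (Hive --> Advanced hive-env)
zk_log_dir            = /var/log/zookeeper         (ZooKeeper --> Advanced zookeeper-env)

Point these at your /var/log partition (or subdirectories of it) so the logs land on the space you reserved for them.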

Master Mentor
@Zaher Mahdhi

Your question has many answers. I suggest you read our cluster planning guide: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_cluster-planning-guide/content/ch_hardwar...
