TS
Contributor
Posts: 84
Registered: ‎02-10-2015

NN/SNN/DN Roles and HDFS mount points (Best Practices)

From what I have read so far in various sources, the HDFS roles and their corresponding mount points are configured as follows:

 

<< NN (NameNode) >>

Runs on a single Master server with a few HDs (3 or 4 x 300GB HDs). [These are JBOD disks]

This is where NameNode Data Directories (dfs.name.dir, dfs.namenode.name.dir) are defined:

/hadoop/D01/nn

 

<< SNN (Secondary NameNode) >>

Runs on a single SecondaryMaster server with a few HDs (3 or 4 x 300GB HDs). [These are JBOD disks]

This is where HDFS Checkpoint Directories (fs.checkpoint.dir, dfs.namenode.checkpoint.dir) are defined:

/hadoop/D01/snn
 

<< DN (DataNode) >>

Runs on many DataNode servers where each host has many HDs (20 or so x 300GB HDs). [These are JBOD disks]

This is where DataNode Data Directories (dfs.data.dir, dfs.datanode.data.dir) are defined:
/hadoop/D01/dn
/hadoop/D02/dn
........................
/hadoop/D20/dn
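
To make the layout above concrete, here is a minimal sketch that generates the corresponding hdfs-site.xml property entries. It assumes the current property names listed above (dfs.namenode.name.dir, dfs.namenode.checkpoint.dir, dfs.datanode.data.dir) and that multiple directories are given as a comma-separated list. The helper names datanode_dirs and hdfs_property are hypothetical and exist only for illustration.

```python
from xml.sax.saxutils import escape


def datanode_dirs(num_disks, base="/hadoop", role="dn"):
    """Return mount-point paths such as /hadoop/D01/dn ... /hadoop/D20/dn."""
    return [f"{base}/D{i:02d}/{role}" for i in range(1, num_disks + 1)]


def hdfs_property(name, values):
    """Render one hdfs-site.xml <property> element with a comma-separated value."""
    value = ",".join(values)
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# Paths follow the layout described in this post; adjust to the real mounts.
dn_dirs = datanode_dirs(20)
print(hdfs_property("dfs.namenode.name.dir", ["/hadoop/D01/nn"]))
print(hdfs_property("dfs.namenode.checkpoint.dir", ["/hadoop/D01/snn"]))
print(hdfs_property("dfs.datanode.data.dir", dn_dirs))
```

The printed elements would go inside the <configuration> element of hdfs-site.xml on the matching hosts.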
 
Is this the best way to deploy a Hadoop/HDFS cluster for all nodes (NN, SNN, and DN)?
Should all hosts have the same number of disks? Or should the NN and SNN have just a few and the DNs as many as we can fit?
 
Looking for Best Practices!
 
 
Cloudera Employee
Posts: 13
Registered: ‎10-16-2013

Re: NN/SNN/DN Roles and HDFS mount points (Best Practices)

In general, the NN and SNN do not require much storage. On DNs, by contrast, it's not uncommon to see many hard disks (12 to 24, for example) used as data drives. The number depends on your storage requirements.
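
As a rough illustration of how storage requirements translate into disk and node counts, here is a back-of-the-envelope sketch. It assumes the default HDFS replication factor of 3 and, as a purely illustrative assumption, that 25% of raw disk space is held back for non-HDFS use and temporary data. The function names and the 25% figure are hypothetical, not recommendations.

```python
import math


def usable_capacity_gb(nodes, disks_per_node, disk_gb,
                       replication=3, reserve_fraction=0.25):
    """Estimate usable HDFS capacity in GB for a set of DataNodes."""
    raw = nodes * disks_per_node * disk_gb
    # Hold back the reserve, then divide by the number of block replicas.
    return raw * (1 - reserve_fraction) / replication


def datanodes_needed(required_gb, disks_per_node, disk_gb,
                     replication=3, reserve_fraction=0.25):
    """Estimate how many DataNodes are needed for a given usable capacity."""
    raw_needed = required_gb * replication / (1 - reserve_fraction)
    return math.ceil(raw_needed / (disks_per_node * disk_gb))


# Example: 10 DataNodes, each with 20 x 300GB disks.
print(usable_capacity_gb(10, 20, 300))
print(datanodes_needed(15000, 20, 300))
```

The same arithmetic can be rerun with other disk counts to see how sensitive the node count is to the number of data drives per host.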

 

You might be interested in reading this blog post about hardware guidelines. Keep in mind the numbers are from 2013, but the same principles apply.

 

http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/

 
