
Any recommendation on how to partition disk space for a Datanode & Namenode? Also do we need to enable Swap?



1 ACCEPTED SOLUTION


Re: Any recommendation on how to partition disk space for a Datanode & Namenode? Also do we need to enable Swap?

All nodes:

Leave enough space for logs in /var/log (100 GB?), and also enough space for /var and /usr.

At least the logs should have their own logical partition, since it's annoying when they fill up.

Namenode:

Disks should be RAIDed. It is good practice to keep a separate partition for the Hadoop files (/hadoop?). A couple hundred GB should be sufficient; the disk requirements are not huge.

DataNodes:

Two OS disks can be RAIDed for additional resiliency, or just one drive can be reserved for the OS to leave more drives for data.

All other drives are data drives and should be non-RAIDed, simply added as simple volumes:

/grid/x

ext4 is a good choice for the data drives; the disks should be mounted with noatime.
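For illustration, an /etc/fstab entry for one data drive might look like this (the device name /dev/sdb1 and mount point /grid/0 are hypothetical; adjust to your hardware):

```
# /etc/fstab -- hypothetical entry for one HDFS data drive
# noatime avoids an inode write on every file read, which matters
# on drives that see heavy HDFS read traffic
/dev/sdb1  /grid/0  ext4  defaults,noatime  0 0
```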

Some more details here:

https://community.hortonworks.com/articles/14508/best-practices-linux-file-systems-for-hdfs.html

Swap: in general, disable swap on DataNodes, since swapping should REALLY not happen there and would most likely kill cluster performance. It is better to have tasks fail and have someone look at them.
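One way to do this is to run `swapoff -a`, comment out any swap lines in /etc/fstab, and discourage swapping at the kernel level. A sketch of the sysctl setting (the value 1 rather than 0 is an assumption; on newer kernels 0 disables swapping entirely except to avoid OOM kills):

```
# /etc/sysctl.conf -- discourage swapping on DataNodes
# 1 = swap only when nearly out of memory; apply with `sysctl -p`
vm.swappiness=1
```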

On master nodes it's a bit more complex. Here, depending on cluster size, many tasks can be running, and OOM errors can lead to unpredictable results, so swapping may be safer here. However, make sure that you normally have enough memory available.

But you may find other recommendations as well.


Re: Any recommendation on how to partition disk space for a Datanode & Namenode? Also do we need to enable Swap?


Instead of swap, can we use tmpfs?

Re: Any recommendation on how to partition disk space for a Datanode & Namenode? Also do we need to enable Swap?

What would you want to speed up with tmpfs? Most components in the Hadoop environment only use the disks for persistence (or require a LOT of space on the DataNodes), so a tmpfs store defeats the purpose for something like an fsimage, etc.

The two components that aggressively use memory-backed storage are Spark and Kafka, but both rely on OS filesystem buffers instead (and tell the OS to write through to disk only when needed).

Re: Any recommendation on how to partition disk space for a Datanode & Namenode? Also do we need to enable Swap?

New Contributor

tmpfs is backed by RAM anyway, so if you already needed to swap out to a swap partition, you won't have any RAM left to spill into tmpfs either.

tmpfs is used when you have plenty of RAM available and need to cache something fast and ephemeral; it lets you mount some amount of RAM as a filesystem, so you can use it like any other FS mount.
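For illustration, a tmpfs mount can be declared in /etc/fstab like this (the 2 GB size and /mnt/ramdisk mount point are hypothetical):

```
# /etc/fstab -- mount 2 GB of RAM as an ephemeral filesystem
# contents are lost on reboot or unmount
tmpfs  /mnt/ramdisk  tmpfs  size=2g,noatime  0 0
```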

Re: Any recommendation on how to partition disk space for a Datanode & Namenode? Also do we need to enable Swap?

@Vishnu Nair

To add to what @Benjamin Leonhardi mentioned, a good doc to start with is the cluster planning guide (refer to page 7).

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_cluster-planning-guide/bk_cluster-planni...

A 12-page doc with loads of information.

Regarding swap, here is the recommendation:

https://community.hortonworks.com/questions/22548/what-is-the-hortonworks-recommendation-on-swap-usa...

Hope this helps
