Created 04-15-2016 06:11 PM
Created 04-15-2016 11:19 PM
All nodes:
Enough space for logs in /var/log ( 100GB? ), and also enough space for /var and /usr.
At the least, the logs should have their own logical partition, since it is annoying when they fill up.
Namenode:
Disks should be RAIDed. It is good practice to keep a separate partition for the Hadoop files ( /hadoop? ). A couple of hundred GB should be sufficient; the disk requirements are not huge.
DataNodes:
Two OS disks can be RAIDed for additional resiliency, or just one drive can be reserved for the OS to leave more data drives.
All other drives are data drives and should be non-RAIDed, simply added as simple volumes:
/grid/x
ext4 is good for the data drives; the disks should be mounted with noatime.
Some more details here:
https://community.hortonworks.com/articles/14508/best-practices-linux-file-systems-for-hdfs.html
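As a sketch of the mount layout described above (device names, mount points, and options are illustrative assumptions, not from the thread):

```shell
# Hypothetical /etc/fstab entries for unraided data drives at /grid/0..n.
# ext4 with noatime so reads don't trigger metadata writes; nofail so one
# dead disk doesn't block boot:
#
#   /dev/sdb1  /grid/0  ext4  defaults,noatime,nofail  0 0
#   /dev/sdc1  /grid/1  ext4  defaults,noatime,nofail  0 0
#
# Formatting and mounting one such drive (requires root):
#   mkfs.ext4 -m 0 /dev/sdb1       # -m 0: no reserved root blocks on a pure data disk
#   mkdir -p /grid/0
#   mount -o noatime /dev/sdb1 /grid/0
```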
Swap: in general, disable swap on the DataNodes, since swapping should REALLY not happen there and would most likely kill cluster performance. It is better to have tasks fail and someone look at them.
On master nodes it is a bit more complex. Depending on cluster size, many tasks can be running there, and OOM errors can lead to unpredictable results, so swapping may be safer here. However, make sure that you normally have enough space available.
But you may find other recommendations as well.
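A sketch of the swap advice above (the sample fstab contents and paths are made up for illustration):

```shell
# Disabling swap on a DataNode (requires root):
#   swapoff -a
# To keep it off across reboots, comment out the swap entries in /etc/fstab.
# The sed is shown here against a sample file so it can be inspected safely:
cat > /tmp/fstab.sample <<'EOF'
/dev/sda1 /     ext4 defaults 0 0
/dev/sda2 swap  swap defaults 0 0
EOF
sed '/\sswap\s/ s/^/#/' /tmp/fstab.sample

# On master nodes, where some swap may be kept, make the kernel reluctant
# to actually use it:
#   sysctl -w vm.swappiness=1
```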
Created 04-15-2016 11:27 PM
Instead of swap, can we use tmpfs?
Created 04-16-2016 10:44 AM
What would you want to speed up with tmpfs? Most components in the Hadoop environment only use the disks for persistence ( or require A LOT of space on the DataNodes ), so a tmpfs store defeats the purpose for something like an fsimage etc.
The two components that aggressively use memory-backed storage are Spark and Kafka, but both rely on OS filesystem buffers instead ( and tell the OS to write through to disk only when needed ).
Created 10-29-2017 04:56 PM
tmpfs is backed by RAM anyway, so if you already needed to swap out to a swap partition, you won't have any RAM left to spill to the tmpfs either.
tmpfs is used when you have a lot of RAM available and need to cache something fast and ephemeral; it lets you mount some amount of RAM as a filesystem, so you can use it as if it were an FS mount.
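For completeness, mounting tmpfs looks like this (the size and mount point are assumptions for illustration):

```shell
# Mount 2 GB of RAM as a filesystem (requires root; contents vanish on reboot):
#   mkdir -p /mnt/ramdisk
#   mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk
# Or persistently via an /etc/fstab entry:
#   tmpfs /mnt/ramdisk tmpfs size=2g 0 0
```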
Created 04-16-2016 04:12 AM
To add to what @Benjamin Leonhardi mentioned, a good doc to start with is the cluster planning guide (refer to page 7).
A 12-page doc with loads of information.
Regarding swap, here is the recommendation.
Hope this helps