We've built a tiny HDP 2.6 cluster on CentOS 7 with only 3 nodes. Each node serves a mixed role (master and worker at the same time). The nodes are physical, each equipped with 12 HDDs and 2 SSDs. We run a very limited set of services (YARN, HDFS, RANGER, KNOX, HIVESERVER2 and SPARK).
On one hand we wanted to take advantage of the SSDs attached to each host; on the other, we wanted something easy to set up with minimal effort.
We finally decided to build a RAID10 group (using the hardware RAID controller) across the two SSDs on each host and mount it as /hadoop (ext4). The goal was to store there the files that may benefit from improved IO. HDFS block data, of course, stays on the HDDs with no RAID.
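For context, the provisioning on each node looks roughly like this (a sketch only: the device name /dev/sdb for the RAID10 virtual drive and the noatime mount option are assumptions, not our exact commands):

```shell
# Format the RAID10 virtual drive exposed by the controller
# (/dev/sdb is an assumed device name -- check yours first!)
mkfs.ext4 -L hadoop /dev/sdb
mkdir -p /hadoop
# Persist the mount; noatime is a common choice for data volumes
echo 'LABEL=hadoop /hadoop ext4 defaults,noatime 0 2' >> /etc/fstab
mount /hadoop
```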
On /hadoop we currently store:
- the metadata RDBMS (clustered MySQL for Ambari, the Hive Metastore, Ranger, etc.)
- JournalNode edits
- the NameNode directory (edits and fsimage)
- the YARN Timeline Server DB (LevelDB)
- the Job History DB (LevelDB)
- the Ambari Metrics DB (HBase rootdir only)
- the ZooKeeper data directory
- Ambari Infra data (Solr data)
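To make the layout concrete, the relevant hdfs-site.xml entries would look roughly like the excerpt below. The exact paths are illustrative assumptions based on our description (the /grid/N mount points for the individual HDDs are hypothetical names):

```xml
<!-- hdfs-site.xml excerpt; paths are illustrative, not our exact config -->
<property>
  <name>dfs.namenode.name.dir</name>
  <!-- single location, on the SSD-backed RAID10 mount -->
  <value>/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/hadoop/hdfs/journalnode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- HDFS block data stays on the individual HDDs, no RAID -->
  <value>/grid/0/hdfs/data,/grid/1/hdfs/data,/grid/2/hdfs/data</value>
</property>
```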
The cluster performs quite well. With this setup we get protection (mirroring) for each type of data mentioned above, along with good IO performance.
Do you see any problem with such a setup?
Do you see some other good candidates to be placed on /hadoop backed by SSDs?
Since /hadoop is mirrored at the RAID level, we use only one location for dfs.namenode.name.dir, i.e. no data mirroring at the software level. Is that OK?