Created 04-05-2017 01:08 AM
Additional Questions:
Master 1 (NameNode)
RAID Configuration:
Worker 2 (DataNode1)
RAID Configuration:
Worker 3 (DataNode2)
RAID Configuration:
Created 04-07-2017 01:00 AM
For NameNode data, a fault-tolerant RAID level such as 1, 5, or 10 is fine. On worker nodes, use JBOD or a single-disk RAID-0 per disk for HDFS data, so that each disk gets its own mount point (3 mount points in your case). RAID-1 for the OS on all nodes is fine. I'm not sure what you mean by "cashing".
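To make the "one mount point per disk" advice concrete, here is a minimal sketch that renders the matching `hdfs-site.xml` entry. `dfs.datanode.data.dir` is the standard HDFS property for DataNode storage directories; the mount points (`/grid/0` and so on), the `hadoop/hdfs/data` subdirectory, and the helper name are assumptions for illustration and do not come from this thread.

```python
from xml.sax.saxutils import escape


def datanode_data_dir_property(mount_points, subdir="hadoop/hdfs/data"):
    """Render an hdfs-site.xml property with one data directory per disk.

    mount_points: one mount point per physical disk (JBOD or single-disk RAID-0).
    subdir: hypothetical directory under each mount point for HDFS block data.
    """
    # Build one data directory per mount point, tolerating trailing slashes.
    dirs = [f"{mp.rstrip('/')}/{subdir}" for mp in mount_points]
    # HDFS expects a comma-separated list of directories in a single value.
    value = ",".join(dirs)
    return (
        "<property>\n"
        "  <name>dfs.datanode.data.dir</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# Hypothetical mount points, one per data disk on a worker node.
mounts = ["/grid/0", "/grid/1", "/grid/2"]
print(datanode_data_dir_property(mounts))
```

With Ambari, the same comma-separated value would normally be entered in the HDFS configuration screen rather than written to the file by hand.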
Created 04-07-2017 06:00 AM
Great, I'll take note of that. How about Spark: should I set aside a buffer on the SSDs for DISK persistence?
Created 04-07-2017 07:16 AM
I'm not sure about Spark, but IMO you can handle that when you configure HDFS: put the SSD nodes in a separate Ambari config group and reserve the space that HDFS should not use.
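The reply does not name a property. One plausible reading is `dfs.datanode.du.reserved`, the HDFS setting that keeps a number of bytes free for non-HDFS use (as I understand it, the value applies per volume). The sketch below only converts a reserve size into that property; the 100 GiB figure and the helper name are assumptions, not values from this thread.

```python
def du_reserved_property(reserve_gib):
    """Render dfs.datanode.du.reserved for a reserve given in GiB.

    The property value is expressed in bytes, so convert before rendering.
    """
    reserved_bytes = int(reserve_gib * 1024 ** 3)
    return (
        "<property>\n"
        "  <name>dfs.datanode.du.reserved</name>\n"
        f"  <value>{reserved_bytes}</value>\n"
        "</property>"
    )


# Hypothetical: keep 100 GiB per volume free on the SSD nodes.
print(du_reserved_property(100))
```

In Ambari this value would go into the HDFS settings of the config group that contains the SSD nodes, so that other nodes keep their default reserve.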