I have 5 computers, each with four 2 TB disks, each disk mapped to a different drive.
I want to use Cloudera Manager to install Spark on my 5-machine cluster.
I want each computer to contribute 4 × 2 = 8 TB of HDFS storage.
How can I do this? I have used Cloudera Manager before, but only on machines with a single data drive.
What should I do now that I have 4 data drives on every machine?
Assuming you will have the HDFS service deployed on all 5 nodes, you can configure the data directories through the HDFS service at install time, or after installation through the HDFS configuration. For a post-install change, go to Cloudera Manager -> HDFS -> Instances -> DataNode -> Configuration -> DataNode Data Directory, and add your mount points there.
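For reference, the DataNode Data Directory setting takes one local directory per physical drive; HDFS then spreads block writes across all of them. A minimal sketch of the value, assuming hypothetical mount points /data/1 through /data/4 (substitute your actual mounts):

```shell
# Hypothetical mount points -- replace with the real mounts of your four 2 TB drives.
# Each physical disk should appear exactly once so HDFS can use all spindles.
DN_DIRS="/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn"

# This comma-separated string is what goes into the DataNode Data Directory field:
echo "$DN_DIRS"
```

Apply the same list to every DataNode (or use a role group if some hosts have a different disk layout).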
Thank you for replying, RobertM.
So you are saying that I can install Hadoop and Spark from Cloudera Manager, using just one of the data drives as my HDFS storage.
After the install, I can simply add the other local drives as HDFS drives from the Cloudera Manager UI.
Is my understanding correct?
Yes, that's correct. If you are doing a fresh install, you can select which services to activate after the packages have been distributed. If the hosts have already joined the cluster and are under management with CM, then you can add your additional drives through the HDFS configuration.
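Once all drives are configured, a quick back-of-the-envelope check of the numbers from the question (raw capacity only; each DataNode also reserves some non-DFS space per disk, so the reported figure will be a bit lower):

```shell
# Cluster figures from the question above.
NODES=5
DISKS_PER_NODE=4
TB_PER_DISK=2
REPLICATION=3   # HDFS default replication factor

RAW_TB=$((NODES * DISKS_PER_NODE * TB_PER_DISK))
echo "Raw HDFS capacity: ${RAW_TB} TB"                      # 5 * 4 * 2 = 40 TB

# With 3x replication, usable capacity is roughly a third of raw:
echo "Usable at replication ${REPLICATION}: ~$((RAW_TB / REPLICATION)) TB"
```

So each machine contributes 8 TB raw as intended, 40 TB across the cluster, of which roughly 13 TB is usable with default replication.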