I would like to use Cloudera to set up a compute cluster. I apologize if I'm asking in the wrong place, and that this is a novice question. I tried asking in the Cloudera Director forum but got no responses. I thank you in advance for your help.
What I'm trying to do is set up a cluster on EC2 where the master is also the development machine and is always on, while the slaves are added only as needed. The reason for having the master and working machine be the same is that it makes some Spark development issues a lot simpler.
I understand that with Cloudera 5.4, the way to do this is with Director, but it doesn't seem to have an option for using the machine it's running on as the master. It always wants to launch a new master instance.
Is there a workaround? If I create a cluster with Director, then add my working machine to the cluster as a master and decommission the original master, will that work? Alternatively, if I create a cluster with Director, can I make my working machine the master of that cluster and set Director up on it to be aware of itself? Will either approach cause problems?
Alternatively, is there some better tool or script set that can help? I was using the spark-ec2 scripts but outgrew them. It seems that earlier versions of Cloudera Manager may have had features that would have made this easier, but they were removed.
I will probably have to make my own AMI, since I need some software that isn't in the Cloudera AMIs. Which packages do I need to make sure are installed? What do I need to avoid?
While I have your attention... how can I install hadoop-hdfs-fuse when CDH is installed via parcels? Running "apt-get install hadoop-hdfs-fuse" wants to pull in dependencies like hadoop-hdfs that conflict with the parcels.
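In case it helps anyone answering: I noticed the CDH parcel appears to ship its own hadoop-fuse-dfs wrapper, so maybe I don't need the apt package at all? This is just a sketch of what I was considering, assuming the default parcel install path; "namenode:8020" is a placeholder for my actual NameNode address:

```shell
# Sketch only -- assumes the default parcel location and that the parcel
# includes the hadoop-fuse-dfs wrapper. "namenode" is a placeholder host.
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
sudo mkdir -p /mnt/hdfs
sudo /opt/cloudera/parcels/CDH/bin/hadoop-fuse-dfs dfs://namenode:8020 /mnt/hdfs
```

If that's the wrong way to mount HDFS on a parcel install, I'd appreciate a pointer to the right one.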
And finally: is there a simple method or script for distributing a file to every node in a Cloudera cluster? With spark-ec2 I can run "copy_dir.sh" and a directory will appear in the same place on every node, and with "slaves.sh" I can execute the same arbitrary command on every node. Is there a Cloudera equivalent?
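For concreteness, here is roughly what those two spark-ec2 helpers do, as minimal shell sketches of what I'd like an equivalent of. The "slaves" file of hostnames and passwordless SSH from the master are assumptions from my current setup, and DRY_RUN=1 just prints the commands instead of running them:

```shell
# copy_dir: rsync a directory to the same path on every node listed in
# $SLAVES_FILE (one hostname per line; blank lines and '#' comments skipped).
copy_dir() {
  src="$1"
  while read -r host; do
    case "$host" in ""|\#*) continue ;; esac   # skip blanks and comments
    if [ -n "$DRY_RUN" ]; then
      echo "rsync -az $src/ $host:$src/"
    else
      rsync -az "$src/" "$host:$src/"
    fi
  done < "${SLAVES_FILE:-./slaves}"
}

# slaves_sh: run the same arbitrary command on every node.
slaves_sh() {
  while read -r host; do
    case "$host" in ""|\#*) continue ;; esac
    if [ -n "$DRY_RUN" ]; then
      echo "ssh $host -- $*"
    else
      ssh "$host" -- "$@"
    fi
  done < "${SLAVES_FILE:-./slaves}"
}
```

If Cloudera Manager has a built-in equivalent, I'd much rather use that than maintain scripts like these.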