08-26-2017 04:05 AM - edited 08-26-2017 04:05 AM
I have installed Cloudera Manager with the Spark option; this installs Spark and a few other products.
I have 3 VMs, each with 8 cores.
My intention is to run Spark with multiple workers, one task running on each core. So 8 workers on each VM, for a total of 3 × 8 = 24 workers.
When I submit a job using spark-submit, I want each core to host a worker.
My Spark code uses a hash partitioner so that each partition is handled by its own worker, and then I run a per-partition map.
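To illustrate the partitioning step described above, here is a plain-Python sketch of how a hash partitioner spreads keys across 24 partitions (one per core in the 3 VM × 8 core layout). Spark's `HashPartitioner` uses the key's Java `hashCode` modulo the partition count; Python's built-in `hash` is used here as a stand-in, so this shows the assignment logic only, not Spark's exact bucket placement.

```python
# Sketch of hash-partitioner key assignment: hash(key) mod numPartitions.
# NUM_PARTITIONS mirrors the 3 VMs x 8 cores = 24 workers described above.

NUM_PARTITIONS = 24

def assign_partition(key, num_partitions=NUM_PARTITIONS):
    """Return the partition index for a key, HashPartitioner-style.

    Python's % already yields a non-negative result for a positive
    modulus, matching Spark's non-negative-mod behavior.
    """
    return hash(key) % num_partitions

# Bucket 1000 sample keys into partitions.
keys = [f"record-{i}" for i in range(1000)]
buckets = {}
for k in keys:
    buckets.setdefault(assign_partition(k), []).append(k)

# Each key lands in exactly one partition; at most 24 partitions exist.
print(len(buckets), sum(len(v) for v in buckets.values()))
```

In Spark itself the equivalent would be `rdd.partitionBy(24)` followed by `mapPartitions(...)`, so that each of the 24 partitions is processed as a unit by one task.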
See this post for my sample code http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/how-to-check-that-all-partitions-ar...
1) What command line should I use for spark-submit?
2) What post-install configuration do I need to do? Does the Cloudera Manager install not do the Spark configuration automatically, recognizing that I have 3 VMs and so should be using all 3 × 8 = 24 cores, one for each worker?
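For reference, a hedged sketch of what the spark-submit invocation might look like for this layout, assuming Cloudera Manager deployed Spark on YARN (its usual mode): 3 executors, one per VM, with 8 cores each, giving 24 concurrent tasks. The script name and memory figure are placeholders, not values from this post.

```shell
# Hypothetical spark-submit for a YARN-managed cluster:
# 3 executors x 8 cores = 24 task slots, one per core.
# my_job.py and the 4g figure are placeholders -- size executor memory
# to the VMs' actual RAM, leaving headroom for YARN and the OS.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 3 \
  --executor-cores 8 \
  --executor-memory 4g \
  my_job.py
```

Equivalently, `--num-executors 24 --executor-cores 1` would give 24 single-core executors; either way 24 tasks can run at once, which is what the 24-partition layout needs.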