
How to perform post install setup for Spark on multiple machines

I have installed Cloudera Manager with the Spark option selected. This installs Spark and a few other products.

I have 3 VMs, each with 8 cores.

My intention is to run Spark with multiple workers, one task running on each core: 8 workers per VM, for a total of 24 across the cluster.

When I submit a job using spark-submit, I want each core to host one worker.
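For reference, this is the kind of invocation I have in mind (the flag values and the script name `my_job.py` are my guesses for this 3-VM, 8-core-per-VM setup, not a known-working command):

```shell
# Hypothetical spark-submit on YARN: 3 executors (one per VM),
# 8 cores each, so up to 24 tasks run in parallel.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 3 \
  --executor-cores 8 \
  --executor-memory 4g \
  my_job.py
```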


My Spark code uses a HashPartitioner to assign one partition to each worker, then runs a per-partition map (mapPartitions).
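To illustrate what I mean, here is a plain-Python sketch of what hash partitioning plus a per-partition map does (no Spark required; the function names and the partition count of 24 are just for the sketch):

```python
from collections import defaultdict

def hash_partition(pairs, num_partitions):
    """Group (key, value) pairs into partitions the way Spark's
    HashPartitioner does: partition = hash(key) % num_partitions."""
    parts = defaultdict(list)
    for key, value in pairs:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def map_partitions(parts, fn):
    """Apply fn once per partition, like RDD.mapPartitions."""
    return {pid: fn(records) for pid, records in parts.items()}

# 100 sample records spread over 24 partitions (one per core).
pairs = [(k, k * 10) for k in range(100)]
parts = hash_partition(pairs, 24)
totals = map_partitions(parts, lambda recs: sum(v for _, v in recs))
print(len(parts), sum(totals.values()))  # 24 partitions; grand total preserved
```

With 24 partitions and one core per task, all partitions can be processed at the same time.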

See this post for my sample code


My questions:

1) What command line should I use for spark-submit?

2) What post-install configuration do I need to do? Doesn't the Cloudera Manager install configure Spark automatically, recognizing that I have 3 VMs and so should be using all 3 × 8 = 24 cores, one per worker?
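For context on question 2, I believe the settings in play would be something like the following in spark-defaults.conf (the values are my guesses for this cluster, not what Cloudera Manager actually wrote):

```
spark.executor.instances   3
spark.executor.cores       8
spark.executor.memory      4g
spark.default.parallelism  24
```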