How to perform post install setup for Spark on multiple machines

Hi

I have installed Cloudera Manager and selected the Spark option. This installs Spark and a few other products.

I have 3 VMs, each with 8 cores.

My intention is to run Spark with multiple workers, one per core, so that each VM runs 8 workers, for a total of 24 workers across the 3 VMs.

When I submit a job using spark-submit, I want each core to host a worker.
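
For illustration, this is the kind of invocation I have been experimenting with (a sketch only: MyJob and myjob.jar are placeholder names, I am assuming the Spark-on-YARN setup that Cloudera Manager creates, and I may be mixing up Spark's "worker" and "executor" terminology):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.MyJob \
      --num-executors 24 \
      --executor-cores 1 \
      --executor-memory 2g \
      myjob.jar

My understanding is that --num-executors 24 together with --executor-cores 1 should give one single-core executor per core across the 3 VMs, but I am not sure that is the right way to express it.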

My Spark code uses a HashPartitioner so that each partition is handled by its own worker, then runs a per-partition map.

See this post for my sample code: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/how-to-check-that-all-partitions-ar...
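
In case that link gets cut off, the shape of my code is roughly this (a simplified Scala sketch, not my exact job; the data and the doubling logic are placeholders for my real work):

    import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}

    object PartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("partition-sketch"))

        // Placeholder pair RDD; keys 0..23 stand in for my real data
        val data = sc.parallelize(0 until 24).map(k => (k, k))

        // Hash-partition into 24 partitions, one per intended worker core
        val partitioned = data.partitionBy(new HashPartitioner(24))

        // Per-partition map: each partition is processed as one unit of work
        val result = partitioned.mapPartitions { iter =>
          iter.map { case (k, v) => (k, v * 2) } // stand-in for my real per-partition work
        }

        result.collect().foreach(println)
        sc.stop()
      }
    }

The point is that I end up with 24 partitions and want each one processed by its own worker/core in parallel.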

My questions:

1) What command line should I use for spark-submit? Is the invocation sketched above along the right lines?

2) What post-install configuration do I need to do? Doesn't the Cloudera Manager install configure Spark automatically, recognizing that I have 3 VMs and should therefore be using all 3 x 8 = 24 cores, one for each worker? (The config sketch after these questions shows what I have in mind.)
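
For question 2, is something along these lines in spark-defaults.conf (set through Cloudera Manager) what I should be aiming for? The values are guesses on my part, and the memory figure is a placeholder:

    spark.executor.instances         24
    spark.executor.cores             1
    spark.executor.memory            2g
    spark.dynamicAllocation.enabled  false

I turned dynamic allocation off in this sketch only because I want a fixed count of 24 executors; I do not know whether that is the recommended approach.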
