Explorer
Posts: 7
Registered: ‎08-14-2017

newbie question - spark-submit seems to use single node

[ Edited ]

Hi

I installed Cloudera Manager with the Apache Spark option and took all the defaults.

 

I have 24 cores, 8 cores per VM, so a 3-VM cluster.

 

I am not sure if my cluster is running on all 3 VMs? See image below.

Maybe I have to start the 3 nodes explicitly?

 

I ran a job using spark-submit, and I collect and print the return values.

In each map task I effectively return a String, which is the hostname.

My return values show that everything ran on vm-2. I was expecting 8 tasks to run on vm-1, 8 on vm-2, and 8 on vm-3.

 

EDIT: I am using HashPartitioner(24), in an attempt to put 1 element on each core.

So I expect 8 tasks to run on each VM, 1 per core.
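For what it's worth, HashPartitioner only controls which *partition* a key lands in (roughly `key.hashCode % numPartitions`); it says nothing about which *host* runs each partition. A minimal sketch of that assignment logic in plain Python (not PySpark) — Python's `hash()` stands in for Java's `hashCode`, and the integer keys are made-up placeholders:

```python
# Sketch of HashPartitioner-style assignment: key -> partition index.
# This only decides partition membership; Spark's scheduler separately
# decides which executor (and therefore which host) runs each partition.
def hash_partition(key, num_partitions):
    # Python's hash() stands in for Java's hashCode here.
    return hash(key) % num_partitions

num_partitions = 24
keys = list(range(24))  # placeholder keys
assignments = [hash_partition(k, num_partitions) for k in keys]

# 24 keys spread over 24 partitions; with these integer keys each
# partition gets exactly one key.
print(len(set(assignments)))
```

So even with a perfectly even 24-way partitioning, all 24 partitions will run on one host if the job only has executors (or a local-mode driver) on that host.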

 

Any comments will be appreciated. If more detail is needed I can attach it here.

 

spark cluster single node or 3 nodes.png

Explorer
Posts: 7
Registered: ‎08-14-2017

Re: newbie question - spark-submit seems to use single node

Also, if an admin is reading this, how do I change my display nickname? I want to change it to "srini".
Posts: 579
Kudos: 64
Solutions: 33
Registered: ‎04-06-2015

Re: newbie question - spark-submit seems to use single node

I'll look into the admin question on the community nickname. :)

 




Cy Jervis, Community Manager - I'm not an expert but will supply relevant content from time to time. :)


Explorer
Posts: 7
Registered: ‎08-14-2017

Re: newbie question - spark-submit seems to use single node

Thank you. I could not find a way to change it, so if an admin can change it for me, that would be fine.
New Contributor
Posts: 5
Registered: ‎08-24-2017

Re: newbie question - spark-submit seems to use single node

The Spark install in CDH relies on an existing installation of YARN; the Spark parcel only creates Gateway and History Server instances.

 

The Gateway (like the gateways for other Hadoop services) gives the host the appropriate configuration to submit jobs as any client would.

 

The History Server just maintains logs and history UI for jobs.

 

To actually run jobs, you must do one of two things:

 

  1. Launch a YARN cluster, and submit your Spark jobs with --master yarn
  2. Launch a Spark Standalone cluster manually by starting the services in $SPARK_HOME/sbin
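For option 1, the submit command would look roughly like this. This is a sketch, not your exact command: the executor counts are illustrative for the 3-VM / 8-cores-per-VM layout described above, and `my_app.py` is a placeholder for your application:

```shell
# Submit to YARN instead of running in local mode.
# --num-executors / --executor-cores are illustrative for 3 VMs x 8 cores;
# tune them to what YARN actually has available on your cluster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 3 \
  --executor-cores 8 \
  my_app.py
```

If `--master` is omitted (or set to `local[*]`), everything runs in a single JVM on the submitting host, which matches the single-hostname result described above.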

My wager is that you launched the job from the client on vm-2 in local mode, which would explain the result you saw.
