Explorer
Posts: 9
Registered: 08-28-2014
Accepted Solution

Extra worker in spark

Hi All,

 

I have Cloudera Enterprise Data Hub Edition 5.1.0 installed on a single system. Due to a requirement, I need to create one extra worker in Spark. Currently it has 1 master and 1 worker running, but I want 1 master and 2 workers. I tried following the CDH guideline (added SPARK_WORKER_INSTANCES=2 in spark-env.sh), but it didn't work for me.
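For reference, this is the setting I added (the file location below is from my Apache Spark download; I am not sure of the right location for CDH's copy):

    # in conf/spark-env.sh (standalone mode) - the setting I attempted
    export SPARK_WORKER_INSTANCES=2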

 

I followed the same steps with Spark outside of CDH (just downloaded from the Apache website) and was able to create the extra worker.

 

Could someone let me know the steps for creating an extra worker in Spark inside CDH 5.1.0?

 

 

Thanks in advance.

Nishikant

 

Cloudera Employee
Posts: 481
Registered: 08-11-2014

Re: Extra worker in spark

I assume you're working in standalone mode. You can just go to the Spark service in Cloudera Manager, click Instances, click Add Role Instances, and assign other hosts as workers. 

 

You do not need to install Spark. It is already installed. In fact I would not change its configuration files directly unless you're sure you know what you're doing. 

Explorer
Posts: 9
Registered: 08-28-2014

Re: Extra worker in spark

I had installed Spark outside of CDH to verify the steps that are required to create an extra worker in standalone mode.

As per your reply, I went to the Spark service in Cloudera Manager, clicked Instances, then Add Role Instances. Here I want to assign the same host as another worker, but it is not accepting it.
Could you provide a screenshot or something and reply to my mail ID: nkantkumar@gmail.com
Explorer
Posts: 9
Registered: 08-28-2014

Re: Extra worker in spark

Hi All, 

 

I am not able to create an extra Spark worker in CDH. I need 2 workers with 1 master in my CDH Spark.

CDH Spark has 1 master and 1 worker by default, and this way I am not able to do a group-by operation on streams; because of that I am looking for a minimum of 2 workers.

 

Thanks in advance

Nishi

 

Cloudera Employee
Posts: 366
Registered: 07-29-2013

Re: Extra worker in spark

It doesn't make sense to put two workers on one host. One worker can host many executors, and an executor can even run many tasks in parallel. Your default parallelism will be a function of the number of cores, which should be much more than 1. As long as your input has more than one partition, you'll get parallel execution. If not, use repartition() to make more partitions.
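For illustration, here is a minimal sketch against the Spark 1.x Scala streaming API; the socket source, batch interval, and comma-split keying are placeholders, not your actual job:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits for groupByKey

    val conf = new SparkConf().setAppName("GroupByKeyOneWorker")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Placeholder source; substitute your real input stream.
    val lines = ssc.socketTextStream("localhost", 9999)

    val grouped = lines
      .map(line => (line.split(",")(0), line)) // placeholder: key on the first field
      .repartition(8)                          // more partitions => more parallel tasks per batch
      .groupByKey()                            // works fine with a single worker

    grouped.print()
    ssc.start()
    ssc.awaitTermination()

Even with one worker, each batch's 8 partitions can be processed by up to 8 concurrent tasks, one per available core.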

Explorer
Posts: 9
Registered: 08-28-2014

Re: Extra worker in spark

Thanks a lot.

I am following your suggestion to create a larger number of partitions in the RDD to achieve groupByKey on stream data.

Meanwhile, could you please let me know how to add another worker on a different host? I have 2 machines where Cloudera Enterprise Data Hub Edition 5.1.0 is installed. I want one master and 2 workers; one worker will be on the other machine.
Cloudera Employee
Posts: 366
Registered: 07-29-2013

Re: Extra worker in spark

See my message above about modifying roles. You would just set an additional host to be a worker. I'm assuming you are using standalone mode.

Explorer
Posts: 9
Registered: 08-28-2014

Re: Extra worker in spark

The problem is that the second host's information is not reflected under Add Role Instances > Select Hosts. It only shows my current host and all its information.

Could you please let me know why this is happening?
Explorer
Posts: 9
Registered: 08-28-2014

Re: Extra worker in spark

Thank you very much.

It solved my problem.
