Support Questions

nishi · ‎08-28-2014

Hi All,

I have Cloudera Enterprise Data Hub Edition 5.1.0 installed on a single system. Due to a requirement, I need to create one extra worker in Spark. Currently it has 1 master and 1 worker running, but I want 1 master and 2 workers. I tried following the CDH guidelines (I added SPARK_WORKER_INSTANCES=2 to spark-env.sh), but it didn't work for me.

When I followed the same steps with Spark outside of CDH (just downloaded from the Apache website), I was able to create an extra worker.

Could someone let me know the steps for creating an extra worker in Spark inside CDH 5.1.0?

Thanks in advance.

Nishikant

srowen · ‎08-28-2014

I assume you're working in standalone mode. You can just go to the Spark service in Cloudera Manager, click Instances, click Add Role Instances, and assign other hosts as workers.

You do not need to install Spark. It is already installed. In fact I would not change its configuration files directly unless you're sure you know what you're doing.

nishi · ‎08-28-2014

I had installed Spark outside of CDH to verify the steps that are required to create an extra worker in standalone mode.

As per your reply, I went to the Spark service in Cloudera Manager, clicked Instances, and clicked Add Role Instances. Here I want to assign the same host as another worker, but it is not accepting it.
Could you provide a screenshot or something similar and reply to my mail ID, nkantkumar@gmail.com?

nishi · ‎08-29-2014

Hi All,

I am not able to create one extra worker in Spark in CDH. I need 2 workers with 1 master in my CDH Spark.

CDH Spark has 1 master and 1 worker by default, and with that setup I am not able to do a group-by operation on streams. Because of that, I am looking for a minimum of 2 workers.

Thanks in advance

Nishi

srowen · ‎08-29-2014

It doesn't make sense to put two workers on one host. One worker can host many executors, and an executor can even run many tasks in parallel. Your default parallelism will be a function of the number of cores, which should be much more than 1. As long as your input has more than one partition, you'll get parallel execution. If not, use repartition() to make more partitions.
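
To make the point about partitions concrete, here is a rough toy model in plain Python. It is not Spark code and does not use the Spark API: the names `repartition`, `group_partition`, and `group_by_key` below are hypothetical stand-ins written only for illustration, and the round-robin split is an assumption, not a description of how Spark distributes records. The sketch only shows that a group-by can be split into per-partition tasks that run in parallel inside a single process, so the thing to increase is the number of partitions, not the number of workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def repartition(records, num_partitions):
    """Toy stand-in for repartition(): spread records round-robin (assumption)."""
    partitions = [[] for _ in range(num_partitions)]
    for index, record in enumerate(records):
        partitions[index % num_partitions].append(record)
    return partitions


def group_partition(partition):
    """Group the (key, value) pairs of one partition; this is one 'task'."""
    groups = defaultdict(list)
    for key, value in partition:
        groups[key].append(value)
    return groups


def group_by_key(partitions, max_parallel_tasks):
    """Run one task per partition in parallel, then merge the partial groups."""
    with ThreadPoolExecutor(max_workers=max_parallel_tasks) as pool:
        # pool.map returns results in partition order
        partials = list(pool.map(group_partition, partitions))
    merged = defaultdict(list)
    for partial in partials:
        for key, values in partial.items():
            merged[key].extend(values)
    return dict(merged)


# Hypothetical sample data: four (key, value) records.
records = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]
partitions = repartition(records, 4)          # 4 partitions -> 4 tasks
grouped = group_by_key(partitions, max_parallel_tasks=4)
```

With a single partition there would be only one task, no matter how many workers exist; with four partitions, one "worker" (here, one process with a thread pool) can run four tasks at once.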

nishi · ‎09-01-2014

Thanks a lot

I am going through your suggestion to create more partitions in the RDD to achieve groupByKey on stream data.

Meanwhile, could you please let me know how to add another worker on a different host? I have 2 machines where Cloudera Enterprise Data Hub Edition 5.1.0 is installed. I want one master and 2 workers; one worker will be on the other machine.

srowen · ‎09-01-2014

See my message above about modifying roles. You would just set an additional host to be a worker. I'm assuming you are using standalone mode.

nishi · ‎09-01-2014

The problem is that the second host's information does not show up under Add Role Instances > Select Host. It only shows my current host and all its information.

Could you please let me know why this is happening?

nishi · ‎09-03-2014

Thank you very much.

It solved my problem.
