08-28-2014 04:54 AM
I have Cloudera Enterprise Data Hub Edition 5.1.0 installed on a single system. Due to a requirement, I need to create one extra worker in Spark. Currently it has 1 master and 1 worker running, but I want 1 master and 2 workers. I tried following the CDH guideline (adding SPARK_WORKER_INSTANCES=2 to spark-env.sh), but it didn't work for me.
I followed the same steps with Spark outside of CDH (just downloaded from the Apache website), and there I am able to create the extra worker.
Could someone let me know what the steps would be for creating an extra worker in Spark inside CDH 5.1.0?
Thanks in advance.
08-28-2014 05:09 AM
I assume you're working in standalone mode. You can just go to the Spark service in Cloudera Manager, click Instances, click Add Role Instances, and assign other hosts as workers.
You do not need to install Spark. It is already installed. In fact I would not change its configuration files directly unless you're sure you know what you're doing.
08-28-2014 05:48 AM
08-29-2014 12:54 AM
I am not able to create an extra worker in Spark in CDH. I need 2 workers with 1 master in my CDH Spark.
CDH Spark has 1 master and 1 worker by default, and with that setup I am not able to do a group-by operation on streams. Because of that, I am looking for a minimum of 2 workers.
Thanks in advance
08-29-2014 04:00 AM
It doesn't make sense to put two workers on one host. One worker can host many executors, and an executor can even run many tasks in parallel. Your default parallelism will be a function of the number of cores, which should be much more than 1. As long as your input has more than one partition, you'll get parallel execution. If not, use repartition() to make more partitions.
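To make the repartition() suggestion concrete, here is a minimal PySpark sketch. It is an illustration under assumptions, not the poster's actual job: the `local[4]` master, the app name, the sample data, and both function names (`choose_num_partitions`, `run_demo`) are hypothetical. The factor of 2 tasks per core follows the general guidance in the Spark tuning guide and may need adjusting for a real workload.

```python
def choose_num_partitions(current_partitions, total_cores, tasks_per_core=2):
    """Pick a partition count that gives every core some work.

    Hypothetical helper: never lowers the existing partition count, and
    aims for a couple of tasks per core (an assumed rule of thumb).
    """
    return max(current_partitions, total_cores * tasks_per_core)


def run_demo(master="local[4]"):
    """Group a single-partition RDD by key after repartitioning it.

    Assumes PySpark is installed; the import is kept local so the helper
    above can be used on its own.
    """
    from pyspark import SparkContext

    sc = SparkContext(master, "repartition-demo")
    try:
        # Deliberately create the input with only one partition.
        rdd = sc.parallelize(range(1000), 1)
        before = rdd.getNumPartitions()

        # sc.defaultParallelism reflects the cores available to the app.
        target = choose_num_partitions(before, sc.defaultParallelism)

        # Spread the data over more partitions, then group by key.
        pairs = rdd.repartition(target).map(lambda x: (x % 10, x))
        counts = pairs.groupByKey().mapValues(len).collect()
        return before, target, sorted(counts)
    finally:
        sc.stop()
```

Calling `run_demo()` on a machine with PySpark available runs the group-by across several tasks inside a single worker, with no second worker needed. For a streaming job, DStreams expose a `repartition()` method as well, so the same idea should carry over.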
09-01-2014 12:45 AM
09-01-2014 04:08 AM