02-10-2016 05:49 AM
I have configured an Apache Spark standalone cluster on two Ubuntu 14.04 VMs. One VM is the Master and the other is the Worker, and the two are connected with passwordless SSH as described here.
After that, from the Master, I started the master as well as the worker with the following command from the Spark home directory -
Then I ran the following command from the Master as well as the Worker VMs.
It seemed that the Master and Worker were running properly, and no error occurred in the Web UI either. But when I try to run an application using the following command -
It gives the following WARN message in the console -
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
When I run the application with the worker created on the same server where the master exists, the application executes successfully. The WARN message appears only when the Master and Worker are separated. Both VMs are memory-optimized EC2 instances (r3.xlarge).
My test application is the following -
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setMaster('spark://SparkMaster:7077').setAppName("My_App")
sc = SparkContext(conf=conf)
SQLCtx = SQLContext(sc)

list_of_list = sc.textFile("ver1_sample.csv").map(lambda line: line.split(",")).collect()
print("type_of_list_of_list===========", type(list_of_list), list_of_list)
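In case the warning comes from the driver requesting more memory or cores than the lone worker offers, explicitly capping the request in the conf would rule that out. A minimal sketch - the "2g" and "2" values are illustrative assumptions, not my actual settings:

```python
# Hypothetical resource caps to rule out an over-request; the values
# ("2g", "2") are illustrative assumptions, not taken from my setup.
resource_conf = {
    "spark.master": "spark://SparkMaster:7077",  # same master URL as above
    "spark.app.name": "My_App",
    "spark.executor.memory": "2g",  # assumed: well below the r3.xlarge's ~30 GB
    "spark.cores.max": "2",         # assumed: no more cores than the worker offers
}

# In the driver this would be applied with:
#   conf = SparkConf().setAll(resource_conf.items())
```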
We have the following ports open on both servers -
8080 - Master WEB UI Port
7077 - Port for connecting Master and Workers
8888 - Executor Port
8787 - Driver Port
4041-4090 - JVM ports
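To double-check that these ports are actually reachable from the Worker VM (and not blocked by an EC2 security group), a small TCP probe like the one below could be run from the Worker. The hostname SparkMaster and the 2-second timeout are assumptions for illustration:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run from the Worker VM against the Master, e.g.:
# for p in (7077, 8080, 8787, 8888):
#     print(p, port_open("SparkMaster", p))
```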
Any suggestions regarding this WARN message?