
All the executors on clusters are not running

Hi,

I have a lab environment of CDH 5 with six worker nodes, node[1-6], and node7 as the NameNode.

node[1-5]: 8 GB RAM, 2 cores each

node[6]: 32 GB RAM, 8 cores

I am new to Spark and am simply trying to count the number of lines in our data. I have uploaded the data to HDFS (5.3 GB).

When I submit my Spark job, it only runs 2 executors, and I can see it splitting the job into 161 tasks (there are 161 files in the directory).

 

In the code, I am reading all the files and counting the lines on them.

# path points to the HDFS directory containing the 161 files
data_raw = sc.textFile(path)
print(data_raw.count())

 

On the CLI: spark-submit --master yarn-client file_name.py --num-executors 6 --executor-cores 1
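One thing I am not sure about (an assumption on my part): I believe spark-submit only parses options that appear before the application file, and everything after file_name.py is passed to the script itself as its own arguments, so my --num-executors and --executor-cores flags may never reach spark-submit at all. If that is right, the intended invocation would look like:

```shell
# Options must precede the application file; anything placed after
# file_name.py is handed to the script (sys.argv), not to spark-submit.
spark-submit \
  --master yarn-client \
  --num-executors 6 \
  --executor-cores 1 \
  file_name.py
```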

 

It should run with 6 executors, each running 1 task at a time, but I only see 2 executors running. I am not able to figure out the cause.

 

Any help would be greatly appreciated.

 

Thanks

mandeep