Support Questions

Why is Spark2 running on only one node?

Expert Contributor

Hi.

I am running Spark2 from Zeppelin (0.7 on HDP 2.6) and doing an IDF transformation that crashes after many hours. It runs on a cluster with one master and three datanodes: s1, s2 and s3. All nodes have the Spark2 client, and each has 8 cores and 16 GB of RAM.

I just noticed it is only running on one node, s3, with 5 executors.

In zeppelin-env.sh I have set zeppelin.executor.instances to 32 and zeppelin.executor.mem to 12g, and it has the line:

export MASTER=yarn-client

I have set yarn.resourcemanager.scheduler.class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.
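For reference, that scheduler setting corresponds to this property in yarn-site.xml (a fragment for illustration; the class name is the one quoted above):

```xml
<!-- yarn-site.xml: switch the ResourceManager to the Fair Scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```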

I also set spark.executor.instances to 32 in the Spark2 interpreter settings.

Does anyone have any ideas what else I can try to get the other nodes doing their share?

1 ACCEPTED SOLUTION

Expert Contributor

The answer is that I am an idiot. Only s3 had a DataNode and NodeManager installed. Hopefully this might help someone.
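For anyone hitting the same symptom, a quick sanity check is to ask YARN which NodeManagers it can actually schedule on: if only one host shows up, executors can only land there. A sketch, assuming the standard yarn and hdfs CLIs are on your path:

```shell
# List every NodeManager registered with the ResourceManager.
# A healthy 3-datanode cluster should show s1, s2 and s3 here.
yarn node -list -all

# Cross-check HDFS: dfsadmin -report lists the live DataNodes.
hdfs dfsadmin -report
```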


3 REPLIES

Expert Contributor

Hello

This seems to be happening because your Spark interpreter is configured to use master = local.

1) Take a look at the link below:

https://zeppelin.apache.org/docs/latest/manual/interpreters.html#what-is-interpreter-group

2) If your interpreter still has master set to local, change it to yarn-client.

3) If your application shows up in the Resource Manager, it's likely that it is using the yarn framework.
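Beyond the steps above, you can also confirm which master the interpreter is really using by printing it from a notebook paragraph. A sketch, assuming sc is the SparkContext that Zeppelin injects:

```scala
// Which cluster manager is this SparkContext attached to?
// "yarn" means the YARN framework; "local[*]" means a single JVM.
println(sc.master)

// One entry per executor host:port — if every key is on s3,
// executors are only being placed on that node.
sc.getExecutorMemoryStatus.keys.foreach(println)
```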

Regards,

Expert Contributor

Thanks Danilo, but it is set to yarn-client.

Expert Contributor

The answer is that I am an idiot. Only s3 had a DataNode and NodeManager installed. Hopefully this might help someone.