Created 11-24-2017 11:59 AM
Hi.
I am running Spark2 from Zeppelin (0.7 on HDP 2.6) and doing an IDF transformation that crashes after many hours. It runs on a cluster with a master and 3 datanodes: s1, s2 and s3. All nodes have a Spark2 client, and each has 8 cores and 16 GB RAM.
I just noticed it is only running on one node s3 with 5 executors.
In zeppelin-env.sh I have set zeppelin.executor.instances to 32 and zeppelin.executor.mem to 12g, and it has the line:
export MASTER=yarn-client
I have set yarn.resourcemanager.scheduler.class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.
I also set spark.executor.instances to 32 in the Spark2 interpreter settings.
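For reference, this is roughly what the relevant configuration looks like; exact property names are a sketch assuming Zeppelin 0.7 on HDP 2.6, where executor settings are commonly passed to spark-submit via SPARK_SUBMIT_OPTIONS in zeppelin-env.sh:

```shell
# zeppelin-env.sh (sketch; values mirror the ones described above)
export MASTER=yarn-client
export SPARK_SUBMIT_OPTIONS="--num-executors 32 --executor-memory 12g"
```

The equivalent can also be set directly in the Spark2 interpreter UI as spark.executor.instances and spark.executor.memory.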
Anyone have any ideas what else I can try to get the other nodes doing their share?
Created 11-24-2017 08:17 PM
The answer is that I am an idiot: only s3 had the DataNode and NodeManager installed. Hopefully this helps someone.
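For anyone hitting the same symptom, a quick way to catch this is to list the worker daemons YARN and HDFS actually see; if only one host shows up, no amount of executor tuning will spread the load. These are standard Hadoop CLI commands:

```shell
# List all NodeManagers registered with the ResourceManager;
# every host that should run executors must appear here.
yarn node -list -all

# List live DataNodes; every host that should hold HDFS blocks
# must appear in the report.
hdfs dfsadmin -report
```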
Created 11-24-2017 02:52 PM
Hello
This seems to be happening because your Spark is configured to use master = local.
1) Take a look at the link below:
https://zeppelin.apache.org/docs/latest/manual/interpreters.html#what-is-interpreter-group
2) If the master is still set to local in your interpreter settings, try changing it to yarn-client.
3) If your application shows up in the Resource Manager, it is likely running on YARN.
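The check in step 3 can also be done from the command line; this standard YARN CLI command lists applications the ResourceManager knows about, so the Zeppelin Spark job should appear here if it is really on YARN:

```shell
# Show running YARN applications; a Zeppelin Spark interpreter
# session should be listed if master is yarn-client.
yarn application -list
```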
Regards,
Created 11-24-2017 04:49 PM
Thanks Danilo, but it is already set to yarn-client.