
Is it impossible to run a Spark2 job with --master yarn, in an external node to the cluster?


Hi everyone... once again I come to this community forum in despair. Let me explain.


Our customer is trying to run Spark 2.2.0 from an external node that doesn't belong to the Cloudera Cluster.


This Cloudera Cluster, CDH 5.15.1, has Spark on YARN (1.6.0) and Spark 2 (2.2.0).


The problem is that running a simple wordcount with Spark 2.2.0 from the external node with the --master yarn property ends up starting on Spark 1.6.0 in the cluster...


I've made multiple tests with no success... I'm starting to think that the only way to run Spark2 is from inside a cluster node...


Any ideas will be helpful, since I don't know what to do at this point to help the customer work with Spark2... (the last resort is giving them direct access to the CDH cluster nodes... but we don't want that for security reasons...)
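For reference, this is roughly what I'd expect a working gateway-node setup to look like. The parcel and config paths below are assumptions (they depend on how the Spark 2 client and YARN configs were copied to the external node), so treat it as a sketch, not a verified recipe:

```shell
# Point the shell at the Spark 2 client, not the Spark 1.6 one
# (parcel path is an assumption; adjust to wherever CDS 2 is installed).
export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
# YARN/HDFS client configs copied from the cluster (assumed location).
export HADOOP_CONF_DIR=/etc/spark2/conf/yarn-conf

# Submit a wordcount against the cluster's YARN, explicitly via the
# Spark 2 launcher so the Spark 1.6 spark-submit is never picked up.
"$SPARK_HOME"/bin/spark2-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.JavaWordCount \
  "$SPARK_HOME"/examples/jars/spark-examples_*.jar \
  /tmp/input.txt
```

If the job still launches as 1.6.0, that would suggest the wrong client binaries or configs are being resolved on the external node.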




EDIT1: Found this


It seems Cloudera already has something to force Spark2 on the cluster. Has anyone tried this?



From the URL


The only blocking point is that you cannot use principal and keytab configuration with spark2. But there could definitely be more to it. I am also struggling to run pyspark2 using spark2-submit.




It looks like to run in client mode you need to set --master local in your spark2-submit command.


Sure, but that is not the objective (local mode on an external machine) when you have a 6-node cluster...

In this case... we need to run with --master yarn.
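Before giving up on --master yarn, it may be worth checking which Spark client the external node actually resolves. A quick hedged sanity check (the spark.yarn.jars HDFS path is a placeholder, not a known value from your cluster):

```shell
# Which launcher does the PATH resolve? It should be the Spark 2 one,
# not the CDH-bundled Spark 1.6 spark-submit.
which spark2-submit

# Print the version the client will actually use; if this says 1.6.0,
# the external node's client install is the problem, not YARN.
spark2-submit --version

# Submit with the master set explicitly, and (optionally) point the
# AM at Spark 2 jars staged in HDFS so YARN doesn't fall back to 1.6
# (the HDFS path here is an assumption for illustration).
spark2-submit --master yarn --deploy-mode client \
  --conf spark.yarn.jars="hdfs:///user/spark/spark2-jars/*.jar" \
  wordcount.py /tmp/input.txt
```

The idea is to separate the two possible failure modes: the wrong client binary on the external node versus the wrong jars being shipped to YARN.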
