Is it impossible to run a Spark2 job with --master yarn, in an external node to the cluster?

Hi everyone... once again I come to this community forum in despair. Let me explain.

 

Our customer is trying to run Spark 2.2.0 from an external node that doesn't belong to the Cloudera cluster.

 

This cluster, on CDH 5.15.1, has Spark on YARN (1.6.0) and the Spark 2 add-on (2.2.0).

 

The problem is that running a simple wordcount with Spark 2.2.0 from the external node, with the --master yarn property, launches the job on Spark 1.6.0 in the cluster...
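For context, this is the kind of check that shows the mismatch on the external node (a rough sketch; the exact paths and output depend on how the clients were installed there):

    # Which binary does plain 'spark-submit' resolve to on the external node?
    which spark-submit
    spark-submit --version      # on CDH 5.x this is typically the 1.6.0 client

    # The Spark 2 parcel/CSD ships a separately named client
    which spark2-submit
    spark2-submit --version     # should report 2.2.0 if the Spark 2 client is installed

If anything in the submit path (wrapper scripts, schedulers, aliases) ends up calling plain spark-submit, the job starts on Spark 1.6.0 no matter what --master says.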

 

I've made multiple tests with no success... I'm starting to think that the only way to run Spark2 is from inside a cluster node...

 

Any ideas would be helpful, since at this point I don't know what else to do to get the customer working with Spark2... (the last resort is giving them direct access to the CDH cluster nodes, but we don't want that for security reasons...)

 

Thanks...

 

EDIT1: Found this https://www.cloudera.com/documentation/spark2/latest/topics/spark2_admin.html#default_tools

 

It seems Cloudera already has a way to force Spark2 as the default on the cluster. Has anyone tried this?
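From what I understand of that page, it boils down to making the Spark 2 tools the default via Linux alternatives, roughly like this (a sketch only; the paths are the usual parcel defaults and the priority value is a guess, so check the linked doc before running anything):

    # Assumed parcel layout: /opt/cloudera/parcels/SPARK2/bin/spark2-submit
    sudo update-alternatives --install /usr/bin/spark-submit spark-submit \
        /opt/cloudera/parcels/SPARK2/bin/spark2-submit 11
    sudo update-alternatives --install /usr/bin/spark-shell spark-shell \
        /opt/cloudera/parcels/SPARK2/bin/spark2-shell 11

    # Confirm which alternative now wins
    update-alternatives --display spark-submit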

3 REPLIES

Re: Is it impossible to run a Spark2 job with --master yarn, in an external node to the cluster?


From the URL https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#ki_spark_submit...

 

the only blocking point is that you cannot use the principal and keytab configuration with Spark2. But there could definitely be more to it. I am also struggling to run pyspark2 using spark2-submit.
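The workaround I've seen for that known issue is to get the Kerberos ticket manually with kinit before submitting, instead of passing --principal/--keytab (a sketch; the principal and keytab path below are made-up examples):

    # Hypothetical principal/keytab - acquire the ticket up front
    kinit -kt /etc/security/keytabs/myuser.keytab myuser@EXAMPLE.COM

    # Then submit without --principal/--keytab
    spark2-submit --master yarn --deploy-mode client my_app.py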

 

 


Re: Is it impossible to run a Spark2 job with --master yarn, in an external node to the cluster?


It looks like, to run in client mode, you need to set --master local in your spark2-submit command.

Re: Is it impossible to run a Spark2 job with --master yarn, in an external node to the cluster?


Sure, but that is not the objective (local mode on an external machine) when you have a 6-node cluster...

In this case... we need to run with --master yarn.
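For the record, this is roughly what we are aiming for from the external node, assuming the cluster's client configs are copied there and the Spark 2 client is installed. The paths are the usual parcel defaults and the example jar name is illustrative, not verified on this particular setup:

    # Point the client at the cluster's Hadoop/YARN configs
    # (copied from a cluster node, or deployed via a CM gateway role)
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export YARN_CONF_DIR=/etc/hadoop/conf

    # Invoke the Spark 2 client explicitly so Spark 1.6.0 never enters the picture
    export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
    /opt/cloudera/parcels/SPARK2/bin/spark2-submit \
        --master yarn \
        --deploy-mode client \
        --class org.apache.spark.examples.JavaWordCount \
        $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar \
        /user/someuser/input.txt

The point being: --master yarn only says where the job runs; which Spark version runs is decided by which client binary (and SPARK_HOME) the submit is launched from.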