04-30-2019 01:02 AM - edited 04-30-2019 01:58 AM
Hi everyone... once again I come to this community forum in despair. Let me explain.
Our customer is trying to run Spark 2.2.0 from an external node that doesn't belong to the Cloudera Cluster.
This Cloudera Cluster, CDH 5.15.1, has Spark on YARN (1.6.0) and Spark 2.6.0.
The problem is that running a simple wordcount with Spark 2.2.0 from the external node, with the --master yarn option, ends up executing on Spark 1.6.0 in the Cluster...
I've made multiple tests with no success... I'm starting to think that the only way to run Spark2 is from inside a cluster node...
Any ideas will be helpful, since at this point I don't know what to do to help the customer work with Spark2... (the last resort is giving them direct access to the CDH cluster nodes... but we don't want that for security reasons...)
It seems Cloudera already has something to force Spark2-only submissions on the cluster. Has anyone tried this?
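For what it's worth, a gateway-style setup on the external node would look roughly like the sketch below. This assumes the Spark 2 parcel layout from CDS on CDH 5.x; the parcel path, config directory, and wordcount.py/input paths are illustrative and must match the actual installation.

```shell
# Point the client at the Spark 2 install instead of the default Spark 1.6
export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2

# Tell the Spark 2 client where the cluster's Hadoop/YARN configs live
# (copied from a cluster gateway node if this machine is unmanaged)
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf

# Use the spark2-submit wrapper so the Spark 2 runtime is used on YARN
spark2-submit \
  --master yarn \
  --deploy-mode cluster \
  wordcount.py hdfs:///user/someuser/input.txt
```

If the plain `spark-submit` on the external node's PATH resolves to the Spark 1.6 client, that alone would explain jobs starting on 1.6 even when a Spark 2 install is present.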
04-30-2019 01:11 PM
The only blocking point is that you cannot use the principal and keytab configuration with Spark2. But there could definitely be more to it. I am also struggling to run PySpark2 using spark2-submit.
05-02-2019 01:14 AM - edited 05-02-2019 01:27 AM
Sure, but that is not the objective (local mode on an external machine) when you have a 6-node cluster...
In this case... we need to run with --master yarn.
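As a quick sanity check before submitting on YARN, it may help to confirm which Spark each wrapper on the external node actually resolves to (command names assume the CDH/CDS client wrappers are installed):

```shell
# Report the version each client launcher is bound to
spark-submit --version    # the CDH default client, likely Spark 1.6.x
spark2-submit --version   # the CDS client, which should report Spark 2.x
```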