We have Spark installed on one machine (Machine 1) and a YARN cluster configured on another machine (Machine 2). How can we configure Spark on Machine 1 to communicate with, and receive data from, the YARN cluster on Machine 2?
In our case we are experimenting with a client on Machine 1 that passes the "yarn-client" flag to the underlying Spark, which needs to pick up data from the YARN cluster on Machine 2. How should Spark on Machine 1 be configured to do so?
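From what we have read so far, Spark in yarn-client mode finds the remote cluster through the Hadoop/YARN client configuration rather than through any Spark-specific address setting. A minimal sketch of what we believe is needed on Machine 1 (the local directory path is our own choice, not a convention; the XML files would be copied from Machine 2):

```shell
# On Machine 1: copy the cluster's client configuration files
# (core-site.xml, hdfs-site.xml, yarn-site.xml) from Machine 2 into a
# local directory, e.g.:
#   scp machine2:/etc/hadoop/conf/*.xml /install/hadoop-conf/

# Point Spark at that directory so spark-shell/spark-submit can read
# the ResourceManager and NameNode addresses of Machine 2 from it:
export HADOOP_CONF_DIR=/install/hadoop-conf
export YARN_CONF_DIR=/install/hadoop-conf
```

With these set, `spark-shell --master yarn-client` should be able to locate the cluster on Machine 2, as far as we understand from the Spark-on-YARN documentation.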
We are new to Spark and ran the following Kerberos setup on the local host machine, where the Spark client is installed, to connect to the Spark cluster on the destination host machine -
1. Generate keytab -
xst -k spark.keytab spark/<kerberos principal>
2. Copy spark keytab to spark/conf directory -
cp spark.keytab /install/spark-1.5.0-bin-hadoop2.6/conf/
3. Add the following to spark-env.sh -
SPARK_HISTORY_OPTS=-Dspark.history.kerberos.enabled=true \
  -Dspark.history.kerberos.principal=<kerberos principal>@<realm> \
  -Dspark.history.kerberos.keytab=/install/spark-1.5.0-bin-hadoop2.6/conf/spark.keytab
4. Run spark-shell (which sources spark-env.sh) -
spark-shell --master yarn-client
This fails with the following errors -
/install/spark-1.5.0-bin-hadoop2.6/conf/spark-env.sh: line 47: -Dspark.history.kerberos.principal=<kerberos principal>@<realm>: command not found
java.io.IOException: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name; Host Details : local host is: "xxxx/xxx.xx.xxx.xx"; destination host is: "xxxx":8020;
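We suspect the "command not found" message is a shell quoting problem rather than a missing Spark setting: in our spark-env.sh the multi-word value is not quoted, so bash ends the assignment at the first space and tries to execute the next -D token as a command. A small demonstration in plain bash (the principal/realm here are placeholders):

```shell
# Reproduce the failure outside Spark: an unquoted assignment whose
# value contains a space makes bash treat the text after the space as
# a command name, not as part of the value.
SPARK_HISTORY_OPTS=-Dspark.history.kerberos.enabled=true \
  -Dspark.history.kerberos.principal=demo@REALM || true
# bash reports the second -D token as "command not found", matching
# the error we see from spark-env.sh line 47.

# Quoting the whole value turns it into a single assignment:
SPARK_HISTORY_OPTS="-Dspark.history.kerberos.enabled=true -Dspark.history.kerberos.principal=demo@REALM"
echo "$SPARK_HISTORY_OPTS"
```

So quoting the whole SPARK_HISTORY_OPTS value in spark-env.sh should at least remove the "command not found" error, though we are not sure it addresses the IOException.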
We have two questions -
1. Is there some configuration needed to make Spark recognize -Dspark.history.kerberos.principal?
2. Are we missing any other configuration, because of which authentication from the Spark host to the destination YARN cluster host is not working?
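On question 2, one thing we noticed in the Spark-on-YARN documentation: the spark.history.kerberos.* properties configure the Spark History Server, not job submission, so they may not affect spark-shell authenticating to the cluster at all. Spark 1.4+ can instead be given the principal and keytab directly at launch (paths below are from our install):

```shell
spark-shell --master yarn-client \
  --principal <kerberos principal>@<realm> \
  --keytab /install/spark-1.5.0-bin-hadoop2.6/conf/spark.keytab
```

Alternatively, obtaining a ticket before launching (kinit -kt spark.keytab <kerberos principal>@<realm>) should also let spark-shell authenticate, if we understand the Hadoop security model correctly.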
We referred to the following -