New Contributor
Posts: 2
Registered: ‎03-14-2016

Spark-YARN connectivity in different machines

Hi all,

We have Spark installed on one machine (Machine 1) and a YARN cluster configured on another machine (Machine 2). How can we configure Spark on Machine 1 to communicate with and receive data from the YARN cluster on Machine 2?

 

In our case we are experimenting with a client installed on Machine 1 that passes the "yarn-client" flag to the underlying Spark, which needs to pick up data from the YARN cluster on Machine 2. How can Spark on Machine 1 be configured to do so?

Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: Spark-YARN connectivity in different machines

This is a network configuration question rather than a CDH or Spark one. The two machines can communicate as long as there is a network path between them.
You normally assign any node you want to run Spark from as a Spark gateway node (you don't otherwise "install Spark" per se). This can be separate from the nodes running the YARN NodeManager / ResourceManager roles, but usually is *not* separate. That is, making the driver remote is not necessarily a good idea.
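If the driver host really is remote, the usual minimal setup is to copy the cluster's Hadoop client configuration files (core-site.xml, yarn-site.xml, hdfs-site.xml) to that host and point Spark at them. A sketch, assuming you copied the files to a local directory of your choosing (the path below is illustrative, not a Cloudera default):

```shell
# Point Spark at a local copy of Machine 2's Hadoop client configs.
# The directory name is an assumption -- use wherever you copied them to.
export HADOOP_CONF_DIR=/etc/hadoop/conf-machine2
export YARN_CONF_DIR="$HADOOP_CONF_DIR"

# With the configs in place, Spark can submit to the remote YARN cluster:
# spark-shell --master yarn-client
```

Note that in yarn-client mode the firewall must allow the driver host to reach the ResourceManager and the executors to connect back to the driver.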
New Contributor
Posts: 2
Registered: ‎03-14-2016

Re: Spark-YARN connectivity in different machines

We're new to Spark and are executing the following Kerberos steps on the local host machine, where a Spark client is installed, to connect to the Spark cluster on the destination host machine -

 

1. Generate a keytab (from within kadmin) -

xst -k spark.keytab spark/<kerberos principal>

 

2. Copy spark keytab to spark/conf directory -

cp spark.keytab /install/spark-1.5.0-bin-hadoop2.6/conf/


3. Add the following in spark-env.sh –
SPARK_HISTORY_OPTS=-Dspark.history.kerberos.enabled=true \
  -Dspark.history.kerberos.principal=<kerberos principal>@<realm> \
  -Dspark.history.kerberos.keytab=/install/spark-1.5.0-bin-hadoop2.6/conf/spark.keytab

 

4. Launch spark-shell, which sources spark-env.sh -
spark-shell --master yarn-client
/install/spark-1.5.0-bin-hadoop2.6/conf/spark-env.sh: line 47:  -Dspark.history.kerberos.principal=<kerberos principal>@<realm>: command not found

 

java.io.IOException: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name; Host Details : local host is: "xxxx/xxx.xx.xxx.xx"; destination host is: "xxxx":8020;
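The "command not found" message comes from the shell itself, not from Spark: in an unquoted assignment, the first unescaped space ends the value, and the shell then tries to execute the next word as a command. A minimal stand-alone reproduction (variable names are illustrative):

```shell
# An unquoted multi-word assignment: the shell performs VAR=-Done as an
# environment assignment and then tries to run "-Dtwo" as a command.
sh -c 'VAR=-Done -Dtwo' 2>&1 | grep -q 'not found' && echo "reproduced: not found"
# prints "reproduced: not found"
```

Quoting the entire value (e.g. SPARK_HISTORY_OPTS="-Done -Dtwo") makes the spaces part of the assignment instead.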

 

 

We have 2 questions -

 

1. Is there some configuration to make Spark recognize -Dspark.history.kerberos.principal?

2. Are we missing any other configuration, as a result of which authentication from the Spark host to the destination YARN cluster host is not working?
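Regarding question 2: the "Failed to specify server's Kerberos principal name" exception typically indicates that the client's Hadoop configuration (under HADOOP_CONF_DIR) does not carry the NameNode's Kerberos principal, i.e. the cluster's hdfs-site.xml was never copied to the client host. The relevant entry normally looks like the sketch below; the realm value is an illustrative assumption, and in practice the whole file is copied from the cluster rather than written by hand:

```xml
<!-- hdfs-site.xml on the client host (values are illustrative) -->
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@YOUR.REALM</value>
</property>
```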

 

We took reference of the following -

 

http://www.cloudera.com/documentation/enterprise/5-4-x/topics/sg_spark_auth.html