Support Questions
Find answers, ask questions, and share your expertise

Spark-YARN connectivity in different machines

New Contributor

Hi all,

 We have Spark installed on one machine (Machine 1) and a YARN cluster configured on another machine (Machine 2). How can we configure Spark on Machine 1 to communicate with, and receive data from, the YARN cluster on Machine 2?


 In our case we are experimenting with a client installed on Machine 1 that passes the "yarn-client" flag to the underlying Spark, which needs to pick up data from the YARN cluster on Machine 2. How can Spark on Machine 1 be configured to do so?


Re: Spark-YARN connectivity in different machines

Master Collaborator
This is a network configuration question, not a CDH or Spark one: the two machines can communicate as long as there is a network path between them.
You would normally assign any node you want to run Spark from as a Spark gateway node (you don't otherwise "install Spark" per se). That node can be separate from the nodes carrying the YARN NodeManager/ResourceManager roles, but usually is *not* separate. That is, making the driver remote is not necessarily a good idea.
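As a sketch of the gateway-node approach described above: a submitting machine only needs the cluster's Hadoop client configuration visible locally. The path `/etc/hadoop/conf-remote` below is a hypothetical location for configuration files copied from Machine 2; the example class and jar path are the stock Spark 1.5 examples.

```shell
# Sketch, assuming Machine 1 has a copy of Machine 2's Hadoop client
# config (core-site.xml, yarn-site.xml) in a local directory.
export HADOOP_CONF_DIR=/etc/hadoop/conf-remote   # hypothetical path

# Spark reads HADOOP_CONF_DIR to locate the remote ResourceManager,
# so the submit command itself needs no host names.
spark-submit \
  --master yarn-client \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/lib/spark-examples-*.jar 100
```

The design point in the reply still applies: even with this working, the driver runs on Machine 1 and must stay reachable from the cluster for the life of the job.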

Re: Spark-YARN connectivity in different machines

New Contributor

We're new to Spark. We are executing the following steps to set up Kerberos on the local host machine, where a Spark client is installed, in order to connect to the Spark cluster on the destination host machine:


1. Generate a keytab -

xst -k spark.keytab spark/<kerberos principal>
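For context, `xst` (short for `ktadd`) is a subcommand of the MIT Kerberos `kadmin` tool, so the step above is typically run either at the `kadmin:` prompt or non-interactively with `-q`. A sketch, assuming an admin principal named `admin/admin` (hypothetical):

```shell
# Exports the Spark service principal's key into spark.keytab in the
# current directory; requires kadmin privileges on the KDC.
kadmin -p admin/admin -q "xst -k spark.keytab spark/<kerberos principal>"
```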


2. Copy the Spark keytab to the spark/conf directory -

cp spark.keytab /install/spark-1.5.0-bin-hadoop2.6/conf/

3. Add the following in –
SPARK_HISTORY_OPTS=-Dspark.history.kerberos.enabled=true \
  -Dspark.history.kerberos.principal=<kerberos principal>@<realm> \
  -Dspark.history.kerberos.keytab=/install/spark-1.5.0-bin-hadoop2.6/conf/spark.keytab


4. Check -
spark-shell --master yarn-client
/install/spark-1.5.0-bin-hadoop2.6/conf/ line 47: -Dspark.history.kerberos.principal=<kerberos principal>@<realm>: command not found
Failed on local exception: java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name; Host Details : local host is: "xxxx/"; destination host is: "xxxx":8020;
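A note on the first error above: `command not found` is what a shell prints when an unquoted multi-line assignment causes the continuation lines to be executed as separate commands. A minimal sketch of the usual quoting, assuming the lines live in `conf/spark-env.sh` (the file name is not stated in the post):

```shell
# Quote the whole value so each -D option remains part of
# SPARK_HISTORY_OPTS instead of being run as its own command.
export SPARK_HISTORY_OPTS="-Dspark.history.kerberos.enabled=true \
  -Dspark.history.kerberos.principal=<kerberos principal>@<realm> \
  -Dspark.history.kerberos.keytab=/install/spark-1.5.0-bin-hadoop2.6/conf/spark.keytab"
```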



We have two questions -


1. Is there some configuration to make Spark recognize -Dspark.history.kerberos.principal?

2. Are we missing any other configuration, as a result of which authentication from the Spark host to the destination YARN cluster host is failing?


We used the following as a reference -

