<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Confusion in documentation : Configuring the Spark Thrift Server on a Kerberos-Enabled Cluster in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Confusion-in-documentation-Configuring-the-Spark-Thrift/m-p/154581#M28780</link>
    <description>&lt;P&gt;You are going to use the hive service account to run the Spark Thrift Server. So, if it is a manual install, then&lt;/P&gt;&lt;P&gt;./sbin/start-thriftserver.sh --master yarn-client --executor-memory 512m --hiveconf hive.server2.thrift.port=10015&lt;/P&gt;&lt;P&gt;will be run as user hive (with su hive) instead of user spark in a secure setup. Similarly, /var/run/spark and /var/log/spark should be readable and writable by hive. Just seeing the contents as user hive is not enough; you need to be able to write to those directories. One easy way is to give 77x permissions (for example, 775) on these directories. Since spark:hadoop is the owner:group and hive belongs to the hadoop group, hive will have write access with this setup.&lt;/P&gt;</description>
    <pubDate>Thu, 26 May 2016 13:24:39 GMT</pubDate>
    <dc:creator>ravi1</dc:creator>
    <dc:date>2016-05-26T13:24:39Z</dc:date>
    <item>
      <title>Confusion in documentation : Configuring the Spark Thrift Server on a Kerberos-Enabled Cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Confusion-in-documentation-Configuring-the-Spark-Thrift/m-p/154580#M28779</link>
      <description>&lt;P&gt;Guys,&lt;/P&gt;&lt;P&gt;I am referring to the document &lt;A href="http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_spark-guide/content/spark-kerb-access-hive.html" target="_blank"&gt;http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_spark-guide/content/spark-kerb-access-hive.html&lt;/A&gt; and got a bit confused. I wanted to check with experts who have already configured the Spark Thrift Server in a Kerberized environment.&lt;/P&gt;&lt;P&gt;If you are installing the Spark Thrift Server on a Kerberos-secured cluster, note the following requirements:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;The Spark Thrift Server must run on the same host as &lt;CODE&gt;HiveServer2&lt;/CODE&gt;, so that it can access the &lt;CODE&gt;hiveserver2&lt;/CODE&gt; keytab.&lt;UL&gt;&lt;LI&gt;&lt;EM&gt;OK. Install and run the Spark Thrift Server on the same host as HiveServer2, using Ambari.&lt;/EM&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Edit permissions&lt;/STRONG&gt; in &lt;CODE&gt;/var/run/spark&lt;/CODE&gt; and &lt;CODE&gt;/var/log/spark&lt;/CODE&gt; to specify read/write permissions to the Hive service account.&lt;UL&gt;&lt;LI&gt;&lt;EM&gt;This is not very clear to me. I see that in our cluster we have a user spark. I tried &lt;STRONG&gt;ls /var/run&lt;/STRONG&gt; and &lt;STRONG&gt;ls /var/run/spark&lt;/STRONG&gt; as the spark user and as the hive user (via su), and I see the directory contents in both cases. Is that correct, or am I supposed to do something else, because &lt;STRONG&gt;I didn't edit the permissions. What permissions are to be edited?&lt;/STRONG&gt;&lt;/EM&gt;&lt;/LI&gt;&lt;LI&gt;&lt;EM&gt;ll /var/run&lt;/EM&gt;&lt;/LI&gt;&lt;LI&gt;&lt;EM&gt;drwxrwxr-x 3 spark     hadoop    4096 May 17 10:47 spark&lt;/EM&gt;&lt;/LI&gt;&lt;LI&gt;&lt;EM&gt;ll /var/run/spark&lt;/EM&gt;&lt;/LI&gt;&lt;LI&gt;-rw-r--r-- 1 root  root     6 May 17 11:18 spark-root-org.apache.spark.deploy.history.HistoryServer-1.pid&lt;/LI&gt;&lt;LI&gt;ll /var/log/&lt;/LI&gt;&lt;LI&gt;&lt;EM&gt;drwxr-xr-x 2 spark     spark             4096 Mar  9 10:06 spark&lt;/EM&gt;&lt;/LI&gt;&lt;LI&gt;&lt;EM&gt;ll /var/log/spark&lt;/EM&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Use the Hive service account to start the &lt;CODE&gt;thriftserver&lt;/CODE&gt; process.&lt;UL&gt;&lt;LI&gt;&lt;EM&gt;Does that mean I have to do kinit with the hive keytab, or su hive and then start the thrift server?&lt;/EM&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 18 May 2016 18:40:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Confusion-in-documentation-Configuring-the-Spark-Thrift/m-p/154580#M28779</guid>
      <dc:creator>smartninja723</dc:creator>
      <dc:date>2016-05-18T18:40:41Z</dc:date>
    </item>
    <item>
      <title>Re: Confusion in documentation : Configuring the Spark Thrift Server on a Kerberos-Enabled Cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Confusion-in-documentation-Configuring-the-Spark-Thrift/m-p/154581#M28780</link>
      <description>&lt;P&gt;You are going to use the hive service account to run the Spark Thrift Server. So, if it is a manual install, then&lt;/P&gt;&lt;P&gt;./sbin/start-thriftserver.sh --master yarn-client --executor-memory 512m --hiveconf hive.server2.thrift.port=10015&lt;/P&gt;&lt;P&gt;will be run as user hive (with su hive) instead of user spark in a secure setup. Similarly, /var/run/spark and /var/log/spark should be readable and writable by hive. Just seeing the contents as user hive is not enough; you need to be able to write to those directories. One easy way is to give 77x permissions (for example, 775) on these directories. Since spark:hadoop is the owner:group and hive belongs to the hadoop group, hive will have write access with this setup.&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2016 13:24:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Confusion-in-documentation-Configuring-the-Spark-Thrift/m-p/154581#M28780</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-05-26T13:24:39Z</dc:date>
    </item>
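    <!-- Editor's note: the reply above amounts to two steps. A minimal sketch, run as root on the HiveServer2 host; the 775 mode, the yarn-client master, the 512m executor memory, and port 10015 follow the thread, while the /usr/hdp/current/spark-client path is an assumption for an HDP install (the thread uses a relative ./sbin path). -->

```shell
# Give the hive user write access to Spark's runtime and log directories.
# The directories are owned spark:hadoop, and hive is a member of group
# hadoop, so mode 775 grants hive group write access.
chmod 775 /var/run/spark /var/log/spark

# Start the Spark Thrift Server as the hive service account instead of
# spark, as required on a Kerberos-secured cluster.
su - hive -c "/usr/hdp/current/spark-client/sbin/start-thriftserver.sh \
  --master yarn-client \
  --executor-memory 512m \
  --hiveconf hive.server2.thrift.port=10015"
```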
  </channel>
</rss>

