Scenario 1: Only one instance of Spark Thrift Server is needed


If you are installing the Spark Thrift Server on a Kerberos-secured cluster, the following instructions apply:

  • The Spark Thrift Server must run on the same host as HiveServer2, so that it can access the hiveserver2 keytab.
  • Grant the Hive service account read/write access to /var/run/spark and /var/log/spark. Read access alone is not enough: user hive must be able to write to these directories. One way is to set 77x permissions on them; since the directories are owned spark:hadoop and hive belongs to the hadoop group, hive then has write access.
  • Use the Hive service account to start the thriftserver process.
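
The 77x scheme above can be sanity-checked quickly. A minimal sketch, run against a scratch directory standing in for /var/run/spark (in production the directories are owned spark:hadoop and user hive belongs to group hadoop):

```shell
# Demonstrate the 77x scheme on a throwaway directory; in production you
# would apply this to /var/run/spark and /var/log/spark instead.
dir=$(mktemp -d)
chmod 775 "$dir"              # rwx for owner and group, r-x for others
mode=$(stat -c '%a' "$dir")   # read the octal mode back (GNU stat)
echo "$mode"                  # prints 775: any hadoop-group member, e.g. hive, can write
rmdir "$dir"
```

On the real directories the equivalent would be chmod 775 /var/run/spark /var/log/spark, after which user hive can both read and write there.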

It is recommended that you run the Spark Thrift Server as user hive instead of user spark. This ensures that the Spark Thrift Server can access Hive keytabs, the Hive metastore, and data in HDFS that is stored under user hive.

When the Spark Thrift Server runs queries as user hive, all data accessible to user hive will be accessible to the user submitting the query. For a more secure configuration, use a different service account for the Spark Thrift Server. Provide appropriate access to the Hive keytabs and the Hive metastore.
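
Starting the process as the Hive service account (per the steps above) can look like the following from the command line; note this is a sketch, and the script location assumes an HDP-style layout (/usr/hdp/current/spark-client), which may differ on your cluster:

```shell
# Start the Spark Thrift Server as user hive rather than user spark.
# The path below is an assumption based on a typical HDP install layout.
sudo -u hive /usr/hdp/current/spark-client/sbin/start-thriftserver.sh
```

If Ambari manages the service, prefer starting it from the Ambari UI so the configured service account is used.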

If you do not want to install the Spark Thrift Server (STS) on the same host as HiveServer2, follow the approach below.

Scenario 2: Install multiple Spark Thrift Server instances on hosts other than HiveServer2

Approach:

Run all commands as the root user.

  1. Back up hive.service.keytab in /etc/security/keytabs on the Hive Server host: make a copy of the file and move the copy to a directory other than /etc/security/keytabs.
  2. If the Spark Thrift Server host also has hive.service.keytab in /etc/security/keytabs, make a copy of the file and move the copy to a directory other than /etc/security/keytabs.
  3. On the Ambari Server node, run the following command from the command line to obtain and cache Kerberos ticket-granting tickets.

    kinit [admin principal]

    Type the admin principal password when prompted. The admin principal name and password are the ones used to enable Kerberos via Ambari. For example, if the admin principal used to enable Kerberos was root/admin and the corresponding password was abc123, run kinit root/admin and type abc123 when prompted for the password.

  4. On the Ambari Server node, in a temporary directory, run the following command to open the kadmin shell (it prompts for the admin principal password):

    kadmin

    1. Add a new principal as hive/[spark_thrift_server_host]@[Kerberos realm]. Replace [spark_thrift_server_host] with the host name of the Spark Thrift Server on the cluster. Replace [Kerberos realm] with the Kerberos realm used when enabling Kerberos in Ambari. For example, if Kerberos is enabled in Ambari with Kerberos realm MyDomain.COM, use it to replace [Kerberos realm].

      addprinc -randkey hive/[spark_thrift_server_host]@[Kerberos realm]

    2. Add all Hive principals to the Hive service keytab file. This should include the existing one for the Hive Server host and the one created in the previous step.

      ktadd -k hive.service.keytab hive/[spark_thrift_server_host]@[Kerberos realm]
      ktadd -k hive.service.keytab hive/[hive_server_host]@[Kerberos realm]

      Replace [spark_thrift_server_host], [hive_server_host], and [Kerberos realm] with information specific to the cluster.

      The kadmin shell should print messages indicating that each principal was added to the file.

      For example:

      kadmin:  ktadd -k hive.service.keytab hive/
      Entry for principal hive/ with kvno 3, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:hive.service.keytab.
      Entry for principal hive/ with kvno 3, encryption type aes128-cts-hmac-sha1-96 added to keytab WRFILE:hive.service.keytab.
    3. Type exit to exit the kadmin shell.
  5. Find the newly generated hive.service.keytab in the current directory.
    1. Add it to /etc/security/keytabs on the Spark Thrift Server host.
    2. Use it to replace the existing hive.service.keytab in /etc/security/keytabs on the Hive Server host.
    3. Update the permissions and ownership of the file on both the Spark Thrift Server host and the Hive Server host as shown below; set ownership to match the original keytab (on HDP clusters this is typically hive:hadoop).
      chmod 400 hive.service.keytab
  6. Stop all Spark components via Ambari web UI. Ensure there are no running Spark processes on the Spark component hosts.
  7. Restart Hive from Ambari UI.
  8. Start Spark Service from Ambari UI.
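
For reference, the kadmin portion of the procedure (steps 3 through 5) can also be scripted non-interactively with kadmin -q instead of the interactive shell. This is a sketch only: sts1.example.com, hs2.example.com, and MYDOMAIN.COM are hypothetical placeholders for your Spark Thrift Server host, Hive Server host, and Kerberos realm, and each kadmin invocation prompts for the admin principal password.

```shell
# Step 3: cache admin credentials (prompts for the admin password).
kinit root/admin

# Step 4: create the STS principal and merge both principals into one keytab.
kadmin -q "addprinc -randkey hive/sts1.example.com@MYDOMAIN.COM"
kadmin -q "ktadd -k hive.service.keytab hive/sts1.example.com@MYDOMAIN.COM"
kadmin -q "ktadd -k hive.service.keytab hive/hs2.example.com@MYDOMAIN.COM"

# Verify the merged keytab lists both principals before distributing it.
klist -kt hive.service.keytab

# Step 5: copy the keytab to both hosts and lock the file down there.
scp hive.service.keytab sts1.example.com:/etc/security/keytabs/
scp hive.service.keytab hs2.example.com:/etc/security/keytabs/
# then, on each host: chmod 400 /etc/security/keytabs/hive.service.keytab
```

This requires a working KDC and the admin credentials from step 3, so it is shown for orientation rather than as a drop-in script.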
Version history: Revision 1 of 1. Last update: ‎10-13-2017 03:56 PM