How to use Apache Spark to query Hive table with Kerberos?



I am trying to use Scala with Apache Spark, running locally, to query a Hive table that is secured with Kerberos. I can connect and query the data programmatically without Spark, but I run into problems when I try to connect and query through Spark.


My code, which works when run locally without Spark:


    System.setProperty("kerberos.keytab", keytab)
    System.setProperty("kerberos.principal", user)
    System.setProperty("", krb5.conf)
    System.setProperty("", jaas.conf)
    val conf = new Configuration
    conf.set("", "Kerberos")
    UserGroupInformation.createProxyUser("user", UserGroupInformation.getLoginUser)
    UserGroupInformation.loginUserFromKeytab(user, keytab)
    if (UserGroupInformation.isLoginKeytabBased) UserGroupInformation.getLoginUser.reloginFromKeytab()
    else if (UserGroupInformation.isLoginTicketBased) UserGroupInformation.getLoginUser.reloginFromTicketCache()
    val con = DriverManager.getConnection("jdbc:hive://", user, password)
    val rs = con.prepareStatement("select * from table limit 5").executeQuery()



Does anyone know how I can pass the keytab, krb5.conf and jaas.conf to my Spark initialization function so that Spark can authenticate with Kerberos and obtain a TGT?


My Spark initialization function:


    conf = new SparkConf().setAppName("mediumData")
      .set("", "localhost")
      .set("spark.ui.enabled", "true") // enable Spark UI
    sparkSession = SparkSession.builder.config(conf).enableHiveSupport().getOrCreate()



I do not have configuration files such as hive-site.xml or core-site.xml.

Thank you!
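
One possible approach, as a minimal and untested sketch rather than a confirmed fix: log in through Hadoop's UserGroupInformation before the SparkSession is created, and pass the Hive metastore settings that hive-site.xml would normally supply directly to the builder. The principal names, file paths, metastore host and table name below are all hypothetical placeholders, not values from the question. The sketch assumes local mode and assumes that a keytab login makes a separate jaas.conf unnecessary. Because there is no core-site.xml, HDFS settings for the cluster would probably also have to be set in the same way before table data can be read.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.sql.SparkSession

object KerberizedHiveQuery {
  def main(args: Array[String]): Unit = {
    // Hypothetical placeholders: replace with values for your own cluster.
    val principal          = "user@EXAMPLE.COM"
    val keytabPath         = "/path/to/user.keytab"
    val krb5ConfPath       = "/path/to/krb5.conf"
    val metastoreUri       = "thrift://metastore-host:9083"
    val metastorePrincipal = "hive/_HOST@EXAMPLE.COM"

    // Standard JVM property; it must be set before the first Kerberos call.
    System.setProperty("java.security.krb5.conf", krb5ConfPath)

    // Tell Hadoop's security layer to use Kerberos, then log in from the keytab.
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(hadoopConf)
    UserGroupInformation.loginUserFromKeytab(principal, keytabPath)

    // Supply the metastore settings that hive-site.xml would otherwise provide.
    val spark = SparkSession.builder()
      .appName("mediumData")
      .master("local[*]")
      .config("hive.metastore.uris", metastoreUri)
      .config("hive.metastore.sasl.enabled", "true")
      .config("hive.metastore.kerberos.principal", metastorePrincipal)
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical database and table names.
    spark.sql("select * from my_db.my_table limit 5").show()
    spark.stop()
  }
}
```

The login is done first because the Hive client that Spark creates picks up the credentials of the current UserGroupInformation login user, so the session needs to be built after the keytab login has succeeded.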

