
Error while connecting to hive from Spark using jdbc connection string

Contributor

I am trying to connect to Hive from Spark inside the map function, as shown below:

String driver = "org.apache.hive.jdbc.HiveDriver";
Class.forName(driver);
Connection con_con1 = DriverManager.getConnection(
        "jdbc:hive2://server1.net:10001/default;principal=hive/server1.net@abc.xyz.NET;"
                + "ssl=false;transportMode=http;httpPath=cliservice",
        "username", "password");

But I am getting the following error:

javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]


7 REPLIES

Rising Star

Looks like a credentials issue. Check that the credentials you entered are correct.

Contributor

The credentials are correct. I am using the same connection string from the edge node and it works fine, but not from the Spark program.

Super Collaborator

I don't know if this applies to your situation, but there is a Java issue that causes this type of exception with Kerberos 1.8.1 or higher. If your error is related to this, run kinit -R after the initial kinit to renew a renewable ticket. There are some additional suggestions at https://community.hortonworks.com/articles/4755/common-kerberos-errors-and-solutions.html.

Rising Star

Your cluster is Kerberized, and this looks like a Kerberos ticket issue: you do not have a valid Kerberos ticket.

Before running the command, first obtain a valid ticket for your user. To get one, type

kinit

It will prompt for your Kerberos password. After you enter it, run

klist

to display the Kerberos ticket for your user. Now try running the command to connect to Hive again.

Contributor

OK, I have resolved the Kerberos ticket issue by adding the lines below to my Java code:

System.setProperty("java.security.auth.login.config","gss-jaas.conf");

System.setProperty("sun.security.jgss.debug","true");

System.setProperty("javax.security.auth.useSubjectCredsOnly","false");

System.setProperty("java.security.krb5.conf","krb5.conf");

But I am getting a different error now:

WARN scheduler.TaskSetManager: Lost task 1.0 in stage 1.0 (TID 7, SGSCAI0068.inedc.corpintra.net): java.sql.SQLException: Could not open connection to jdbc:hive2://**********.inedc.corpintra.net:10001/default;principal=hive/******.inedc.corpintra.net@*****;transportMode=http;httpPath=cliservice: null
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:206)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:178)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:233)
    at GetDataFromXMLStax.returnXmlTagMatchStatus(GetDataFromXMLStax.java:77)
    at SP_LogFileNotifier_FileParsingAndCreateEmailContent$2.call(SP_LogFileNotifier_FileParsingAndCreateEmailContent.java:200)
    at SP_LogFileNotifier_FileParsingAndCreateEmailContent$2.call(SP_LogFileNotifier_FileParsingAndCreateEmailContent.java:1)
    at org.apache.spark.api.java.JavaPairRDD$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1027)
    at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1109)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13.apply(PairRDDFunctions.scala:1116)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13.apply(PairRDDFunctions.scala:1095)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:258)
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:203)
    ... 22 more

It seems the connection string is not correct. I am calling this connection string from inside a Spark program.

Connection con_con1 = DriverManager.getConnection(
        "jdbc:hive2://**********.inedc.corpintra.net:10001/default;"
                + "principal=hive/**********.inedc.corpintra.net@*****.NET;"
                + "transportMode=http;httpPath=cliservice");

Please help me with the correct connection string.
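A pattern sometimes used for JDBC connections opened inside Spark tasks is to log in from a keytab explicitly on the executor before connecting, since executors do not inherit the driver's Kerberos ticket cache. This is only a sketch, not confirmed from this thread: the principal, keytab name, and host below are placeholders, and the keytab would have to be shipped to the executors (for example with spark-submit --files).

import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class ExecutorHiveConnection {

    public static Connection open() throws Exception {
        // Tell the Hadoop security layer on this executor to use Kerberos.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Placeholder principal and keytab; the keytab file must be present in
        // the executor's working directory.
        UserGroupInformation ugi = UserGroupInformation
                .loginUserFromKeytabAndReturnUGI("user@EXAMPLE.COM", "user.keytab");

        // Open the JDBC connection inside the logged-in security context.
        return ugi.doAs((PrivilegedExceptionAction<Connection>) () -> {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            return DriverManager.getConnection(
                    "jdbc:hive2://host.example.com:10001/default;"
                            + "principal=hive/host.example.com@EXAMPLE.COM;"
                            + "transportMode=http;httpPath=cliservice");
        });
    }
}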

Master Guru

Why connect to Hive via JDBC? Use Spark SQL with the HiveContext and you have full access to all the Hive tables. This is optimized in Spark and very fast.

https://spark.apache.org/docs/1.6.0/sql-programming-guide.html
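A minimal Java sketch of that approach for Spark 1.6 (the table name my_table is a placeholder):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class HiveContextExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("HiveContextExample");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // HiveContext picks up hive-site.xml from the classpath and talks to the
        // metastore directly; no JDBC URL or HiveServer2 connection is involved.
        HiveContext hiveContext = new HiveContext(jsc.sc());

        DataFrame result = hiveContext.sql("SELECT * FROM my_table LIMIT 10");
        result.show();

        jsc.stop();
    }
}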

Contributor

Yes, I agree, but I am trying to run a Hive query inside the map method. When I use HiveContext there, I get an error that the class is not serializable.
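If the per-record Hive query is essentially a lookup, one common workaround is to run the query once on the driver with HiveContext, collect the result into a plain map, and broadcast it to the executors; only the serializable broadcast handle is captured by the map closure, so HiveContext never enters it. A sketch under that assumption (lookup_table and its columns are placeholders):

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

public class BroadcastLookup {

    public static JavaRDD<String> enrich(JavaSparkContext jsc, HiveContext hiveContext,
                                         JavaRDD<String> keys) {
        // Run the Hive query once, on the driver (placeholder table and columns).
        DataFrame lookup = hiveContext.sql("SELECT key, value FROM lookup_table");

        // Collect the result into a plain, serializable map.
        Map<String, String> map = new HashMap<>();
        for (Row row : lookup.collectAsList()) {
            map.put(row.getString(0), row.getString(1));
        }

        // Broadcast the map; the handle is safe to use inside map() on executors.
        final Broadcast<Map<String, String>> broadcast = jsc.broadcast(map);
        return keys.map(k -> broadcast.value().get(k));
    }
}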