Created 09-14-2016 08:08 AM
I am trying to connect to Hive from Spark, inside the map function, as shown below:
String driver = "org.apache.hive.jdbc.HiveDriver";
Class.forName(driver);
Connection con_con1 = DriverManager.getConnection(
        "jdbc:hive2://server1.net:10001/default;principal=hive/server1.net@abc.xyz.NET;ssl=false;transportMode=http;httpPath=cliservice",
        "username", "password");
But I am getting the following error:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Created 09-14-2016 08:19 AM
This looks like a credential issue. Check that the credentials you entered are correct.
Created 09-14-2016 08:27 AM
The credentials are correct. I am using the same connection string from the edge node and it works fine, but not from the Spark program.
Created 09-14-2016 10:59 PM
I don't know if this applies to your situation, but there is a Java issue that can cause this type of exception with Kerberos 1.8.1 or higher. If your error is related to this, run kinit -R after the initial kinit to renew the (renewable) ticket. There are some additional suggestions at https://community.hortonworks.com/articles/4755/common-kerberos-errors-and-solutions.html.
Created 09-15-2016 08:09 AM
Your cluster is Kerberized, and this looks like a Kerberos ticket issue: you don't have a valid Kerberos ticket. First get a valid ticket for your user before running the command.
To get a valid ticket, type:
# kinit
It will then prompt for your Kerberos password. After you enter the password, run:
# klist
This will display the Kerberos ticket for your user.
Now try running the command to connect to Hive.
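If the connection is opened from a long-running job or from Spark executors, where an interactive kinit is not practical, a programmatic keytab login is an alternative. A minimal sketch using the Hadoop UserGroupInformation API (the principal and keytab path below are placeholders, not values from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLogin {
    public static void main(String[] args) throws Exception {
        // Tell the Hadoop security layer that the cluster uses Kerberos.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Placeholder principal and keytab path; replace with your own.
        UserGroupInformation.loginUserFromKeytab(
                "myuser@EXAMPLE.COM", "/etc/security/keytabs/myuser.keytab");

        // Verify which user is now logged in.
        System.out.println(UserGroupInformation.getLoginUser());
    }
}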
Created 09-15-2016 08:19 AM
OK, I have resolved the Kerberos ticket issue by adding the lines below to my Java code:
System.setProperty("java.security.auth.login.config","gss-jaas.conf");
System.setProperty("sun.security.jgss.debug","true");
System.setProperty("javax.security.auth.useSubjectCredsOnly","false");
System.setProperty("java.security.krb5.conf","krb5.conf");
But I am getting a different error now:
WARN scheduler.TaskSetManager: Lost task 1.0 in stage 1.0 (TID 7, SGSCAI0068.inedc.corpintra.net): java.sql.SQLException: Could not open connection to jdbc:hive2://**********.inedc.corpintra.net:10001/default;principal=hive/******.inedc.corpintra.net@*****;transportMode=http;httpPath=cliservice: null
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:206)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:178)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:233)
    at GetDataFromXMLStax.returnXmlTagMatchStatus(GetDataFromXMLStax.java:77)
    at SP_LogFileNotifier_FileParsingAndCreateEmailContent$2.call(SP_LogFileNotifier_FileParsingAndCreateEmailContent.java:200)
    at SP_LogFileNotifier_FileParsingAndCreateEmailContent$2.call(SP_LogFileNotifier_FileParsingAndCreateEmailContent.java:1)
    at org.apache.spark.api.java.JavaPairRDD$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1027)
    at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1109)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13.apply(PairRDDFunctions.scala:1116)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13.apply(PairRDDFunctions.scala:1095)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:258)
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:203)
    ... 22 more
It seems the connection string is not correct. I am calling this connection string from inside a Spark program.
Connection con_con1 = DriverManager.getConnection(
        "jdbc:hive2://**********.inedc.corpintra.net:10001/default;principal=hive/**********.inedc.corpintra.net@*****.NET;transportMode=http;httpPath=cliservice");
Please help me with the correct connection string.
Created 09-15-2016 12:18 PM
Why connect to Hive via JDBC? Use Spark SQL with HiveContext and get full access to all the Hive tables; this is optimized in Spark and very fast (see the sketch after the link below).
https://spark.apache.org/docs/1.6.0/sql-programming-guide.html
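A minimal Java sketch of that approach against the Spark 1.6 API (the table name default.my_table is a placeholder):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class HiveContextExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("HiveContextExample");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // HiveContext talks to the Hive metastore directly; no JDBC needed.
        HiveContext hiveContext = new HiveContext(sc.sc());

        // "default.my_table" is a placeholder table name.
        DataFrame result = hiveContext.sql("SELECT * FROM default.my_table LIMIT 10");
        result.show();

        sc.stop();
    }
}

The resulting DataFrame can then be transformed further with Spark SQL or the DataFrame API, without any JDBC round trips.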
Created 09-15-2016 01:49 PM
Yes, I agree, but I am trying to run a Hive query inside the map method. When I use HiveContext there, I get an error that the class is not serializable.
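That error is the usual symptom of the map closure capturing driver-side objects (the context itself, or the enclosing class that holds it), which cannot be serialized and shipped to executors. If the lookup really has to happen on the executors, one common workaround is to open the JDBC connection inside mapPartitions, so one connection is created per partition on the executor instead of being serialized from the driver. A rough Java sketch of that pattern under the Spark 1.x API (the JDBC URL and query are placeholders, not values from this thread):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

public class PerPartitionLookup {
    public static JavaRDD<String> lookup(JavaRDD<String> keys) {
        return keys.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
            @Override
            public Iterable<String> call(Iterator<String> partition) throws Exception {
                // The connection is created here, on the executor, so nothing
                // non-serializable crosses the driver/executor boundary.
                Class.forName("org.apache.hive.jdbc.HiveDriver");
                List<String> results = new ArrayList<String>();
                Connection con = DriverManager.getConnection(
                        "jdbc:hive2://host:10001/default"); // placeholder URL
                try {
                    while (partition.hasNext()) {
                        String key = partition.next();
                        // Run the per-key query here and collect its result;
                        // omitted, since it depends on your actual query.
                        results.add(key);
                    }
                } finally {
                    con.close();
                }
                return results;
            }
        });
    }
}

Often the cleaner fix is to avoid per-record queries altogether: read the Hive table once on the driver with HiveContext and join it to the RDD.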