Spark & Impala Exception in Yarn Cluster

Re: Spark & Impala Exception in Yarn Cluster

Explorer

Any news, or what is your current workaround, for connecting Spark with Impala?

Re: Spark & Impala Exception in Yarn Cluster

Explorer

Hi  S

Re: Spark & Impala Exception in Yarn Cluster

New Contributor

We are having the same problem.


Re: Spark & Impala Exception in Yarn Cluster

After investing some time in this, I am finally able to connect to a Kerberized Impala environment using Spark and the Impala JDBC driver. I have uploaded my code to GitHub; you can check it out and get hints from here:

https://github.com/morfious902002/impala-spark-jdbc-kerberos
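For reference, a common pattern for this kind of setup (not necessarily exactly what the repository above does) is to log in from a keytab explicitly and open the JDBC connection inside `doAs`. A minimal sketch, where the principal, keytab path, JDBC URL, and host names are all placeholders:

```scala
import java.security.PrivilegedAction
import java.sql.{Connection, DriverManager}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object ImpalaJdbcKerberosSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder values -- replace with your own principal, keytab, and URL.
    val principal = "user@EXAMPLE.COM"
    val keytab    = "/path/to/user.keytab"
    val jdbcUrl   = "jdbc:hive2://impala-host:21050/;principal=impala/impala-host@EXAMPLE.COM"

    // Tell the Hadoop security layer to use Kerberos.
    val conf = new Configuration()
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)

    // Log in from the keytab and open the JDBC connection as that user,
    // so the connection does not depend on an external ticket cache.
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
    val conn = ugi.doAs(new PrivilegedAction[Connection] {
      override def run(): Connection = {
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        DriverManager.getConnection(jdbcUrl)
      }
    })
    try {
      // ... run queries against conn ...
    } finally conn.close()
  }
}
```

This is a sketch under the assumption that the Hive JDBC driver is used against Impala's HiveServer2-compatible port (21050), as in the stack traces later in this thread; the Cloudera Impala JDBC driver uses a different driver class and URL format.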

Re: Spark & Impala Exception in Yarn Cluster

New Contributor

In Spark 1.6, I had the same issue and resolved it by creating the connection as in the code below:

 

import java.security.PrivilegedAction
import java.sql.{Connection, DriverManager, SQLException}
import org.apache.hadoop.security.UserGroupInformation

protected def getImpalaConnection(impalaJdbcDriver: String, impalaJdbcUrl: String): Connection = {
  if (impalaJdbcDriver.length() == 0) return null
  try {
    Class.forName(impalaJdbcDriver).newInstance
    // Open the connection as the currently logged-in (Kerberos) user.
    UserGroupInformation.getLoginUser.doAs(
      new PrivilegedAction[Connection] {
        override def run(): Connection = DriverManager.getConnection(impalaJdbcUrl)
      }
    )
  } catch {
    case se: SQLException =>
      se.printStackTrace()
      throw se
    case e: Exception =>
      e.printStackTrace()
      throw e
  }
}

 

But the same code failed in Spark 2; I am still trying to resolve the issue.
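One thing worth checking in Spark 2 on YARN is whether the application is submitted with a principal and keytab, so that the driver (which runs inside the ApplicationMaster in cluster mode) has valid Kerberos credentials when the JDBC connection is opened. `--principal` and `--keytab` are standard `spark-submit` options; the paths, principal, class, and jar names below are placeholders:

```shell
# Placeholder paths/principal -- adjust for your cluster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal user@EXAMPLE.COM \
  --keytab /path/to/user.keytab \
  --class com.example.ImpalaJob \
  my-job.jar
```

With these options Spark logs in from the keytab on the driver and renews credentials, which is a frequent difference between a job that authenticates fine locally and one that fails with GSS errors in cluster mode.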

Re: Spark & Impala Exception in Yarn Cluster

New Contributor

Can you provide a solution for this in Spark 2+? The following code works in Spark 1.6 but not in Spark 2.3.0:

 

Class.forName(impalaJdbcDriver).newInstance
UserGroupInformation.getLoginUser.doAs(
  new PrivilegedAction[Connection] {
    override def run(): Connection = DriverManager.getConnection(impalaJdbcUrl)
  }
)

We are getting the following exception:

 

User class threw exception: java.security.PrivilegedActionException: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://XXx:21050/;principal=impala/XXXX: GSS initiate failed
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:694)
Caused by: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://XXX:21050/;principal=impala/XXXX@XXX.com: GSS initiate failed
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:231)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:176)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
.....
Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed

Thanks
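"GSS initiate failed" generally means the process had no valid Kerberos ticket at the moment the transport was opened. On the node where the failure occurs (in cluster mode, the YARN node running the ApplicationMaster), it can help to verify the ticket cache first; the keytab path and principal below are placeholders:

```shell
# Check whether a valid TGT exists in the current ticket cache.
klist

# Obtain a ticket from a keytab if none is present (placeholder values),
# then confirm it is visible.
kinit -kt /path/to/user.keytab user@EXAMPLE.COM
klist
```

If `klist` shows no ticket inside the container while the same code works locally, that points at missing driver-side credentials rather than at the JDBC code itself.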