Loading impala table into spark throws error
Labels: Apache Impala, Apache Spark
Created on ‎03-08-2019 03:54 AM - edited ‎09-16-2022 07:13 AM
Hello Team,
We have a CDH 5.15 cluster with Kerberos enabled.
We are trying to load an Impala table into Spark and performed the steps below, but showing the results throws a "GSS initiate failed" error.
Kindly suggest?
-bash-4.2$ spark2-shell --master yarn --deploy-mode client --driver-class-path ImpalaJDBC41.jar --jars ImpalaJDBC41.jar
scala> val jdbcURL = s"jdbc:impala://host1:21050/external;AuthMech=1;KrbRealm=XYZ;KrbHostFQDN=host1;KrbServiceName=impala"
scala> val connectionProperties = new java.util.Properties()
connectionProperties: java.util.Properties = {}
scala> val hbaseDF = spark.sqlContext.read.jdbc(jdbcURL, "external.Names_text", connectionProperties)
hbaseDF: org.apache.spark.sql.DataFrame = [employeeid: int, firstname: string ... 3 more fields]
scala> hbaseDF.show
[Stage 0:> (0 + 1) / 1]19/03/08 07:11:46 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, a301-8530-3309.ldn.swissbank.com, executor 1): java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500164) Error initialized or created transport for authentication: [Cloudera][ImpalaJDBCDriver](500169) Unable to connect to server: GSS initiate failed.
at com.cloudera.impala.hivecommon.api.HiveServer2ClientFactory.createTransport(Unknown Source)
at com.cloudera.impala.hivecommon.api.HiveServer2ClientFactory.createClient(Unknown Source)
at com.cloudera.impala.hivecommon.core.HiveJDBCCommonConnection.establishConnection(Unknown Source)
at com.cloudera.impala.impala.core.ImpalaJDBCConnection.establishConnection(Unknown Source)
at com.cloudera.impala.jdbc.core.LoginTimeoutConnection.connect(Unknown Source)
at com.cloudera.impala.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
at com.cloudera.impala.jdbc.common.AbstractDriver.connect(Unknown Source)
at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:271)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Caused by: com.cloudera.impala.support.exceptions.GeneralException: [Cloudera][ImpalaJDBCDriver](500164) Error initialized or created transport for authentication: [Cloudera][ImpalaJDBCDriver](500169) Unable to connect to server: GSS initiate failed.
... 27 more
Kindly help/suggest what I did wrong?
Created ‎03-13-2019 06:22 PM
Hello Vijay,
Please see [1]. This use case isn't supported.
However, the shared error suggests that the executor isn't able to connect to the Impala daemon due to authentication issues. This is because each executor runs in a separate JVM and must acquire a Kerberos TGT of its own.
To do this, you can use a JAAS configuration; see [2] and search for "To set up the JAAS login configuration file" (page 15). Once you have a tested JAAS login configuration and a keytab file, you can pass them to the executors as follows.
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf -Djavax.security.auth.useSubjectCredsOnly=false" \
--conf "spark.yarn.dist.files=<path_to_keytab>.keytab,<path_to_keytab>/jaas.conf"
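For reference, a minimal jaas.conf sketch using the JDK's Krb5LoginModule might look like the following; the entry name ("Client"), principal, and keytab filename are placeholders you would replace with the values your driver's documentation and your environment require:

```
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./user.keytab"
  principal="user@XYZ"
  doNotPrompt=true;
};
```

The keytab path can be relative here because spark.yarn.dist.files ships both the keytab and jaas.conf into each executor's working directory.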
Alternatively, if your Impala can authenticate using LDAP, you could also test using it.
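For the LDAP route, a sketch of what the connection might look like, assuming AuthMech=3 (the driver's username/password mechanism) and placeholder credentials:

```scala
// Sketch only: AuthMech=3 selects LDAP username/password authentication in
// the Cloudera Impala JDBC driver; the host, user, and password below are
// placeholders for your own values.
val jdbcURL = s"jdbc:impala://host1:21050/external;AuthMech=3"
val connectionProperties = new java.util.Properties()
connectionProperties.put("UID", "your_ldap_user")
connectionProperties.put("PWD", "your_ldap_password")
val df = spark.sqlContext.read.jdbc(jdbcURL, "external.Names_text", connectionProperties)
```

This avoids the Kerberos TGT problem on executors entirely, since each connection authenticates with the supplied credentials.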
Hope this helps!
Thanks,
Sudarshan
