Spark sql with impala on kerberos returning only column names

New Contributor

 

Hi, I'm using the Impala JDBC driver to run queries from Spark and have run into the following problem. Any suggestions would be appreciated.

 

sparkVersion = 2.2.0

impalaJdbcVersion = 2.6.3

 

Before we moved to a Kerberized Hadoop cluster, running a join query through the driver and loading the result into Spark worked fine.

 

spark.read.format("jdbc").option("url", "jdbc:impala://{host}:21050;").option("driver", "com.cloudera.impala.jdbc41.Driver").option("dbtable", "(SELECT s.id as student_id, sc.id as class_id FROM student s JOIN student_to_class sc where sc.id = s.class_id) t").load().createOrReplaceTempView("studentClass")

spark.sql("select * from studentClass").show
+----------+--------+
|student_id|class_id|
+----------+--------+
|123       |111     |
|234       |111     |
|456       |111     |
+----------+--------+

 

After moving to the Kerberized Hadoop cluster, loading the join query in Spark returns only the column names in every row (the number of rows is still correct). Loading individual tables and running SQL on those tables in Spark still works correctly.

 

spark.read.format("jdbc").option("url", "jdbc:impala://{host}:21050;AuthMech=1;KrbHostFQDN={Krbhost};KrbRealm=TEST-DATA.COM;KrbServiceName=impala").option("driver", "com.cloudera.impala.jdbc41.Driver").option("dbtable", "(SELECT s.id as student_id, sc.id as class_id FROM student s JOIN student_to_class sc where sc.id = s.class_id) t").load().createOrReplaceTempView("studentClass")

spark.sql("select * from studentClass").show
+----------+--------+
|student_id|class_id|
+----------+--------+
|student_id|class_id|
|student_id|class_id|
|student_id|class_id|
+----------+--------+

All of these queries work and return correct data in impala-shell and Hue.

 

Since we can't know which tables will be needed before the Spark job runs, we need to be able to load a join query as a table.

 

I've tried switching to different versions of the Impala driver, but that didn't fix the problem.

 

 

3 REPLIES

Re: Spark sql with impala on kerberos returning only column names

New Contributor
I am also facing the same problem when I use an analytic function in SQL. It works fine with a plain JDBC ResultSet, but not in Spark.

Re: Spark sql with impala on kerberos returning only column names

New Contributor

You need to load the Simba driver in ImpalaJDBC41.jar, available here: https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-12.html

Re: Spark sql with impala on kerberos returning only column names

Guru
Running Impala queries through the JDBC driver from Spark is not currently supported by Cloudera. Why not just use Spark SQL instead? Why do you need the extra layer of Impala here?

Cheers
Eric
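
Note on the symptom in the question (correct row count, every value equal to its column name): this is what you would see if the SQL that Spark generates wraps column names in double quotes and Impala reads "student_id" as a string literal rather than as an identifier. Spark's default JDBC dialect does quote identifiers with double quotes, but nothing in this thread confirms that this is the cause here, and it would not by itself explain why the non-Kerberos cluster behaved differently. As a sketch only, assuming that is the problem, a custom dialect that quotes with backticks can be registered in spark-shell before the spark.read call. The name ImpalaDialect is hypothetical and is not part of Spark or the Impala driver.

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect (not shipped with Spark or the Impala driver).
// Assumption: the wrong output comes from double-quoted column names
// being treated as string literals by Impala.
object ImpalaDialect extends JdbcDialect {
  // Apply this dialect only to Impala JDBC URLs.
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:impala")

  // Quote identifiers with backticks instead of the default double quotes.
  override def quoteIdentifier(colName: String): String =
    s"`$colName`"
}

// Register before calling spark.read.format("jdbc")...load().
JdbcDialects.registerDialect(ImpalaDialect)
```

With the dialect registered, the same spark.read.format("jdbc") call from the question can be rerun unchanged. If the output is still only column names, the cause is something else.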