Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Zeppelin %hive SQL Charts are not working with TEMP tables when using HiveThriftServer2 and hiveContext

avatar

I am running into an issue on HDP Sandbox 2.4 , Spark 1.6 and Zeppelin notebook where the "temp" registered tables are not being found when trying to use %hive

I am loading table:

val hiveTablesDF = hiveContext.read....
hiveTablesDF.registerTempTable("DimStoreDF")
hiveTablesDF.show()

Output:

hiveTablesDF: org.apache.spark.sql.DataFrame = [storekey: int, geographykey: int, storemanager: int, storetype: string, storename: string, storedescription: string, status: string, opendate: timestamp, closedate: timestamp, entitykey: int, zipcode: string, zipcodeextension: string, storephone: string, storefax: string, closereason: string, employeecount: int, sellingareasize: double, lastremodeldate: timestamp, etlloadid: int, somedate1: timestamp, somedate2: timestamp, loaddate: timestamp, updatedate: timestamp]I can also see the table "dimstoredf" using hiveContext.tableNames
hiveContext.tableNames

res27: Array[String] = Array(dimproductdf, dimstoredf, ae2, dimcustomer, dimcustomertemp, dimproduct, dimproducttemp, factonlinesales, factonlinesalestemp, factsales, factsalestemp, health_table, mysql_federated_sample, sample_07, sample_08)

It is also available in beeline and is registered as Temporary:

0: jdbc:hive2://localhost:10002/default> [root@sandbox ~]# beeline -u "jdbc:hive2://localhost:10002/default" -n admin
WARNING: Use "yarn jar" to launch YARN applications.
Connecting to jdbc:hive2://localhost:10002/default
Connected to: Spark SQL (version 1.6.0)
Driver: Hive JDBC (version 1.2.1000.2.4.0.0-169)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.4.0.0-169 by Apache Hive
0: jdbc:hive2://localhost:10002/default> show tables;
+-------------------------+--------------+--+
|        tableName        | isTemporary  |
+-------------------------+--------------+--+
| dimproductdf            | true         |
| dimstoredf              | true         |
| ae2                     | false        |
| dimcustomer             | false        |
| dimcustomertemp         | false        |
| dimproduct              | false        |
| dimproducttemp          | false        |
| factonlinesales         | false        |
| factonlinesalestemp     | false        |
| factsales               | false        |
| factsalestemp           | false        |
| health_table            | false        |
| mysql_federated_sample  | false        |
| sample_07               | false        |
| sample_08               | false        |
+-------------------------+--------------+--+

However when I try to run:

%hive

select * from DimStoreDF

I get this error:

Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'DimStoreDF'

If I run the sqlstatement from hiveContext.sql all works well.

hiveContext.sql("select * from dimstoredf").show()

------------+---------+--------------------+--------------------+--------------------+--------------------+

|storekey|geographykey|storemanager|storetype|           storename|    storedescription|status|            opendate|           closedate|entitykey|zipcode|zipcodeextension|  storephone|    storefax|closereason|employeecount|sellingareasize|     lastremodeldate|etlloadid|           somedate1|           somedate2|            loaddate|          updatedate|

+--------+------------+------------+---------+--------------------+--------------------+------+--------------------+--------------------+---------+-------+----------------+------------+------------+-----------+-------------+---------------+--------------------+---------+--------------------+--------------------+--------------------+--------------------+

Appreciate your input, suggestions?

1 ACCEPTED SOLUTION

avatar
Guru

@azeltov

I took a closer look at this. I think the issue is that calling the registerTempTable method on a SQL Context registers it in the application's SQL context. The %hive interpreter in Zeppelin only sees the tables registered in the Hive meta store and not the temp tables registered in the Spark application's SQL context. Beeline shows both temp tables and permanent tables registered in the Hive meta store because the Spark thrift server aggregates table meta data from Hive and from the Spark application's SQL context. This is similar to how the Zeppelin %sql interpreter is only able to see temp tables registered on the default SQLContext created by the Zeppelin session when it starts Spark. As we saw, if you create a second SQL context in the same Zeppelin session, any temp tables registered on that second context are not visible to the Zeppelin %sql interpreter.

View solution in original post

4 REPLIES 4

avatar
Guru

@azeltov

I took a closer look at this. I think the issue is that calling the registerTempTable method on a SQL Context registers it in the application's SQL context. The %hive interpreter in Zeppelin only sees the tables registered in the Hive meta store and not the temp tables registered in the Spark application's SQL context. Beeline shows both temp tables and permanent tables registered in the Hive meta store because the Spark thrift server aggregates table meta data from Hive and from the Spark application's SQL context. This is similar to how the Zeppelin %sql interpreter is only able to see temp tables registered on the default SQLContext created by the Zeppelin session when it starts Spark. As we saw, if you create a second SQL context in the same Zeppelin session, any temp tables registered on that second context are not visible to the Zeppelin %sql interpreter.

avatar

@Vadim So are we saying it is not possible to use zeppelin charting capabilities when using hiveContext and Temp tables?

avatar
Guru

@azeltov

So I did a little more digging and all you have to do is point the %hive interpreter at the hostname and port where the thriftServer for the target SQLContext was started. So in the case of that you have above you would set the Zeppelin Hive interpreter to point at sandbox.hortonworks.com:10002. Now when you issue a query from the %hive interpreter it will connect to the Hive context you created at runtime, exposing your temp tables and the perm tables in Hive meta store. The other option is to have both a hive and a sql context and just register each temp table on both contexts. That way you can both expose them with thrift and access with $sql context. However, I think the first option is the one you are looking for.

avatar

@Vadim will give it a try tomorrow, sorry for delay in response