I'm having the following problem on HDP 3.1: I have a database in the Hive warehouse that I want to access from Spark. I use the HWC connector and I am able to query the data. However, any action that I perform after caching the data frame makes the Spark job get stuck (No progress at all). However, if I remove the cache() call, then it executes fine. Assume the following code executed from the spark-shell:
val hive = HiveWarehouseSession.session(spark).build()
val dfc = hive.executeQuery("select * from mydb.mytable limit 20");
As I said, if I remove the dfc.cache() line, then it executes fine. I have tried with different queries and the above was a simple test with limiting the result set to 20 records. Does anybody know why is this happening?
Any updates on the issue, i am facing the same issue.