Welcome to the Cloudera Community



Spark with HWC job stuck after caching dataframe


I'm having the following problem on HDP 3.1. I have a database in the Hive warehouse that I want to access from Spark. I use the Hive Warehouse Connector (HWC) and can query the data. However, any action I perform after caching the DataFrame makes the Spark job hang (no progress at all). If I remove the cache() call, the same action completes normally. Consider the following code, run from spark-shell:

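import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

val dfc = hive.executeQuery("select * from mydb.mytable limit 20")
dfc.cache()
dfc.show()  // hangs here: any action after cache() (show, count, collect, ...) makes no progress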
As I said, if I remove the dfc.cache() line, everything runs fine. I have tried different queries; the one above is a simple test that limits the result set to 20 records. Does anybody know why this is happening?
