Spark with HWC job stuck after caching dataframe


I'm having the following problem on HDP 3.1. I have a database in the Hive warehouse that I want to access from Spark. I use the Hive Warehouse Connector (HWC) and can query the data. However, any action I perform after caching the DataFrame makes the Spark job hang (no progress at all). If I remove the cache() call, it executes fine. Consider the following code, run from spark-shell:

import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

val dfc = hive.executeQuery("select * from mydb.mytable limit 20")
dfc.cache()

// Any action after cache() hangs, e.g.:
dfc.count()
As I said, if I remove the dfc.cache() line, it executes fine. I have tried different queries; the one above is just a simple test that limits the result set to 20 records. Does anybody know why this is happening?


New Contributor

Any updates on this issue? I am facing the same problem.

New Contributor

Did you get a solution/workaround for this issue? I'm facing the same issue as well.

Expert Contributor

Yes, I am also facing the same issue. Any update on it, or a workaround?

New Contributor

You can use checkpoint instead of cache. 
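A minimal sketch of the checkpoint workaround suggested above, assuming Spark 2.1 or later (where `Dataset.checkpoint()` is available). Unlike `cache()`, `checkpoint()` needs a checkpoint directory set on the SparkContext first, and by default it is eager: it runs a job immediately, writes the rows out, and returns a new DataFrame whose lineage starts from the written data. The helper name `materialize` and the path `/tmp/spark-checkpoints` are hypothetical, and a plain local DataFrame stands in for the HWC query so the sketch runs without a Hive cluster. Whether this actually avoids the hang with HWC is not confirmed in this thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CheckpointInsteadOfCache {
  // Hypothetical helper: materializes `df` with a reliable checkpoint
  // instead of cache(). checkpoint() is eager by default, so it runs a job
  // now and returns a DataFrame backed by the checkpointed files.
  def materialize(spark: SparkSession, df: DataFrame, checkpointDir: String): DataFrame = {
    // checkpoint() throws unless a checkpoint directory has been set.
    // On a cluster this should be an HDFS (shared) path, not a local one.
    spark.sparkContext.setCheckpointDir(checkpointDir)
    df.checkpoint()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("checkpoint-instead-of-cache")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for hive.executeQuery("select * from mydb.mytable limit 20"),
    // so the sketch runs without HWC.
    val dfc = (1 to 20).toDF("id")

    // Hypothetical checkpoint path.
    val checkpointed = materialize(spark, dfc, "/tmp/spark-checkpoints")
    println(checkpointed.count())

    spark.stop()
  }
}
```

In spark-shell the same idea reduces to setting the directory once with `spark.sparkContext.setCheckpointDir(...)` and then calling `val dfc2 = dfc.checkpoint()` in place of `dfc.cache()`, running later actions on `dfc2`.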

Expert Contributor

Let me try with checkpoint.

Thanks for your reply. @graghu 

New Contributor

Hello, it's been two years, but is there any update on this issue?
