Spark with HWC job stuck after caching dataframe

Explorer

I'm having the following problem on HDP 3.1. I have a database in the Hive warehouse that I want to access from Spark. I use the HWC connector and can query the data, but any action I perform after caching the DataFrame makes the Spark job hang with no progress at all. If I remove the cache() call, it runs fine. Consider the following code, run from spark-shell:

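import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession
val hive = HiveWarehouseSession.session(spark).build()

// Read a small result set from Hive through HWC
val dfc = hive.executeQuery("select * from mydb.mytable limit 20")

// Mark the DataFrame for caching (lazy, nothing runs yet)
dfc.cache()

// First action on the cached DataFrame: this is where the job hangs
dfc.show(10)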

As I said, if I remove the dfc.cache() line, it runs fine. I have tried different queries; the one above is just a simple test that limits the result set to 20 records. Does anybody know why this is happening?

7 REPLIES

New Contributor

Any updates on this issue? I am facing the same problem.

New Contributor

New Contributor

Did you get a solution/workaround for this issue? I'm facing the same issue as well.

Expert Contributor

Yes, I am also facing the same issue. Any update on it, or a workaround?

New Contributor

You can use checkpoint instead of cache. 
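
A minimal sketch of what that could look like in spark-shell, reusing the query from the question. The checkpoint directory is a hypothetical path (any HDFS location your user can write to should do), and this has not been verified against every HWC version, so treat it as a starting point rather than a confirmed fix:

import com.hortonworks.hwc.HiveWarehouseSession

// Assumption: running in spark-shell, so `spark` already exists.
// Assumption: /tmp/spark-checkpoints is a hypothetical, writable HDFS path.
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

val hive = HiveWarehouseSession.session(spark).build()
val df = hive.executeQuery("select * from mydb.mytable limit 20")

// checkpoint() is eager by default: it runs the query once, writes the
// result to the checkpoint directory, and returns a NEW DataFrame backed
// by those files. Keep using the returned value, not the original df.
val dfc = df.checkpoint()

dfc.show(10)

Unlike cache(), checkpoint() cuts the lineage, so later actions read the saved files instead of going back through the HWC reader. The files are not cleaned up automatically by default, so you may want to delete the directory when the job is done.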

Expert Contributor

Let me try with checkpoint.

Thanks for your reply. @graghu 

New Contributor

Hello, it's been two years, but is there any update on this issue?