Spark with HWC job stuck after caching dataframe

New Contributor

I'm having the following problem on HDP 3.1. I have a database in the Hive warehouse that I want to access from Spark. I use the Hive Warehouse Connector (HWC) and I am able to query the data. However, any action that I perform after caching the DataFrame makes the Spark job hang, with no progress at all. If I remove the cache() call, it executes fine. Consider the following code, run from the spark-shell:

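import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the SparkSession that spark-shell provides as `spark`
val hive = HiveWarehouseSession.session(spark).build()

// Read a small result set from Hive through HWC
val dfc = hive.executeQuery("select * from mydb.mytable limit 20")

// Mark the DataFrame for caching; without this line the job runs fine
dfc.cache()

// First action after cache(): this is where the job gets stuck
dfc.show(10)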


As I said, if I remove the dfc.cache() line, it executes fine. I have tried different queries; the one above is a simple test that limits the result set to 20 records. Does anybody know why this is happening?

Re: Spark with HWC job stuck after caching dataframe

New Contributor

Are there any updates on this? I am facing the same issue.