I'm having the following problem on HDP 3.1: I have a database in the Hive warehouse that I want to access from Spark. I use the HWC connector and I am able to query the data. However, any action that I perform after caching the data frame makes the Spark job hang with no progress at all. If I remove the cache() call, it executes fine. Assume the following code is executed from the spark-shell:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
val dfc = hive.executeQuery("select * from mydb.mytable limit 20")
dfc.cache()
dfc.show(10)
As I said, if I remove the dfc.cache() line, it executes fine. I have tried different queries; the one above is a simple test that limits the result set to 20 records. Does anybody know why this is happening?
Created 08-14-2019 11:59 AM
Are there any updates on this? I am facing the same issue.
Created 08-28-2019 04:00 AM
A Hive issue has been created in JIRA: https://issues.apache.org/jira/browse/HIVE-22153
Created 09-17-2019 02:48 PM
Did you get a solution/workaround for this issue? I'm facing the same issue as well.
Created 03-25-2020 11:18 PM
Yes, I am also facing the same issue. Is there any update or workaround?
Created 04-09-2020 10:34 AM
You can use checkpoint instead of cache.
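A minimal sketch of that workaround, reusing the spark-shell session and query from the question. The checkpoint directory below is a hypothetical path, so substitute any HDFS location your Spark user can write to. I have not verified this against HDP 3.1 specifically, so treat it as a starting point rather than a confirmed fix.

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Assumption: "/tmp/spark-checkpoints" is a placeholder; use a writable HDFS path.
// A checkpoint directory must be set before checkpoint() is called.
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

val hive = HiveWarehouseSession.session(spark).build()
val df = hive.executeQuery("select * from mydb.mytable limit 20")

// checkpoint() is eager by default: it materializes the result to the
// checkpoint directory and returns a NEW DataFrame that reads from there.
// Unlike cache(), the original df is not modified, so keep the return value.
val dfc = df.checkpoint()

dfc.show(10)
```

Later actions on dfc read from the checkpointed data instead of going back through the connector. The checkpoint files stay on disk after the job, so you may want to delete the directory once you are done.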
Created 05-01-2020 07:25 AM
Let me try with checkpoint.
Thanks for your reply, @graghu.
Created 11-11-2021 03:15 AM
Hello, it's been two years, but is there any update on the issue?