09-05-2017 06:22 AM
I am using Spark 1.6 on CDH 5.7.2. Sometimes, when I run a Spark job that calls saveAsTable and the DataFrame was created from a Hive query against that same table, the job just deletes the table. But it doesn't happen every time, which is weird.
I noticed a similar issue is referenced by Cloudera over here. The odd thing is that I was able to reproduce and fix it on the VM, but the fix did not work when we deployed to the server; maybe it has something to do with parallelism.
For example, something like this:
var dfLong = hiveContext.sql(s"select * from $tblName")
dfLong.write.mode("overwrite").parquet(tblLocation)
would sometimes wipe out the table instead of overwriting it. I tried persisting the DataFrame, but that did not work either.
If I manually create the table and insert into it each time using the hiveContext, will that work, and if so, why?
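For context on what I've tried: my understanding is that Spark evaluates the query lazily, so with mode("overwrite") the target location can be truncated before the source data is actually read, and since the source and target are the same table, the read then finds nothing. Persisting didn't help, presumably because persist() is also lazy until an action runs. The sketch below is the workaround I'm considering, writing to an intermediate path first; tmpLocation is just a hypothetical name, and I'm assuming $tblName and tblLocation as in the snippet above:

var dfLong = hiveContext.sql(s"select * from $tblName")

// Materialize the query result somewhere that is NOT the table's own
// location, so the lazy read never races the overwrite of its source.
val tmpLocation = tblLocation + "_tmp"
dfLong.write.mode("overwrite").parquet(tmpLocation)

// Re-read the materialized copy, then overwrite the real location.
hiveContext.read.parquet(tmpLocation)
  .write.mode("overwrite").parquet(tblLocation)

Is this two-step write the right approach, or would forcing materialization first (e.g. dfLong.persist() followed by an action like count() before the write) be enough?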