Created 04-27-2017 09:10 AM
Hi All,
I am new to Spark and am facing this issue while loading data into a Hive table. I have gone through different posts, but it is not working.
For each RDD I am reading, I register two temp tables, join them in a query, and want the result of that query in a Hive table (whose structure is the same as the one I use in the query).
Below is the snippet:
val sqlquery = sqlContext.sql("select a.cdr_type, a.CGI, a.cdr_time, a.mins_int, b.Lat, b.Long, b.SiteID from hive_msc a left join my_cgi_list b" + " on a.CGI=b.CGI")
sqlquery.show()
sqlquery.write.mode("append").saveAsTable("omeralvi.msc_test") // even tried insertInto()
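For context, here is roughly how that snippet sits inside the streaming loop. This is an illustrative sketch using the Spark 1.x API; the DStream name cdrStream and the toDF() conversion are my assumptions, not the original code:

import sqlContext.implicits._

cdrStream.foreachRDD { rdd =>
  // rdd is assumed to be an RDD of case classes so toDF() can infer the schema
  rdd.toDF().registerTempTable("hive_msc")
  // my_cgi_list is registered similarly from a second source; its setup is omitted here
  val sqlquery = sqlContext.sql(
    "select a.cdr_type, a.CGI, a.cdr_time, a.mins_int, b.Lat, b.Long, b.SiteID " +
    "from hive_msc a left join my_cgi_list b on a.CGI = b.CGI")
  sqlquery.show()
  sqlquery.write.mode("append").saveAsTable("omeralvi.msc_test")
}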
The code runs fine up to sqlquery.show() and displays results. I can even see that files are created in the /apps/hive/warehouse/omeralvi.db/msc_test/ directory on HDFS, but when I query the table in Hive, it is empty.
After a few seconds, my Spark script throws an error and quits. Below is the error:
17/04/27 11:42:28 ERROR JobScheduler: Error running job streaming job 1493282520000 ms.1
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:442)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply$mcV$sp(ClientWrapper.scala:557)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:557)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:557)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:290)
    at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:237)
    at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:236)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:279)
    at org.apache.spark.sql.hive.client.ClientWrapper.loadTable(ClientWrapper.scala:556)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:256)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:189)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:239)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
    at MSCCDRFilter$$anonfun$main$1.apply(MSCCDRFilter.scala:84)
    at MSCCDRFilter$$anonfun$main$1.apply(MSCCDRFilter.scala:68)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:227)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:227)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:226)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchFieldError: HIVE_HADOOP_SUPPORTS_SUBDIRECTORIES
    at org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2465)
    at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2706)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
    ... 46 more
Thanks in advance for your help.
Omer
Created 10-18-2018 09:08 AM
@omer alvi I am stuck with the same issue. Did you find a solution to this?
Created 12-06-2018 05:23 AM
Did you guys find any solution for this?
Please let me know, as I have the same issue.
Created 12-06-2018 03:00 PM
Yes, your Hive metastore and Spark's Hive metastore are different. In spark-defaults.conf you have to point Spark to the correct Hive metastore.
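For example, a minimal sketch of the relevant spark-defaults.conf entries; the version and jar path below are placeholders and must match the Hive installation your cluster actually runs:

# Placeholder values -- set these to your cluster's actual Hive version and jar location
spark.sql.hive.metastore.version   1.2.1
spark.sql.hive.metastore.jars      /usr/hdp/current/hive-client/lib/*

Also make sure the cluster's hive-site.xml (with the correct hive.metastore.uris) is on Spark's conf path, e.g. in $SPARK_HOME/conf, so Spark and Hive resolve the same metastore.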