
insert data to hive table using spark challenge

Explorer

Hi All,

I am new to Spark and facing this issue while loading data into a Hive table. I have gone through different posts, but it's not working.

For each RDD I read, I register two temp tables, join them in a query, and want the result of that query written to a Hive table (whose structure is the same as the query output).

Below is the snippet:

val sqlquery = sqlContext.sql(
  "select a.cdr_type, a.CGI, a.cdr_time, a.mins_int, b.Lat, b.Long, b.SiteID " +
  "from hive_msc a left join my_cgi_list b on a.CGI = b.CGI")
sqlquery.show()
sqlquery.write.mode("append").saveAsTable("omeralvi.msc_test")  // even tried insertInto()
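For reference, this is roughly how the two temp tables get registered inside foreachRDD before the join runs (a minimal sketch only; the case classes, field parsing, and the cgiRdd lookup data below are placeholders, not my exact code):

// Minimal sketch with placeholder names; my real parsing/loading differs.
case class Cdr(cdr_type: String, CGI: String, cdr_time: String, mins_int: Int)
case class Cgi(CGI: String, Lat: Double, Long: Double, SiteID: String)

cdrStream.foreachRDD { rdd =>
  import sqlContext.implicits._

  // Streaming batch -> DataFrame -> temp table "hive_msc"
  rdd.map(_.split(","))
     .map(f => Cdr(f(0), f(1), f(2), f(3).toInt))
     .toDF()
     .registerTempTable("hive_msc")

  // CGI lookup data -> temp table "my_cgi_list" (loaded from HDFS in the real job)
  cgiRdd.map(f => Cgi(f(0), f(1).toDouble, f(2).toDouble, f(3)))
        .toDF()
        .registerTempTable("my_cgi_list")

  // ... then the join query and saveAsTable call shown above
}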

The code runs fine up to sqlquery.show() and displays results. I can even see that files are created in the /apps/hive/warehouse/omeralvi.db/msc_test/ directory on HDFS, but when I query the table in Hive, it is empty.

After a few seconds, my Spark script throws an error and quits. Below is the error:

17/04/27 11:42:28 ERROR JobScheduler: Error running job streaming job 1493282520000 ms.1
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:442)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$loadTable$1.apply$mcV$sp(ClientWrapper.scala:557)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$loadTable$1.apply(ClientWrapper.scala:557)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$loadTable$1.apply(ClientWrapper.scala:557)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$withHiveState$1.apply(ClientWrapper.scala:290)
        at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:237)
        at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:236)
        at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:279)
        at org.apache.spark.sql.hive.client.ClientWrapper.loadTable(ClientWrapper.scala:556)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:256)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
        at org.apache.spark.sql.execution.SparkPlan$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:189)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:239)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
        at MSCCDRFilter$anonfun$main$1.apply(MSCCDRFilter.scala:84)
        at MSCCDRFilter$anonfun$main$1.apply(MSCCDRFilter.scala:68)
        at org.apache.spark.streaming.dstream.DStream$anonfun$foreachRDD$1$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
        at org.apache.spark.streaming.dstream.DStream$anonfun$foreachRDD$1$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1.apply(ForEachDStream.scala:49)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1.apply(ForEachDStream.scala:49)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:227)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$anonfun$run$1.apply(JobScheduler.scala:227)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$anonfun$run$1.apply(JobScheduler.scala:227)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:226)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchFieldError: HIVE_HADOOP_SUPPORTS_SUBDIRECTORIES
        at org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2465)
        at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2706)
        at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
        ... 46 more

Thanks in advance for your help.

Omer

3 REPLIES

New Contributor

@omer alvi I am stuck with the same issue. Did you find a solution to this?

Did you guys find a solution for this?

Please let me know, as I have the same issue.

New Contributor

Yes, your Hive metastore and Spark's Hive metastore are different. In spark-defaults.conf you have to point to the correct Hive metastore.
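For example, something along these lines (the paths, version, and host name are illustrative placeholders; adjust them to your cluster, and make sure the cluster's hive-site.xml is on Spark's classpath, e.g. under /etc/spark/conf):

# spark-defaults.conf (illustrative values only)
spark.sql.hive.metastore.version   1.2.1
spark.sql.hive.metastore.jars      /usr/hdp/current/hive-client/lib/*

# and in hive-site.xml on Spark's classpath (e.g. /etc/spark/conf/hive-site.xml):
# <property>
#   <name>hive.metastore.uris</name>
#   <value>thrift://your-metastore-host:9083</value>
# </property>

With these in place, Spark's HiveContext talks to the same metastore that the Hive CLI uses, so tables written from Spark should show up when queried in Hive.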