insert data to hive table using spark challenge

Explorer

Hi All,

I am new to Spark and am facing this issue while loading data into a Hive table. I have gone through different posts, but nothing is working.

For each RDD I read, I register two temp tables, join them in a query, and want to write the result into a Hive table (its structure is the same as the result of the query).

Below is the snippet:

val sqlquery = sqlContext.sql("select a.cdr_type, a.CGI, a.cdr_time, a.mins_int, b.Lat, b.Long, b.SiteID " +
  "from hive_msc a left join my_cgi_list b on a.CGI = b.CGI")
sqlquery.show()
sqlquery.write.mode("append").saveAsTable("omeralvi.msc_test")  // I have even tried insertInto()
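
For reference, the surrounding streaming loop looks roughly like this (a sketch only, assuming Spark 1.6-style APIs; cdrStream, parseCdr, and cgiDF are hypothetical stand-ins for my actual code):

// created once in main; a HiveContext is needed to write to Hive tables
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._

cdrStream.foreachRDD { rdd =>
  // register the two temp tables joined in the query
  rdd.map(parseCdr).toDF().registerTempTable("hive_msc")
  cgiDF.registerTempTable("my_cgi_list")
  // ... the query and saveAsTable shown above run here ...
}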

The code runs fine up to sqlquery.show() and displays results. I can even see that files are created in the /apps/hive/warehouse/omeralvi.db/msc_test/ directory on HDFS, but when I query the table in Hive, it is empty.

After a few seconds my Spark script throws an error and quits. Below is the error:

17/04/27 11:42:28 ERROR JobScheduler: Error running job streaming job 1493282520000 ms.1
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:442)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$loadTable$1.apply$mcV$sp(ClientWrapper.scala:557)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$loadTable$1.apply(ClientWrapper.scala:557)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$loadTable$1.apply(ClientWrapper.scala:557)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$withHiveState$1.apply(ClientWrapper.scala:290)
        at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:237)
        at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:236)
        at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:279)
        at org.apache.spark.sql.hive.client.ClientWrapper.loadTable(ClientWrapper.scala:556)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:256)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
        at org.apache.spark.sql.execution.SparkPlan$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:189)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:239)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
        at MSCCDRFilter$anonfun$main$1.apply(MSCCDRFilter.scala:84)
        at MSCCDRFilter$anonfun$main$1.apply(MSCCDRFilter.scala:68)
        at org.apache.spark.streaming.dstream.DStream$anonfun$foreachRDD$1$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
        at org.apache.spark.streaming.dstream.DStream$anonfun$foreachRDD$1$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1.apply(ForEachDStream.scala:49)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1.apply(ForEachDStream.scala:49)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:227)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$anonfun$run$1.apply(JobScheduler.scala:227)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$anonfun$run$1.apply(JobScheduler.scala:227)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:226)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchFieldError: HIVE_HADOOP_SUPPORTS_SUBDIRECTORIES
        at org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2465)
        at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2706)
        at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
        ... 46 more

Thanks in advance for your help.

Omer

3 REPLIES

Re: insert data to hive table using spark challenge

New Contributor

@omer alvi I am stuck with the same issue. Did you find a solution to this?

Re: insert data to hive table using spark challenge

Did you guys find a solution for this?

Please let me know, as I have the same issue.

Re: insert data to hive table using spark challenge

New Contributor

Yes, your Hive metastore and the Hive metastore Spark is using are different. In spark-defaults.conf you have to point Spark to the correct Hive metastore.
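
For example, something along these lines in spark-defaults.conf (a sketch only; the version, jar path, and metastore host below are assumptions, so substitute the values that match your cluster):

# spark-defaults.conf -- point Spark at the cluster's Hive metastore
spark.sql.hive.metastore.version   1.2.1
spark.sql.hive.metastore.jars      /usr/hdp/current/hive-client/lib/*
spark.hadoop.hive.metastore.uris   thrift://your-metastore-host:9083

The NoSuchFieldError in your trace is typical of Spark loading Hive client jars from a different Hive version than the one your metastore runs, which is why aligning these settings matters.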