
insert data to hive table using spark challenge

New Contributor

Hi All,

I am new to Spark and am facing an issue while loading data into a Hive table. I have gone through different posts, but nothing has worked.

For each RDD I read, I register two temp tables and join them in a query, and I want the result of that query written to a Hive table (its structure matches the query's output).

Below is the snippet

val sqlquery = sqlContext.sql(
  "select a.cdr_type, a.CGI, a.cdr_time, a.mins_int, b.Lat, b.Long, b.SiteID " +
  "from hive_msc a left join my_cgi_list b on a.CGI = b.CGI")
sqlquery.show()
sqlquery.write.mode("append").saveAsTable("omeralvi.msc_test")  // even tried insertInto()

The code runs fine up to sqlquery.show() and displays results. I can even see files being created under /apps/hive/warehouse/omeralvi.db/msc_test/ on HDFS, but when I query the table in Hive, it is empty.

After a few seconds, my Spark script throws an error and quits. Below is the error:

17/04/27 11:42:28 ERROR JobScheduler: Error running job streaming job 1493282520000 ms.1
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:442)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$loadTable$1.apply$mcV$sp(ClientWrapper.scala:557)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$loadTable$1.apply(ClientWrapper.scala:557)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$loadTable$1.apply(ClientWrapper.scala:557)
        at org.apache.spark.sql.hive.client.ClientWrapper$anonfun$withHiveState$1.apply(ClientWrapper.scala:290)
        at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:237)
        at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:236)
        at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:279)
        at org.apache.spark.sql.hive.client.ClientWrapper.loadTable(ClientWrapper.scala:556)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:256)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
        at org.apache.spark.sql.execution.SparkPlan$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:189)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:239)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
        at MSCCDRFilter$anonfun$main$1.apply(MSCCDRFilter.scala:84)
        at MSCCDRFilter$anonfun$main$1.apply(MSCCDRFilter.scala:68)
        at org.apache.spark.streaming.dstream.DStream$anonfun$foreachRDD$1$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
        at org.apache.spark.streaming.dstream.DStream$anonfun$foreachRDD$1$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1.apply(ForEachDStream.scala:49)
        at org.apache.spark.streaming.dstream.ForEachDStream$anonfun$1.apply(ForEachDStream.scala:49)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:227)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$anonfun$run$1.apply(JobScheduler.scala:227)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$anonfun$run$1.apply(JobScheduler.scala:227)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:226)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchFieldError: HIVE_HADOOP_SUPPORTS_SUBDIRECTORIES
        at org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2465)
        at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2706)
        at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
        ... 46 more
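
For completeness, here is a trimmed-down sketch of the surrounding streaming loop. It is only meant to show the structure; the stream source path, the CSV parsing, the Cdr case class, and the omeralvi.cgi_list lookup table are placeholders, not my actual job:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// placeholder row type standing in for the real CDR schema
case class Cdr(cdr_type: String, CGI: String, cdr_time: String, mins_int: Double)

object MSCCDRFilter {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("MSCCDRFilter"))
    val ssc = new StreamingContext(sc, Seconds(60))
    val sqlContext = new HiveContext(sc)  // HiveContext, so Hive tables are visible
    import sqlContext.implicits._

    ssc.textFileStream("/data/msc/incoming").foreachRDD { rdd =>  // placeholder source
      // first temp table: the incoming CDR batch (parsing is a placeholder)
      rdd.map(_.split(","))
         .map(f => Cdr(f(0), f(1), f(2), f(3).toDouble))
         .toDF()
         .registerTempTable("hive_msc")
      // second temp table: the CGI lookup (placeholder source table)
      sqlContext.table("omeralvi.cgi_list").registerTempTable("my_cgi_list")

      val sqlquery = sqlContext.sql(
        "select a.cdr_type, a.CGI, a.cdr_time, a.mins_int, b.Lat, b.Long, b.SiteID " +
        "from hive_msc a left join my_cgi_list b on a.CGI = b.CGI")
      sqlquery.show()
      sqlquery.write.mode("append").saveAsTable("omeralvi.msc_test")  // the failing step
    }

    ssc.start()
    ssc.awaitTermination()
  }
}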

Thanks in advance for your help.

Omer

3 REPLIES

Re: insert data to hive table using spark challenge

New Contributor

@omer alvi I am stuck with the same issue. Did you find a solution to this?

Re: insert data to hive table using spark challenge

New Contributor

Did you find a solution for this?

Please let me know, as I have the same issue.

Re: insert data to hive table using spark challenge

New Contributor

Yes, your Hive metastore and the Hive metastore Spark is using are different. In spark-defaults.conf you have to point Spark at the correct Hive metastore.
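
For example, something along these lines in spark-defaults.conf (a hypothetical sketch; the version, jar path, and host are placeholders for your cluster's actual values):

# placeholders: match these to your cluster's Hive version, jars, and metastore host
spark.sql.hive.metastore.version  1.2.1
spark.sql.hive.metastore.jars     /usr/hdp/current/hive-client/lib/*
spark.hadoop.hive.metastore.uris  thrift://your-metastore-host:9083

Alternatively, putting the cluster's hive-site.xml in Spark's conf directory, so the job picks up the right metastore URI, achieves the same thing.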