Failing to save dataframe to

Explorer

Hi,

I'm trying to write a DataFrame to a partitioned Hive table. This works fine from spark-shell, but when I use spark-submit I get the following exception:

 

Exception in thread "main" java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
        at java.lang.Class.getMethod(Class.java:1665)
        at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:114)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod$lzycompute(HiveShim.scala:404)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod(HiveShim.scala:403)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitions(HiveShim.scala:455)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:281)
        at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:228)
        at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:227)
        at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270)
        at org.apache.spark.sql.hive.client.ClientWrapper.loadDynamicPartitions(ClientWrapper.scala:561)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:225)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:189)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:239)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
        at com.pelephone.TrueCallLoader$.main(TrueCallLoader.scala:175)
        at com.pelephone.TrueCallLoader.main(TrueCallLoader.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

 

Can you help me find the problem?

Nimrod

1 ACCEPTED SOLUTION

Explorer
I replaced the saveAsTable call with hiveContext.sql and it worked.

Thanks!
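
For readers wondering what such a change can look like, here is a minimal sketch, not the poster's actual code. The thread does not show the statement that was run, so everything here is an assumption: the table names calls and calls_staging, the columns, and the helper insertStatement are hypothetical. The sketch uses a static partition spec because the stack trace above fails inside loadDynamicPartitions, and as far as I understand a static spec does not go through that call. The helper only builds the statement; the commented lines show where it would stand in for saveAsTable with the Spark 1.x API.

```scala
object InsertViaSql {
  // Builds an INSERT with a static partition spec, e.g. PARTITION (call_date='2016-11-01').
  // Partition values are emitted as string literals; quotes inside a value are not escaped.
  def insertStatement(target: String,
                      source: String,
                      dataCols: Seq[String],
                      partition: Seq[(String, String)]): String = {
    val spec = partition.map { case (col, value) => s"$col='$value'" }.mkString(", ")
    s"INSERT INTO TABLE $target PARTITION ($spec) SELECT ${dataCols.mkString(", ")} FROM $source"
  }

  def main(args: Array[String]): Unit = {
    // All names below are hypothetical placeholders.
    val stmt = insertStatement(
      target = "calls",
      source = "calls_staging",
      dataCols = Seq("caller", "duration"),
      partition = Seq("call_date" -> "2016-11-01"))
    println(stmt)

    // Inside the Spark job (Spark 1.x), instead of df.write...saveAsTable("calls"):
    //   df.registerTempTable("calls_staging")
    //   hiveContext.sql(stmt)
  }
}
```

Keeping the statement builder separate from the Spark calls makes it easy to log or unit-test the generated SQL before running it against the cluster.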


7 REPLIES

Champion
On the surface, this looks like a classpath issue, which would explain why the shell behaves differently from a job submitted to the cluster.

In which mode did you launch the job?

Are you using the SQLContext or HiveContext?

Did you set these settings in the HiveContext, if you are using one?

SET hive.exec.dynamic.partition=true;
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.dynamic.partition.mode=nonstrict;
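
For anyone setting these from application code rather than a Hive session, a minimal sketch follows. It assumes Spark 1.x with the spark-hive module on the classpath and a hive-site.xml visible to the driver; the object name and application name are placeholders, and the master is expected to come from spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextSetup {
  def main(args: Array[String]): Unit = {
    // Placeholder app name; the master is supplied by spark-submit (e.g. --master yarn-client).
    val conf = new SparkConf().setAppName("hive-context-setup")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Same three settings as the SET statements above.
    // Note that Hive spells the mode value "nonstrict", without a hyphen.
    hiveContext.setConf("hive.exec.dynamic.partition", "true")
    hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
    hiveContext.setConf("hive.exec.max.dynamic.partitions", "2048")

    // Read one value back to confirm it was applied to this context.
    println(hiveContext.getConf("hive.exec.dynamic.partition.mode"))

    sc.stop()
  }
}
```

The same settings can also be issued as hiveContext.sql("SET ...") statements; setConf is used here only because it keeps keys and values separate.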

Explorer

It's running in YARN client mode, and I'm using a HiveContext with all of those parameters set.

Nimrod

Champion
I'll give credit where it is due: I found this over on Stack Overflow. It is handy, and I could have used it in the past.

SPARK_PRINT_LAUNCH_COMMAND=true spark-shell

SPARK_PRINT_LAUNCH_COMMAND=true spark-submit ...

This prints the full launch command to the console, including the classpath. Search the classpath for hive-exec*.jar; that jar contains the method for loading dynamic partitions.

http://stackoverflow.com/questions/30512598/spark-is-there-a-way-to-print-out-classpath-of-both-spar...
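
Once the jar is located, it can also help to ask the JVM directly which overloads of loadDynamicPartitions it can see, and compare them with the signature in the NoSuchMethodException. The following is a rough sketch using plain Java reflection; the object and helper names are made up for illustration. One caveat, stated as an assumption: Spark can load Hive client classes through a separate classloader (see spark.sql.hive.metastore.jars), so what this prints from the driver's context classloader may not be exactly what the shim sees.

```scala
import scala.util.Try

object HiveMethodCheck {
  // Loads the class without initializing it; None if it is not on the classpath.
  private def load(className: String): Option[Class[_]] =
    Try(Class.forName(className, false, Thread.currentThread().getContextClassLoader)).toOption

  // One entry per public overload of methodName, rendered as name(paramType, ...).
  // Returns an empty Seq when the class cannot be found.
  def overloads(className: String, methodName: String): Seq[String] =
    load(className).toSeq.flatMap { cls =>
      cls.getMethods.toSeq
        .filter(_.getName == methodName)
        .map(m => m.getParameterTypes.map(_.getName).mkString(methodName + "(", ", ", ")"))
        .sorted
    }

  // Jar or directory the class was loaded from, when the JVM reports one.
  def loadedFrom(className: String): Option[String] =
    load(className)
      .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
      .map(_.getLocation.toString)

  def main(args: Array[String]): Unit = {
    val hive = "org.apache.hadoop.hive.ql.metadata.Hive"
    println("loaded from: " + loadedFrom(hive).getOrElse("<not found or unknown>"))
    overloads(hive, "loadDynamicPartitions").foreach(println)
  }
}
```

Running this once from spark-shell and once inside the submitted application gives two lists that can be compared line by line with the parameter list in the exception message.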

Explorer

Hi,

I did what you suggested, but it seems that both are using the same jar:

/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/jars/hive-exec-1.1.0-cdh5.8.2.jar

I could not find any difference in the classpath at all.

Nimrod

Explorer
I replaced the saveAsTable call with hiveContext.sql and it worked.

Thanks!

New Contributor
I don't think that's a fix for the issue.

Contributor

I am having the same problem.

@mbigelow, could you kindly provide some guidance on how to initialize a HiveContext properly in an IDE like IntelliJ or Eclipse?