Support Questions

Failing to save DataFrame to Hive partitioned table

Explorer

Hi,

 

I'm trying to write a DataFrame to a Hive partitioned table. This works fine from spark-shell; however, when I run the same code with spark-submit I get the following exception:

 

Exception in thread "main" java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
        at java.lang.Class.getMethod(Class.java:1665)
        at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:114)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod$lzycompute(HiveShim.scala:404)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod(HiveShim.scala:403)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitions(HiveShim.scala:455)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:281)
        at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:228)
        at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:227)
        at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270)
        at org.apache.spark.sql.hive.client.ClientWrapper.loadDynamicPartitions(ClientWrapper.scala:561)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:225)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:189)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:239)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
        at com.pelephone.TrueCallLoader$.main(TrueCallLoader.scala:175)
        at com.pelephone.TrueCallLoader.main(TrueCallLoader.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
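
The last Spark frame before the user code is DataFrameWriter.saveAsTable, so the write being attempted is roughly of the following shape. A minimal sketch, assuming Spark 1.x with a HiveContext; `sc` is an existing SparkContext, and the table and column names are hypothetical:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

// Hypothetical reproduction: append a DataFrame into a Hive table
// partitioned by `day`, which routes through Hive.loadDynamicPartitions.
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

val df = Seq(("a", 1, "2016-01-01"), ("b", 2, "2016-01-02"))
  .toDF("name", "value", "day")

df.write
  .mode(SaveMode.Append)
  .partitionBy("day")
  .saveAsTable("mydb.example_table") // hypothetical table name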

 

Can you help me find the problem?

 

Nimrod

1 ACCEPTED SOLUTION

Explorer
I replaced the saveAsTable call with hiveContext.sql and it worked.

Thanks!
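
A sketch of what that substitution can look like, assuming Spark 1.x, the dynamic-partition settings from the replies below already applied, and the hypothetical `mydb.example_table` pre-created in Hive with `day` as its partition column:

// Instead of df.write.partitionBy("day").saveAsTable("mydb.example_table"),
// stage the DataFrame as a temp table and insert through HiveQL.
df.registerTempTable("staging")

// The partition column (`day`) must come last in the SELECT list.
hiveContext.sql(
  """INSERT INTO TABLE mydb.example_table PARTITION (day)
    |SELECT name, value, day FROM staging""".stripMargin)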


7 REPLIES

Champion
On the surface this looks like a classpath issue, which would explain the difference between the shell and running on the cluster.

In which mode did you launch the job?

Are you using the SQLContext or HiveContext?

If you are using a HiveContext, did you set these settings?

SET hive.exec.dynamic.partition=true;
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.dynamic.partition.mode=nonstrict;
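
If the job goes through a HiveContext, these can be issued the same way as any query; a minimal sketch:

// Apply the dynamic-partition settings once, before the partitioned write.
hiveContext.sql("SET hive.exec.dynamic.partition=true")
hiveContext.sql("SET hive.exec.max.dynamic.partitions=2048")
hiveContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")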

Explorer

It's yarn-client mode, and I'm using a HiveContext with all of those parameters set.

 

Nimrod

Champion
Credit where it's due: I found this over on Stack Overflow. It's handy, and I could have used it in the past.

SPARK_PRINT_LAUNCH_COMMAND=true spark-shell

SPARK_PRINT_LAUNCH_COMMAND=true spark-submit ...

This prints the full launch command to stdout, including the classpath. Search the classpath for hive-exec*.jar; that jar contains the loadDynamicPartitions method.

http://stackoverflow.com/questions/30512598/spark-is-there-a-way-to-print-out-classpath-of-both-spar...
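
Another option is to dump the classpath from inside the job itself and diff the two runs. A sketch, assuming the context classloader is a URLClassLoader (true for the Java 7 runtime shown in the trace):

import java.net.URLClassLoader

// Print every classpath entry the driver actually sees; run this under
// both spark-shell and spark-submit and diff the output.
Thread.currentThread().getContextClassLoader match {
  case ucl: URLClassLoader => ucl.getURLs.foreach(println)
  case other => println(s"Context classloader is not a URLClassLoader: $other")
}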

Explorer

Hi,

 

I did what you suggested, but it seems that both are using the same jar:

/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/jars/hive-exec-1.1.0-cdh5.8.2.jar

 

I could not find any difference in the classpath at all.

 

Nimrod

 

 


New Contributor
I don't think replacing saveAsTable with hiveContext.sql is actually a fix for the underlying issue.

Explorer

I am having the same problem.

@mbigelow can you kindly provide some guidance on how to initialize a HiveContext properly in an IDE like IntelliJ or Eclipse?
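
Not an authoritative answer, but a minimal standalone setup for a HiveContext runs roughly like this. It assumes Spark 1.x with spark-core and spark-hive on the build path; `local[*]` is only for running inside the IDE:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextFromIde {
  def main(args: Array[String]): Unit = {
    // local[*] runs the job in-process, without a cluster.
    val conf = new SparkConf().setAppName("HiveContextFromIde").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    hiveContext.sql("SHOW TABLES").show()
    sc.stop()
  }
}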
