Explorer
Posts: 8
Registered: ‎10-06-2016
Accepted Solution

Failing to save dataframe to

Hi,

 

I'm trying to write a DataFrame to a partitioned Hive table. This works fine from spark-shell; however, when I use spark-submit I get the following exception:

 

Exception in thread "main" java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
        at java.lang.Class.getMethod(Class.java:1665)
        at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:114)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod$lzycompute(HiveShim.scala:404)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod(HiveShim.scala:403)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitions(HiveShim.scala:455)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:281)
        at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:228)
        at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:227)
        at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270)
        at org.apache.spark.sql.hive.client.ClientWrapper.loadDynamicPartitions(ClientWrapper.scala:561)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:225)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:189)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:239)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
        at com.pelephone.TrueCallLoader$.main(TrueCallLoader.scala:175)
        at com.pelephone.TrueCallLoader.main(TrueCallLoader.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

 

Can you help me find the problem?

 

Nimrod

Posts: 642
Topics: 3
Kudos: 105
Solutions: 67
Registered: ‎08-16-2016

Re: Failing to save dataframe to

On the surface it seems to be a classpath issue, which would explain the difference between the shell and running on the cluster.

In which mode did you launch the job?

Are you using the SQLContext or HiveContext?

Did you set these settings in the HiveContext, if you are using one?

SET hive.exec.dynamic.partition=true;
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.dynamic.partition.mode=nonstrict;
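The same settings can be applied programmatically before writing; a minimal sketch, assuming the Spark 1.x API and an existing SparkContext named `sc` (the variable names are illustrative):

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Enable dynamic partitioning before inserting into a partitioned Hive table.
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
hiveContext.setConf("hive.exec.max.dynamic.partitions", "2048")
```

Equivalently, each `SET` statement above can be issued through `hiveContext.sql(...)`.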
Explorer
Posts: 8
Registered: ‎10-06-2016

Re: Failing to save dataframe to

It's YARN client mode, and I'm using a HiveContext with all of those parameters set.

 

Nimrod

Posts: 642
Topics: 3
Kudos: 105
Solutions: 67
Registered: ‎08-16-2016

Re: Failing to save dataframe to

I'll give credit where it is due. I found this over on SO. This is handy and I could have used it in the past.

SPARK_PRINT_LAUNCH_COMMAND=true spark-shell

SPARK_PRINT_LAUNCH_COMMAND=true spark-submit ...

This will print the full launch command to stdout, including the classpath. Search the classpath for the hive-exec*.jar; that jar contains the method for loading dynamic partitions.
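Since the printed command can be very long, the relevant entry is easier to find by splitting the colon-separated classpath into lines; a sketch using an illustrative classpath string rather than real output:

```shell
# Split a colon-separated classpath into one entry per line and keep the hive-exec jar.
# CP here is an illustrative example, not real spark-submit output.
CP="/opt/spark/lib/spark-assembly.jar:/opt/cloudera/parcels/CDH/jars/hive-exec-1.1.0-cdh5.8.2.jar"
echo "$CP" | tr ':' '\n' | grep 'hive-exec'
```

Comparing this output for spark-shell and spark-submit shows whether the two pick up different hive-exec versions.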

http://stackoverflow.com/questions/30512598/spark-is-there-a-way-to-print-out-classpath-of-both-spar...
Explorer
Posts: 8
Registered: ‎10-06-2016

Re: Failing to save dataframe to

Hi,

 

I did what you suggested but it seems that both are using the same jar:

/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/jars/hive-exec-1.1.0-cdh5.8.2.jar

 

I could not find any difference in the classpath at all.

 

Nimrod

 

 

Explorer
Posts: 8
Registered: ‎10-06-2016

Re: Failing to save dataframe to

I replaced the saveAsTable call with hiveContext.sql and it worked.

Thanks!
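For reference, that workaround amounts to dropping DataFrameWriter.saveAsTable in favour of an explicit INSERT issued through the HiveContext; a minimal sketch, where `df`, the table name `calls`, and the partition column `dt` are all made-up illustrations:

```scala
// Hypothetical names throughout; adapt to the real table schema.
df.registerTempTable("df_tmp")

// Instead of df.write.partitionBy("dt").saveAsTable("calls"):
hiveContext.sql(
  """INSERT OVERWRITE TABLE calls PARTITION (dt)
    |SELECT col_a, col_b, dt FROM df_tmp""".stripMargin)
```

Routing the write through a SQL INSERT sidesteps the saveAsTable code path that triggered the NoSuchMethodException above, though as noted below it works around the symptom rather than fixing the underlying version mismatch.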
New Contributor
Posts: 3
Registered: ‎07-31-2017

Re: Failing to save dataframe to

I don't think that's a fix for the issue.
Explorer
Posts: 11
Registered: ‎02-15-2017

Re: Failing to save dataframe to

I am having the same problem.

@mbigelow, can you kindly provide some guidance on how to initialize a HiveContext properly in an IDE like IntelliJ or Eclipse?
