Support Questions

Failing to save DataFrame to Hive partitioned table

Explorer

Hi,

 

I'm trying to write a DataFrame to a Hive partitioned table. This works fine from spark-shell; however, when I run the same code with spark-submit I get the following exception:

 

Exception in thread "main" java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
        at java.lang.Class.getMethod(Class.java:1665)
        at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:114)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod$lzycompute(HiveShim.scala:404)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod(HiveShim.scala:403)
        at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitions(HiveShim.scala:455)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:281)
        at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:228)
        at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:227)
        at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270)
        at org.apache.spark.sql.hive.client.ClientWrapper.loadDynamicPartitions(ClientWrapper.scala:561)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:225)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:189)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:239)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
        at com.pelephone.TrueCallLoader$.main(TrueCallLoader.scala:175)
        at com.pelephone.TrueCallLoader.main(TrueCallLoader.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
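
The last Spark frame before the user code is DataFrameWriter.saveAsTable, so the write being attempted is roughly of the following shape. A minimal sketch, assuming Spark 1.x with a HiveContext; `sc` is an existing SparkContext, and the table and column names are hypothetical:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

// Hypothetical reproduction: append a DataFrame into a Hive table
// partitioned by `day`, which routes through Hive.loadDynamicPartitions.
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

val df = Seq(("a", 1, "2016-01-01"), ("b", 2, "2016-01-02"))
  .toDF("name", "value", "day")

df.write
  .mode(SaveMode.Append)
  .partitionBy("day")
  .saveAsTable("mydb.example_table") // hypothetical table name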

 

Can you help me find the problem?

 

Nimrod

1 ACCEPTED SOLUTION

Explorer
I replaced the saveAsTable call with hiveContext.sql and it worked.

Thanks!
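
A sketch of what that substitution can look like, assuming Spark 1.x, the dynamic-partition settings from the replies below already applied, and the hypothetical `mydb.example_table` pre-created in Hive with `day` as its partition column:

// Instead of df.write.partitionBy("day").saveAsTable("mydb.example_table"),
// stage the DataFrame as a temp table and insert through HiveQL.
df.registerTempTable("staging")

// The partition column (`day`) must come last in the SELECT list.
hiveContext.sql(
  """INSERT INTO TABLE mydb.example_table PARTITION (day)
    |SELECT name, value, day FROM staging""".stripMargin)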


7 REPLIES

Champion
On the surface this looks like a classpath issue, which would explain the difference between the shell and running on the cluster.

In which mode did you launch the job?

Are you using the SQLContext or HiveContext?

If you are using a HiveContext, did you set these settings?

SET hive.exec.dynamic.partition=true;
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.dynamic.partition.mode=nonstrict;
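
If the job goes through a HiveContext, these can be issued the same way as any query; a minimal sketch:

// Apply the dynamic-partition settings once, before the partitioned write.
hiveContext.sql("SET hive.exec.dynamic.partition=true")
hiveContext.sql("SET hive.exec.max.dynamic.partitions=2048")
hiveContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")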

Explorer

It's yarn-client mode, and I'm using a HiveContext with all of those parameters set.

 

Nimrod

Champion
Credit where it's due: I found this over on Stack Overflow. It's handy, and I could have used it in the past.

SPARK_PRINT_LAUNCH_COMMAND=true spark-shell

SPARK_PRINT_LAUNCH_COMMAND=true spark-submit ...

This prints the full launch command to stdout, including the classpath. Search the classpath for hive-exec*.jar; that jar contains the loadDynamicPartitions method.

http://stackoverflow.com/questions/30512598/spark-is-there-a-way-to-print-out-classpath-of-both-spar...
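
Another option is to dump the classpath from inside the job itself and diff the two runs. A sketch, assuming the context classloader is a URLClassLoader (true for the Java 7 runtime shown in the trace):

import java.net.URLClassLoader

// Print every classpath entry the driver actually sees; run this under
// both spark-shell and spark-submit and diff the output.
Thread.currentThread().getContextClassLoader match {
  case ucl: URLClassLoader => ucl.getURLs.foreach(println)
  case other => println(s"Context classloader is not a URLClassLoader: $other")
}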

Explorer

Hi,

 

I did what you suggested, but it seems that both are using the same jar:

/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/jars/hive-exec-1.1.0-cdh5.8.2.jar

 

I could not find any difference in the classpath at all.

 

Nimrod

 

 


New Contributor
I don't think replacing saveAsTable with hiveContext.sql is actually a fix for the underlying issue.

Explorer

I am having the same problem.

@mbigelow can you kindly provide some guidance on how to initialize a HiveContext properly in an IDE like IntelliJ or Eclipse?
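
Not an authoritative answer, but a minimal standalone setup for a HiveContext runs roughly like this. It assumes Spark 1.x with spark-core and spark-hive on the build path; `local[*]` is only for running inside the IDE:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextFromIde {
  def main(args: Array[String]): Unit = {
    // local[*] runs the job in-process, without a cluster.
    val conf = new SparkConf().setAppName("HiveContextFromIde").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    hiveContext.sql("SHOW TABLES").show()
    sc.stop()
  }
}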
