
Spark - Drop partition command on hive external table fails


Hi, when we execute a drop partition command on a Hive external table from spark-shell, we get the error below. The same command works fine from the Hive shell.

Spark Version : 1.5.2
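For context, the statements below are run from spark-shell through a HiveContext. A minimal setup sketch, assuming Spark 1.5.x built with Hive support (hiveCtx is simply the alias used in this post):

// Sketch only (assumption): in spark-shell on Spark 1.5.x with Hive support, the pre-created
// sqlContext is already a HiveContext; hiveCtx below is just an alias for it.
import org.apache.spark.sql.hive.HiveContext
val hiveCtx = sqlContext.asInstanceOf[HiveContext]

// List partitions, then attempt a drop, as shown later in this post.
hiveCtx.sql("show partitions spark_2_test").collect().foreach(println)
hiveCtx.sql("ALTER TABLE spark_2_test DROP PARTITION (server_date='2016-10-13')")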

**************************************************

The partition exists, and the drop partition command works fine in the Hive shell.

I had 3 partitions, issued the Hive drop partition command, and it succeeded.

hive> show partitions spark_2_test;

OK

server_date=2016-10-10

server_date=2016-10-11

server_date=2016-10-13

hive> ALTER TABLE spark_2_test DROP PARTITION (server_date='2016-10-13');

Dropped the partition server_date=2016-10-13

OK

Time taken: 0.217 seconds

hive> show partitions spark_2_test;

OK

server_date=2016-10-10

server_date=2016-10-11

****************************************************

Executing the same from the Spark shell throws a "partition not found" error even though the partition is present.

scala> hiveCtx.sql("show partitions spark_2_test").collect().foreach(println);

[server_date=2016-10-10]

[server_date=2016-10-11]

scala> hiveCtx.sql("ALTER TABLE spark_2_test DROP PARTITION (server_date='2016-10-10')")

17/01/26 19:28:39 ERROR Driver: FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10) org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found (server_date = 2016-10-10) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addTableDropPartsOutputs(DDLSemanticAnalyzer.java:3178) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableDropParts(DDLSemanticAnalyzer.java:2694) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:278) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:451) at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:440) at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:278) at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:233) at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270) at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:440) at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:430) at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:561) at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57) at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933) at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144) at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129) at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725) at $line69.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24) at $line69.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29) at $line69.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31) at $line69.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33) at $line69.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35) at $line69.$read$$iwC$$iwC$$iwC.<init>(<console>:37) at $line69.$read$$iwC$$iwC.<init>(<console>:39) at $line69.$read$$iwC.<init>(<console>:41) at $line69.$read.<init>(<console>:43) at $line69.$read$.<init>(<console>:47) at $line69.$read$.<clinit>(<console>) at $line69.$eval$.<init>(<console>:7) at $line69.$eval$.<clinit>(<console>) at $line69.$eval.$print(<console>) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: MetaException(message:Unable to find class: 㐀org.apache.hadoop.hive.ql.udf.generic.G Serialization trace: typeInfo (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result$get_partitions_by_expr_resultStandardScheme.read(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result$get_partitions_by_expr_resultStandardScheme.read(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result.read(ThriftHiveMetastore.java) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_expr(ThriftHiveMetastore.java:2277) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_expr(ThriftHiveMetastore.java:2264) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByExpr(HiveMetaStoreClient.java:1130) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156) at com.sun.proxy.$Proxy44.listPartitionsByExpr(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByExpr(Hive.java:2289) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addTableDropPartsOutputs(DDLSemanticAnalyzer.java:3176) ... 77 more 17/01/26 19:28:39 ERROR ClientWrapper: ====================== HIVE FAILURE OUTPUT ====================== SET hive.support.sql11.reserved.keywords=false FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table spark_3_test already exists) OK OK OK OK FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Partition already exists: Partition(values:[2016-10-10], dbName:default, tableName:spark_3_test, createTime:0, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:dept, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:2, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[dept], sortCols:[Order(col:dept, order:1)], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:null)) FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10) OK OK OK FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-23) OK FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-23) OK FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10) ====================== END HIVE FAILURE OUTPUT ====================== org.apache.spark.sql.execution.QueryExecutionException: FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10) at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:455) at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:440) at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:278) at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:233) at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270) at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:440) at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:430) at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:561) at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57) at 
org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933) at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144) at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129) at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31) at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33) at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35) at $iwC$$iwC$$iwC.<init>(<console>:37) at $iwC$$iwC.<init>(<console>:39) at $iwC.<init>(<console>:41) at <init>(<console>:43) at .<init>(<console>:47)

Thanks

Subacini

1 ACCEPTED SOLUTION

Super Collaborator

@subacini balakrishnan,

Here we go! Please change the partition key type to string. Date is not supported as a type for partition columns here.


7 REPLIES

Super Collaborator

@subacini balakrishnan,

There is some mess in the logs...

In the trace log above, in the "HIVE FAILURE OUTPUT" section, you have:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table spark_3_test already exists)....

... dbName:default, tableName:spark_3_test

...Partition not found (server_date = 2016-10-23)

I am not sure how this is related to the query you are running, which is against the "spark_2_test" table and for a different partition.

I have reproduced all the steps in both Zeppelin and spark-shell. There shouldn't be any issue with running DROP PARTITION from the Spark shell. Try to give it a clean shot: a new table, new partitions, no locks on data/directories, no two tables pointing at the same location, etc. Just a clean shot. IMO it is related to the specific table configuration/definition.
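For example, a clean repro from spark-shell could look like this (a sketch only; the table name spark_clean_test and the location are placeholders, and the partition column here is a string):

// Hypothetical clean reproduction (placeholder table name and location).
hiveCtx.sql("CREATE EXTERNAL TABLE spark_clean_test (name string, dept string) " +
  "PARTITIONED BY (server_date string) LOCATION '/tmp/spark_clean_test'")
hiveCtx.sql("ALTER TABLE spark_clean_test ADD PARTITION (server_date='2016-10-10')")
hiveCtx.sql("show partitions spark_clean_test").collect().foreach(println)
hiveCtx.sql("ALTER TABLE spark_clean_test DROP PARTITION (server_date='2016-10-10')")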

New Contributor

It seems you are providing a different log, as it shows the "partition not found" error for various partitions:

Partition not found (server_date = 2016-10-10) OK OK OK FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-23) OK FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-23) OK FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10)

The query below works fine for me:

hiveCtx.sql(s"ALTER TABLE spark_2_test DROP IF EXISTS PARTITION (server_date='2016-10-10')");


Hi Ed, Khushbhu,

Thanks for your replies. Sorry about the logs; I was using two tables, spark_2_test and spark_3_test. With both tables I get the same error.

From the Hive shell, execute the commands below:

CREATE EXTERNAL TABLE spark_4_test(name string, dept string ) PARTITIONED BY ( server_date date) LOCATION '/xxx/yyy/spark4'

insert into table spark_4_test partition(server_date='2016-10-23') values ('a','d1')

insert into table spark_4_test partition(server_date='2016-10-10') values ('a','d1')

From spark-shell, execute the drop partition command. It fails:

hiveCtx.sql("ALTER TABLE spark_4_test DROP IF EXISTS PARTITION (server_date ='2016-10-10')")

Note: If the PARTITIONED BY column is of string type, it works fine. In my case, it is of date type.

Thanks

Subacini

Super Collaborator

@subacini balakrishnan,

Here we go! Please change the partition key type to string. Date is not supported as a type for partition columns here.
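A sketch of the workaround, assuming the table can be recreated (spark_5_test and its location are placeholders; only the partition column type changes from date to string):

// Hypothetical workaround: recreate the table with a string partition column instead of date.
hiveCtx.sql("CREATE EXTERNAL TABLE spark_5_test (name string, dept string) " +
  "PARTITIONED BY (server_date string) LOCATION '/xxx/yyy/spark5'")
hiveCtx.sql("ALTER TABLE spark_5_test ADD PARTITION (server_date='2016-10-10')")
// With a string partition key, the drop succeeds from spark-shell as well.
hiveCtx.sql("ALTER TABLE spark_5_test DROP IF EXISTS PARTITION (server_date='2016-10-10')")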

Super Collaborator

@subacini balakrishnan,

Have you tried that?


Thank you Ed. String works.

The issue is addressed in Spark 2.1.0: https://issues.apache.org/jira/browse/SPARK-17388
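For reference, on Spark 2.1.0 or later the equivalent call goes through the SparkSession API; a hedged sketch, not verified against this exact table:

// Sketch for Spark 2.1.0+ (assumption: SPARK-17388 is fixed and Hive support is enabled):
// dropping a partition on a date-typed partition column should now succeed.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("drop-partition-example")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("ALTER TABLE spark_4_test DROP IF EXISTS PARTITION (server_date='2016-10-10')")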

Thanks

Subacini

Super Collaborator

@subacini balakrishnan, I'm glad it worked for you. Could you please accept the correct answer so the question is marked as answered? Thanks!