Archives of Support Questions (Read Only)

This is an archived board, kept read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Spark - Drop partition command on Hive external table fails


Hi, when we execute a DROP PARTITION command on a Hive external table from spark-shell, we get the error below. The same command works fine from the Hive shell.

Spark version: 1.5.2

**************************************************

Partition exists and drop partition command works fine in Hive shell.

I had 3 partitions, then issued the Hive DROP PARTITION command, and it succeeded.

hive> show partitions spark_2_test;

OK

server_date=2016-10-10

server_date=2016-10-11

server_date=2016-10-13

hive> ALTER TABLE spark_2_test DROP PARTITION (server_date='2016-10-13');

Dropped the partition server_date=2016-10-13

OK

Time taken: 0.217 seconds

hive> show partitions spark_2_test;

OK

server_date=2016-10-10

server_date=2016-10-11

****************************************************

Executing the same from spark-shell throws a "partition not found" error even though the partition is present.

scala> hiveCtx.sql("show partitions spark_2_test").collect().foreach(println);

[server_date=2016-10-10]

[server_date=2016-10-11]

scala> hiveCtx.sql("ALTER TABLE spark_2_test DROP PARTITION (server_date='2016-10-10')")

17/01/26 19:28:39 ERROR Driver: FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10)
org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found (server_date = 2016-10-10)
    at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addTableDropPartsOutputs(DDLSemanticAnalyzer.java:3178)
    at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableDropParts(DDLSemanticAnalyzer.java:2694)
    at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:278)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:451)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:440)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:278)
    at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:233)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270)
    at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:440)
    at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:430)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:561)
    at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725)
    at $line69.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
    at $line69.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
    at $line69.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
    at $line69.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
    at $line69.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
    at $line69.$read$$iwC$$iwC$$iwC.<init>(<console>:37)
    at $line69.$read$$iwC$$iwC.<init>(<console>:39)
    at $line69.$read$$iwC.<init>(<console>:41)
    at $line69.$read.<init>(<console>:43)
    at $line69.$read$.<init>(<console>:47)
    at $line69.$read$.<clinit>(<console>)
    at $line69.$eval$.<init>(<console>:7)
    at $line69.$eval$.<clinit>(<console>)
    at $line69.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: MetaException(message:Unable to find class: 㐀org.apache.hadoop.hive.ql.udf.generic.G
Serialization trace:
typeInfo (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc))
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result$get_partitions_by_expr_resultStandardScheme.read(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result$get_partitions_by_expr_resultStandardScheme.read(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_expr_result.read(ThriftHiveMetastore.java)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_expr(ThriftHiveMetastore.java:2277)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_expr(ThriftHiveMetastore.java:2264)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByExpr(HiveMetaStoreClient.java:1130)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
    at com.sun.proxy.$Proxy44.listPartitionsByExpr(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByExpr(Hive.java:2289)
    at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addTableDropPartsOutputs(DDLSemanticAnalyzer.java:3176)
    ... 77 more

17/01/26 19:28:39 ERROR ClientWrapper:
====================== HIVE FAILURE OUTPUT ======================
SET hive.support.sql11.reserved.keywords=false
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table spark_3_test already exists)
OK
OK
OK
OK
FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Partition already exists: Partition(values:[2016-10-10], dbName:default, tableName:spark_3_test, createTime:0, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:dept, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:2, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[dept], sortCols:[Order(col:dept, order:1)], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:null))
FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10)
OK
OK
OK
FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-23)
OK
FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-23)
OK
FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10)
====================== END HIVE FAILURE OUTPUT ======================
org.apache.spark.sql.execution.QueryExecutionException: FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:455)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:440)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:278)
    at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:233)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270)
    at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:440)
    at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:430)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:561)
    at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
    at $iwC$$iwC$$iwC.<init>(<console>:37)
    at $iwC$$iwC.<init>(<console>:39)
    at $iwC.<init>(<console>:41)
    at <init>(<console>:43)
    at .<init>(<console>:47)

Thanks

Subacini

1 ACCEPTED SOLUTION

Super Collaborator

@subacini balakrishnan,

Here we go! Please change the partition key type to string. Date is not supported as a partition key type here.


7 REPLIES

Super Collaborator

@subacini balakrishnan,

There is some mess in the logs...

In the trace log above, in the "HIVE FAILURE OUTPUT" section, you have:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table spark_3_test already exists)....

... dbName:default, tableName:spark_3_test

...Partition not found (server_date = 2016-10-23)

I'm not sure how this relates to the query you are running, which targets the "spark_2_test" table and a different partition.

I have reproduced all the steps in both Zeppelin and spark-shell, and there shouldn't be any issue with running DROP PARTITION from the spark shell. Try to give it a clean shot: a new table, new partitions, no locks on data/directories, no two tables sharing the same location, etc. (see the sketch below). In my opinion it is related to the specific table configuration/definition.
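
For reference, a minimal clean-shot sequence from spark-shell might look like the sketch below (the table name part_drop_test is hypothetical; with a string partition key this should succeed on 1.5.2):

// hypothetical clean repro: fresh table, one partition, then drop it
hiveCtx.sql("CREATE TABLE IF NOT EXISTS part_drop_test (name string) PARTITIONED BY (server_date string)")
hiveCtx.sql("ALTER TABLE part_drop_test ADD PARTITION (server_date='2016-10-10')")
hiveCtx.sql("show partitions part_drop_test").collect().foreach(println) // expect [server_date=2016-10-10]
hiveCtx.sql("ALTER TABLE part_drop_test DROP PARTITION (server_date='2016-10-10')")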

New Member

It seems you are providing a different log, as it shows "partition not found" errors for various partitions:

Partition not found (server_date = 2016-10-10)
OK
OK
OK
FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-23)
OK
FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-23)
OK
FAILED: SemanticException [Error 10006]: Partition not found (server_date = 2016-10-10)

The query below works fine for me:

hiveCtx.sql(s"ALTER TABLE spark_2_test DROP IF EXISTS PARTITION (server_date='2016-10-10')");


Hi Ed and Khushbhu,

Thanks for your replies. Sorry about the logs; I was using two tables, spark_2_test and spark_3_test. I get the same error on both tables.

From the Hive shell, execute the commands below:

CREATE EXTERNAL TABLE spark_4_test(name string, dept string) PARTITIONED BY (server_date date) LOCATION '/xxx/yyy/spark4';

insert into table spark_4_test partition(server_date='2016-10-23') values ('a','d1');

insert into table spark_4_test partition(server_date='2016-10-10') values ('a','d1');

From spark-shell, execute the DROP PARTITION command. It fails:

hiveCtx.sql("ALTER TABLE spark_4_test DROP IF EXISTS PARTITION (server_date ='2016-10-10')")

Note: if the PARTITIONED BY column is a string, it works fine. In my case, it is of date type.

Thanks

Subacini

Super Collaborator

@subacini balakrishnan,

Here we go! Please change the partition key type to string. Date is not supported as a partition key type here.
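
For example (a sketch only; the table name spark_4_test_str and its location are placeholders), the repro table above could be recreated with a string partition key from the Hive shell:

CREATE EXTERNAL TABLE spark_4_test_str(name string, dept string) PARTITIONED BY (server_date string) LOCATION '/xxx/yyy/spark4_str';

insert into table spark_4_test_str partition(server_date='2016-10-10') values ('a','d1');

With the string key, the drop should then go through from spark-shell:

hiveCtx.sql("ALTER TABLE spark_4_test_str DROP IF EXISTS PARTITION (server_date='2016-10-10')")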

Super Collaborator

@subacini balakrishnan,

Have you tried that?


Thank you, Ed. String works.

The issue is addressed in Spark 2.1.0: https://issues.apache.org/jira/browse/SPARK-17388
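
For readers on Spark 2.1.0 or later: per the JIRA above, the drop should work even with a date-typed partition key. A minimal sketch (assuming a Hive-enabled build; SparkSession replaces HiveContext in Spark 2.x):

import org.apache.spark.sql.SparkSession

// Spark 2.x entry point with access to the Hive metastore
val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
spark.sql("ALTER TABLE spark_4_test DROP IF EXISTS PARTITION (server_date='2016-10-10')")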

Thanks

Subacini

Super Collaborator

@subacini balakrishnan, I'm glad it worked for you. Could you please accept the correct answer so the question is marked as answered? Thanks!