
HIVE: Exception: Partition Already Exists while ADDING a NEW Partition to an EXISTING Table

Hello all,

I am getting the error below when our Java application executes an 'ADD PARTITION' command after a 'DROP IF EXISTS PARTITION' command in Hive:

"""

Caused by: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Partition already exists: Partition(values:[xxxx, yyyy, zzzz-zz-zz, tttttttt], dbName:<db_name>, tableName:<tbl_name>

"""

Sequence of commands executed:

Thread A: USE <db_name>

Thread B: ALTER TABLE <tbl_name> DROP IF EXISTS PARTITION(`i_id` ='xxxx', `c_id` ='yyyy', `dt` ='zzzz-zz-zz', `time` ='tttttttt') PURGE

Thread C: ALTER TABLE <tbl_name> ADD PARTITION(`i_id` ='xxxx', `c_id` ='yyyy', `dt` ='zzzz-zz-zz', `time` ='tttttttt')
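
For illustration, the statements above could be assembled from a single partition spec as in this Java sketch. `PartitionDdl` and its methods are hypothetical (not taken from the application), and it assumes the map iterates the partition columns in the table's declared order (i_id, c_id, dt, time here).

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the DROP/ADD statements for one partition spec.
final class PartitionDdl {
    private PartitionDdl() {}

    // Renders `col`='value' pairs in the map's iteration order, which must
    // match the table's partition column order.
    static String spec(Map<String, String> partition) {
        StringJoiner joiner = new StringJoiner(", ", "PARTITION(", ")");
        for (Map.Entry<String, String> e : partition.entrySet()) {
            String value = e.getValue();
            // This sketch does not escape string literals; reject values that would need it.
            if (value.contains("'") || value.contains("\\")) {
                throw new IllegalArgumentException("unsupported character in partition value: " + value);
            }
            joiner.add("`" + e.getKey() + "`='" + value + "'");
        }
        return joiner.toString();
    }

    // Equivalent of Thread B's statement.
    static String drop(String table, Map<String, String> partition) {
        return "ALTER TABLE " + table + " DROP IF EXISTS " + spec(partition) + " PURGE";
    }

    // Equivalent of Thread C's statement.
    static String add(String table, Map<String, String> partition) {
        return "ALTER TABLE " + table + " ADD " + spec(partition);
    }
}
```

Generating both statements from the same map guarantees that the DROP and the ADD always target exactly the same partition values.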

Notes:

Cluster: 5 manager nodes (Hive deployed on 3 of them), 3 utility nodes and 30 DataNodes.

There are no signs of latency issues in the ambari-server alerts/logs within ±30 minutes of the time the exception occurs.

It is an EXTERNAL Hive table.

The error occurs randomly (twice a week) and on different tables (not the same table every time).

I would appreciate any help understanding what might be causing this issue (Partition already exists), and whether there are other logs I should check to find the cause.

@Priyabrat 

This one is tough to diagnose without access to the Java source code or some other indication that the application was designed with database concurrency in mind. Based only on the information in your post, though, I would first eliminate the most obvious possibility: a race condition, in which Thread C starts executing before Thread B has finished.

As a first step, I'd recommend rewriting the Java code so that the DDL commands run sequentially from a single thread, and then checking whether the "random occurrence" stops.
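
A minimal sketch of that suggestion, assuming the DDL is currently issued from several worker threads: every batch (USE, DROP, ADD) is submitted to one single-threaded ExecutorService, so the statements of a batch run in order and batches never interleave. `SqlRunner` and `SerialDdlExecutor` are hypothetical names; in a real application `SqlRunner` might be `sql -> statement.execute(sql)` on a `java.sql.Statement` from the Hive JDBC connection.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical callback that executes one SQL statement.
@FunctionalInterface
interface SqlRunner {
    void run(String sql) throws Exception;
}

// Funnels every DDL batch through one worker thread.
final class SerialDdlExecutor implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final SqlRunner runner;

    SerialDdlExecutor(SqlRunner runner) {
        this.runner = runner;
    }

    // Runs the statements in list order; the first failure aborts the rest
    // of the batch and is reported through the returned Future.
    Future<Void> submit(List<String> statements) {
        List<String> batch = List.copyOf(statements);
        return worker.submit(() -> {
            for (String sql : batch) {
                runner.run(sql);
            }
            return null;
        });
    }

    // Stops accepting new batches; batches already queued still run.
    @Override
    public void close() {
        worker.shutdown();
    }
}
```

Because only the worker thread ever calls the runner, a single JDBC connection can back it without extra locking, and calling `get()` on the returned Future lets the caller wait for the ADD to finish before moving on.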

Bill Brooks, Community Moderator

@ask_bill_brooks 

Thanks for your reply, Bill!

Although DROP PARTITION and ADD PARTITION run in separate threads, I didn't find any sign of a race condition in the hive-server2 logs when this error occurred.

The DROP PARTITION had finished executing before the ADD PARTITION command started. Also, the DROP PARTITION is only a precautionary step in our application (useful only for reruns or duplicate processing): we receive one new file per day, and a NEW partition is created for each file. So I am fairly sure this is not the actual cause.

My guess is that Hive retries the ADD PARTITION internally and one of the retries fails (presumably because an earlier attempt had already created the partition). However, I have no proof of this theory: nothing in the hive-server2 logs indicates that this is the cause.
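
If the internal-retry theory holds, one way to make the ADD step tolerant of it is to treat "already exists" as success, either in the DDL itself (Hive's `ALTER TABLE ... ADD IF NOT EXISTS PARTITION ...` syntax) or on the client, as in this hypothetical Java sketch. `IdempotentAdd` is not an existing API, and matching on the `AlreadyExistsException` text from the error above is an assumption that may not hold across Hive versions. Note that this works around the symptom; it does not identify the cause.

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.logging.Logger;

// Hypothetical helper: makes the ADD PARTITION step idempotent on the client side.
final class IdempotentAdd {
    private static final Logger LOG = Logger.getLogger(IdempotentAdd.class.getName());

    private IdempotentAdd() {}

    // Walks the cause chain looking for the text HiveServer2 returned in the
    // post; string matching like this is brittle and version-dependent.
    static boolean isAlreadyExists(SQLException e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            String msg = t.getMessage();
            if (msg != null && msg.contains("AlreadyExistsException")) {
                return true;
            }
        }
        return false;
    }

    // Returns true if this call created the partition, false if it already
    // existed; any other failure is rethrown unchanged.
    static boolean addPartition(Statement stmt, String addSql) throws SQLException {
        try {
            stmt.execute(addSql);
            return true;
        } catch (SQLException e) {
            if (!isAlreadyExists(e)) {
                throw e;
            }
            // Record the occurrence so it can be lined up with HiveServer2 and
            // metastore logs when checking the retry theory.
            LOG.warning("Partition already existed, treating ADD as done: " + addSql);
            return false;
        }
    }
}
```

Logging each tolerated occurrence (rather than silently swallowing it) keeps a record that can later be compared against the metastore logs for that partition.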