
Updated information on differences between External and Internal Hive tables?


A user recently asked about locking Hive tables to make sure reads are consistent, which led me to the Apache documentation on Hive transactions, where I saw the following:

External tables cannot be made ACID tables since the changes on external tables are beyond the control of the compactor (HIVE-13175).

This leads me to wonder whether updated, comprehensive documentation exists on the differences between internal and external tables in Hive. Traditionally, the explanation has been that Hive manages both the data and the metadata of an internal table, so dropping an internal table drops both, while dropping an external table drops only the metadata; otherwise, the two are functionally equivalent. The note above regarding ACID/transactions suggests that the capabilities of internal and external tables are diverging.
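For reference, the traditional distinction can be sketched in HiveQL roughly as follows. The table names and the HDFS path are made up for illustration and are not from any real cluster.

```sql
-- Hypothetical table names and path, for illustration only.

-- Managed (internal) table: Hive owns the files under its warehouse directory.
CREATE TABLE sales_managed (id INT, amount DOUBLE);

-- External table: Hive only records metadata that points at an existing location.
CREATE EXTERNAL TABLE sales_external (id INT, amount DOUBLE)
LOCATION '/data/tenant_a/sales';

-- Dropping the managed table removes both the metadata and the data files.
DROP TABLE sales_managed;

-- Dropping the external table removes only the metadata; the files stay in place.
DROP TABLE sales_external;
```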

Thoughts?

Thanks in advance!

4 REPLIES


Hi @Vincent Romeo, I think I know what's going on here. The issue isn't a technical difference between external and internal tables; instead, the two carry different design expectations. A user chooses external tables because they expect Hive not to change the data. That way, multiple schemas can be applied to the same data set without fear of any one user deleting or changing the data, and if someone drops a table, the data isn't removed.

The whole purpose of ACID is to insert, update, and delete data, which goes against the basic premise of why you would use external tables. To adhere to this expectation, the developers essentially disabled ACID on external tables, i.e. compaction, which Hive ACID relies on to consolidate changes, is not run on them.
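As a rough sketch of what that restriction looks like in practice (hypothetical table names and path, not tested against a specific Hive release): a managed table can be declared transactional, while the same property on an external table is expected to be rejected, per the documentation note referencing HIVE-13175. The bucketed ORC layout reflects what Hive versions of this era required for ACID tables; newer releases may differ.

```sql
-- Hypothetical table names and path; a sketch under the assumptions stated above.

-- Managed ACID table: bucketed, stored as ORC, and flagged transactional.
CREATE TABLE events_acid (id INT, payload STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- The same definition as an external table. Per the note citing HIVE-13175,
-- Hive is expected to refuse this, because the compactor cannot safely
-- manage files that live outside Hive's control.
CREATE EXTERNAL TABLE events_ext_acid (id INT, payload STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
LOCATION '/data/tenant_a/events'
TBLPROPERTIES ('transactional'='true');
```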

Hope this helps!


Unfortunately, my group was using external tables as an easier way to deal with quotas in a "multi-tenant" cluster and to impose some governance on Hive (i.e. most users/groups can only create external tables, and the files need to land in their assigned folder in HDFS, while DBAs control internal tables in Hive). Somewhere, we missed the "basic premise" that the data in external tables won't change...


@Vincent Romeo, your use case makes a lot of sense. I don't know for sure, but you might be able to override the setting. Adding Wei to the conversation.

+ @Wei Zheng
