What is the best practice for where to store Hive external tables in HDFS?

Contributor

Hi,

I was wondering what the best practice is for where to store Hive external tables in HDFS.

Currently they are stored under the /tmp directory in HDFS, but every so often someone accidentally deletes the files there. For now I am adding Ranger policies to prevent that from happening.

Thanks,

1 ACCEPTED SOLUTION

Expert Contributor

Well @ryu, my understanding is that when you store Hive data on HDFS, it is best to use managed tables, bearing in mind that CDP now comes with compaction features that automatically address the small-files problem.

Compaction does not happen on external tables.

External tables are the better choice when the data is stored outside HDFS, for example in S3. This is my understanding, but it can vary from customer to customer depending on their use cases.
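
To make that rule of thumb concrete, here is a minimal Python sketch. It is only an illustration of the reasoning in this reply, not something Hive or CDP enforces: the function name suggest_table_type and the example locations are made up, and treating a bare path as HDFS is an assumption.

```python
from urllib.parse import urlparse


def suggest_table_type(location):
    """Rule of thumb only: data on HDFS -> managed, data elsewhere -> external."""
    scheme = urlparse(location).scheme.lower()
    # Assumption: a bare path such as /warehouse/sales has no scheme
    # and is treated as living on the cluster's HDFS.
    if scheme in ("", "hdfs"):
        return "managed"
    return "external"


# Hypothetical example locations, not taken from the thread.
for loc in ("hdfs://nameservice1/warehouse/sales", "s3a://example-bucket/sales"):
    print(loc, "->", suggest_table_type(loc))
```

As said above, real deployments may have good reasons to deviate from this, so treat it as a starting point rather than a policy.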

4 REPLIES

Expert Contributor

Hello @ryu,

There is no single best path, but it should certainly not be under /tmp.

You can create a directory such as /user/external_tables and create the tables under it. Beyond that, it depends entirely on your design and your use case.
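
As a rough sketch of that layout, the Python helper below builds a CREATE EXTERNAL TABLE statement whose LOCATION sits under a dedicated base directory, and refuses anything that resolves under /tmp. The helper name external_table_ddl, the database and table names, the columns, and the choice of Parquet are all assumptions made up for illustration; only /user/external_tables comes from the suggestion above.

```python
import posixpath


def external_table_ddl(database, table, columns, base_dir="/user/external_tables"):
    """Return a HiveQL CREATE EXTERNAL TABLE statement with an explicit LOCATION."""
    location = posixpath.normpath(posixpath.join(base_dir, database, table))
    # Refuse anything that resolves under /tmp, where files are easily deleted.
    if location == "/tmp" or location.startswith("/tmp/"):
        raise ValueError(f"refusing to place external table under /tmp: {location}")
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {cols}\n"
        f")\n"
        f"STORED AS PARQUET\n"  # assumed file format, adjust to your data
        f"LOCATION '{location}'"
    )


# Hypothetical database, table and columns.
ddl = external_table_ddl(
    "sales_db", "orders", [("order_id", "BIGINT"), ("amount", "DOUBLE")]
)
print(ddl)
```

The printed statement can then be run through whatever Hive client you normally use. Keeping one directory per database and table under a single base path also makes it easier to cover the whole tree with one Ranger policy.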

Contributor

Thanks @tusharkathpal for the response.

Could you please let me know some of the use cases for storing external tables at a particular location?

Thanks,

Moderator

To see more on Hive Managed and External tables, please see our public documentation for CDP Hive.


Ferenc Erdelyi, Technical Solutions Manager
