Created 02-16-2021 11:37 AM
Hi,
What is the best practice for where to store Hive external tables in HDFS?
Currently they are stored in the /tmp directory in HDFS, but every so often someone accidentally deletes the files under /tmp, so I am now trying to add Ranger policies to prevent this from happening.
Thanks,
Created 02-18-2021 10:18 PM
Well @ryu, my understanding is that when you are storing data on HDFS, and Hive data in particular, it is best to use managed tables, keeping in mind that CDP is now coming up with compaction features whereby the small-file issue would be addressed automatically.
Compaction will not happen on external tables.
One would prefer external tables if the data is stored outside HDFS, for example in S3. This is my understanding, but it could vary from customer to customer based on their use cases.
Created 02-18-2021 10:38 AM
Hello @ryu
There is no single best path, but it should certainly not be the /tmp location.
You can create a path under /user/external_tables and then create the tables there. Again, it depends entirely on your design and your use case.
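As a rough illustration of that layout, here is a minimal Python sketch that builds a CREATE EXTERNAL TABLE statement with an explicit LOCATION under a dedicated base directory and refuses anything that resolves under /tmp. The helper name external_table_ddl, the db/table subdirectory layout, the Parquet storage format, and the example database, table, and column names are all assumptions made for illustration; only the /user/external_tables base path comes from the suggestion above.

```python
import posixpath

# Base directory taken from the suggestion above; adjust to your own design.
EXTERNAL_BASE = "/user/external_tables"


def external_table_ddl(db, table, columns, base=EXTERNAL_BASE):
    """Return a HiveQL CREATE EXTERNAL TABLE statement with an explicit LOCATION.

    `columns` is a list of (name, hive_type) pairs. The <base>/<db>/<table>
    layout is a hypothetical convention, not a Hive requirement.
    """
    location = posixpath.normpath(posixpath.join(base, db, table))
    # Refuse scratch space such as /tmp, where files tend to get deleted.
    if location == "/tmp" or location.startswith("/tmp/"):
        raise ValueError(f"refusing to place table data under /tmp: {location}")
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{db}`.`{table}` (\n"
        f"  {cols}\n"
        f")\n"
        f"STORED AS PARQUET\n"  # assumed format; use whatever your data is in
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical database, table, and columns.
    print(external_table_ddl("sales", "orders", [("id", "BIGINT"), ("amount", "DOUBLE")]))
```

The generated statement can then be run through beeline or whichever Hive client you use. Keeping every external table under one base directory also means a single Ranger policy on that directory can cover all of them.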
Created 02-18-2021 03:59 PM
Thanks @tusharkathpal for the response.
Can you please let me know some of the use cases for storing external tables at a particular location?
Thanks,
Created 03-23-2021 02:28 AM
To learn more about Hive managed and external tables, please see our public documentation for CDP Hive.
Ferenc Erdelyi, Technical Solutions Manager