Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Is it better to create hive database under /apps/hive/warehouse/ or under my own specific location?

Highlighted

Is it better to create hive database under /apps/hive/warehouse/ or under my own specific location?

New Contributor

I would like to create a hive database. I thought every database is created under /apps/hive/warehouse/ but now i see that i can create a schema with the property "location". All tables under my database are 'External tables'. Is there a disadvantage if i create a database anyhwere else than /apps/hive/warehouse/?

5 REPLIES 5
Highlighted

Re: Is it better to create hive database under /apps/hive/warehouse/ or under my own specific location?

Contributor

The directory for the database is mostly used to store table data, so there isn't a lot of difference if you are only going to use external tables. Is there any reason you want to use a different directory?

Highlighted

Re: Is it better to create hive database under /apps/hive/warehouse/ or under my own specific location?

New Contributor

Hi @sergey,

thanks for your answer. We have a usecase dependent directory structure where all data for a usecase is stored. The reason for that is the answer from @Ed Berezitsky. Every business unit has its own database with data stored in the usecase folder. So in my opinion it is better to have a external database because all data is stored external rather than having an internal database without any underlying tables.

What do you think about that?

Highlighted

Re: Is it better to create hive database under /apps/hive/warehouse/ or under my own specific location?

Expert Contributor

@Florian Rabl,

It's not about "better". Technically, there is no difference. But in a lot of companies, there are naming conventions for dirs, so you have to create DB in locations as per requirements. Also, it's a bit easier to manage security policies based on some structure of dirs instead of having flattened dirs in save location.

Re: Is it better to create hive database under /apps/hive/warehouse/ or under my own specific location?

Hi @Florian Rabl. There is a very good discussion on internal and external tables - and what happens when you do or don't specify a location. It's located here - give it a look.

https://community.hortonworks.com/questions/63991/hive-location.html

My personal opinion is that only Internal tables should be stored in /apps/hive/warehouse, and that external tables should always be stored in a different location. But there is no hard & fast rule - I only favor that because it marks a clear line where internal and external tables are stored, and makes it easy to know just by looking at the filesystem if a table is internal or external.

Also don't forget that internal and external table data is treated differently when you issue a 'DROP TABLE' statement: HDFS data corresponding to the internal table is deleted, while HDFS data corresponding to the External table is not deleted. So external table data (in /apps/hive/warehouse) would not be deleted, and ends up being a 'straggler' in the filesystem when you drop other Internal tables or databases.

Highlighted

Re: Is it better to create hive database under /apps/hive/warehouse/ or under my own specific location?

There are also some restrictions, for example the target for the CTAS command (CREATE TABLE t1 AS SELECT ...) cannot be an external table. And Ranger plugin for Hive works best on managed (non-external) tables. I usually use external tables for raw data which can land anywhere on HDFS, and then create managed ORC tables from external ones.

Don't have an account?
Coming from Hortonworks? Activate your account here