Hive Database Size

Rising Star

Is there any way I can restrict a Hive database to a defined amount of space? Or is there any property to set a space quota for a database in Hive? I want to restrict a database to use only 25% of my HDFS storage, not more than that.

I am also exploring another option: restricting the HDFS folder that serves as the database's location.

Thank you.

1 ACCEPTED SOLUTION

Super Guru

@Shashant Panwar

You can set an HDFS quota; @mbalakrishnan is right that this doesn't work at the Hive level. Think about it: quotas are set at the directory level. Suppose you cap your Hive warehouse directory at 25% of HDFS storage. I can simply create an external table pointing somewhere else. If you restrict that external directory, I can create yet another directory and point my external table at it. So here is how you can come close, using a combination of technology and policy:

1. Assign HDFS space quotas to the directories where users can create tables (the Hive warehouse directory as well as the approved external directories).

2. Make sure these quotas combined do not exceed 25% of total storage.

3. Establish an organizational policy that Hive tables may only be created in those directories, and warn people that data created outside them will be deleted.

4. Enforce the policy.
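As a sketch of steps 1 and 2 above (the paths and quota sizes are examples; adjust them to your cluster, and note that HDFS space quotas count raw bytes after replication, so a 25 TB quota holds roughly 8.3 TB of data at replication factor 3):

```shell
# Set a space quota on the Hive warehouse directory (example path).
hdfs dfsadmin -setSpaceQuota 25t /apps/hive/warehouse

# Do the same for each approved external-table directory.
hdfs dfsadmin -setSpaceQuota 5t /data/external/team_a

# Verify: -count -q shows quota, remaining quota, and usage per directory.
hdfs dfs -count -q -h /apps/hive/warehouse /data/external/team_a

# Remove a quota later if needed.
hdfs dfsadmin -clrSpaceQuota /data/external/team_a
```

Once the quota is exceeded, writes into those directories (including Hive INSERTs) fail with a quota exception, which is what enforces the limit.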


7 REPLIES

Rising Star

Update: I checked Ranger as well but did not find any setup related to space limitation.

Expert Contributor
@Shashant Panwar

I don't understand the use case for limiting a database to X percent. Suppose you have a table that needs more than 25% of the disk space; then you won't be able to do anything unless you delete data and free up space in that database to make room for the table.

Also, there is no Hive-level setting that limits how much space a database can use. It has to be enforced with HDFS quotas on the underlying directories.

-Mahesh

Rising Star

Thank you @mbalakrishnan. A simple use case is limiting space per team. I do not want one team to consume the entire cluster. For example: Team A 25%, Team B 25%, and Team C 50%, similar to YARN capacity queues: a space quota per team for cost management.

I will look into this on the HDFS side. Can you point me to some documentation on HDFS quotas?
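One way to turn a 25/25/50 split into concrete quota values is to compute each team's byte allowance from the cluster's raw capacity. This is a hypothetical helper, not a Hive or HDFS feature; remember that HDFS space quotas count post-replication (raw) bytes, so the shares are taken from raw capacity:

```python
# Sketch: convert percentage shares into HDFS space-quota values in bytes.

def team_quotas(raw_capacity_bytes, shares):
    """shares: dict mapping team -> fraction of raw capacity (sum <= 1.0)."""
    if sum(shares.values()) > 1.0:
        raise ValueError("shares exceed total capacity")
    return {team: int(raw_capacity_bytes * frac) for team, frac in shares.items()}

# Example: a 100 TB (raw) cluster split 25/25/50.
quotas = team_quotas(100 * 10**12, {"team_a": 0.25, "team_b": 0.25, "team_c": 0.50})
for team, q in quotas.items():
    # Emit the admin command that would apply each quota (paths are examples).
    print(f"hdfs dfsadmin -setSpaceQuota {q} /data/{team}")
```

The printed commands would then be applied per team directory, giving each team a hard cap much like a YARN capacity queue caps compute.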

Expert Contributor

Rising Star

Thank you @mbalakrishnan


Rising Star

@mqureshi Thank you. Makes sense.
