Support Questions

Find answers, ask questions, and share your expertise

Hive Database Size

avatar
Rising Star

Is there any way I can restrict a Hive database to use defined space? Or any property to define a space quota for a database in hive. I want to restrict a database to use only 25% of my HDFS storage not more than that.

I am exploring other option where I can restrict hdfs folder which will be the location of db.

Thank you.

1 ACCEPTED SOLUTION

avatar
Super Guru

@Shashant Panwar

You can set HDFS quota. @mbalakrishnan is right. It really doesn't work at Hive level. Think about this. You set quota at directory level. Assume that you set your hive warehouse directory to be 25% of HDFS storage. So what. I'll just create an external table. You restrict that external directory, I can create another directory and point my external table to it. So, here is how you can almost achieve it but it's a combination of technology as well as policy you will implement.

1. Assign HDFS quota to directories where users can create table (data warehouse directory as well as external directories).

2. This quota combined should not be more than 25%.

3. Establish an organizational policy that Hive tables must be created only on above directories. If people create table outside of those directories, you should warn them that, that data will be deleted.

4. That's it. Enforce your policy.

View solution in original post

7 REPLIES 7

avatar
Rising Star

Update: I checked Ranger as well but did not find any setup related to space limitation.

avatar
Expert Contributor
@Shashant Panwar

I dont understand the usecase of limiting a database to use a X percent. Lets assume you have a table which needs more than 25% of the disk space then you wont be able to do anything unless you delete and free up the space on that DB to allow data into the table.

Also one cannot set limitation at Hive level to state how much space a database can use. It needs to be set at HDFS user level quota.

-Mahesh

avatar
Rising Star

Thank you @mbalakrishnan. A simple use case is to limit the space for different teams. I do not want one team to utilize the entire cluster space. For eg: Team A 25%, Team B 25% and Team C 50%. Something like YARN Capacity queues. Space quota per team for cost management.

I will check the same for HDFS. Can you refer some links to check HDFS user level quota?

avatar
Expert Contributor

avatar
Rising Star

Thank you @mbalakrishnan

avatar
Super Guru

@Shashant Panwar

You can set HDFS quota. @mbalakrishnan is right. It really doesn't work at Hive level. Think about this. You set quota at directory level. Assume that you set your hive warehouse directory to be 25% of HDFS storage. So what. I'll just create an external table. You restrict that external directory, I can create another directory and point my external table to it. So, here is how you can almost achieve it but it's a combination of technology as well as policy you will implement.

1. Assign HDFS quota to directories where users can create table (data warehouse directory as well as external directories).

2. This quota combined should not be more than 25%.

3. Establish an organizational policy that Hive tables must be created only on above directories. If people create table outside of those directories, you should warn them that, that data will be deleted.

4. That's it. Enforce your policy.

avatar
Rising Star

@mqureshi Thank you. Make sense.