Support Questions
Find answers, ask questions, and share your expertise

Hive Database Size

Solved

Hive Database Size

Rising Star

Is there any way I can restrict a Hive database to a defined amount of space? Or is there any property that defines a space quota for a database in Hive? I want to restrict a database to use no more than 25% of my HDFS storage.

I am also exploring another option: restricting the HDFS folder that serves as the database's location.

Thank you.

1 ACCEPTED SOLUTION


Re: Hive Database Size

Super Guru

@Shashant Panwar

You can set an HDFS quota. @mbalakrishnan is right: this really doesn't work at the Hive level. Think about it: you set quotas at the directory level. Suppose you cap your Hive warehouse directory at 25% of HDFS storage. So what? I'll just create an external table. If you restrict that external directory, I can create another directory and point my external table at it. So here is how you can come close, but it's a combination of technology and a policy you will implement:

1. Assign HDFS quotas to the directories where users can create tables (the data warehouse directory as well as external directories).

2. Combined, these quotas should not exceed 25% of your HDFS storage.

3. Establish an organizational policy that Hive tables must be created only in the directories above. If people create tables outside of them, warn them that the data will be deleted.

4. That's it. Enforce your policy.
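Steps 1 and 2 above can be sketched with the HDFS CLI. The paths and sizes below are illustrative, not from the original post; note that a space quota counts raw bytes, i.e. data size times replication factor:

```shell
# Cap the Hive warehouse directory (with 3x replication, a 15t quota
# holds roughly 5 TB of table data)
hdfs dfsadmin -setSpaceQuota 15t /apps/hive/warehouse

# Cap each approved external-table directory as well
hdfs dfsadmin -setSpaceQuota 10t /data/external

# A quota can be removed later if needed
hdfs dfsadmin -clrSpaceQuota /data/external
```

Keep the sum of these quotas at or below your 25% target, since HDFS enforces each directory independently and nothing adds them up for you.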


7 REPLIES

Re: Hive Database Size

Rising Star

Update: I checked Ranger as well but did not find any setting related to space limits.


Re: Hive Database Size

Expert Contributor
@Shashant Panwar

I don't understand the use case for limiting a database to X percent. Let's assume you have a table that needs more than 25% of the disk space; then you won't be able to do anything unless you delete data and free up space in that database to let the table grow.

Also, one cannot set a limit at the Hive level on how much space a database can use. It needs to be set as a directory-level quota in HDFS.

-Mahesh
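To illustrate the directory-level nature of HDFS quotas that Mahesh describes (the path here is hypothetical): quotas attach to a directory, not to a user or a Hive database, and writes that would exceed the limit fail outright.

```shell
# Inspect quotas: the -q flag adds QUOTA and SPACE_QUOTA columns,
# showing the limit and the remaining headroom for the directory
hdfs dfs -count -q -h /apps/hive/warehouse

# There is no per-user or per-database space quota in HDFS; a write
# that would push raw usage (data x replication) past the directory's
# limit fails with a DSQuotaExceededException.
```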


Re: Hive Database Size

Rising Star

Thank you @mbalakrishnan. A simple use case is limiting space for different teams; I do not want one team to utilize the entire cluster's storage. For example: Team A 25%, Team B 25%, and Team C 50%, something like YARN capacity queues. A space quota per team helps with cost management.

I will check the same on the HDFS side. Can you share some links on HDFS quotas?


Re: Hive Database Size

Expert Contributor

Re: Hive Database Size

Rising Star

Thank you @mbalakrishnan



Re: Hive Database Size

Rising Star

@mqureshi Thank you. Makes sense.
