What are the pros and cons of running Hive jobs as the hive user versus the end user?

Contributor

Hi,

I am wondering what the pros and cons are of enabling or disabling "Run as end user instead of Hive user".

We are currently trying to implement proper permissions on our Hadoop cluster to prevent accidents.

My understanding is that if a user logs in to HiveServer2 and submits a query that runs as the hive user, then technically they have access to all databases and tables, which can cause damage if they are not careful.

Since this setting exists, I would like to better understand its use cases and its pros and cons.

Thanks,

Expert Contributor

Hello @ryu

If you run jobs as the end user, you will end up managing internal permissions and job submission permissions yourself. In my experience, you will also find it harder to integrate other components.

If you instead submit the job and let the hive user take care of file creation and management in the backend, the admin's life becomes easier. You will also be able to hook in and integrate other components more cleanly.

The above is just the gist. The recommendation is to authenticate as the end user, keep impersonation off, and let the hive user take care of things in the backend.
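
For reference, the "Run as end user instead of Hive user" toggle discussed in this thread corresponds to the HiveServer2 property `hive.server2.enable.doAs`. The following is a minimal Python sketch, not part of any Hive tooling: it parses a sample hive-site.xml fragment and reports which identity a query's backend work would run as. The helper names (`read_property`, `execution_user`) are hypothetical, and the logic is a simplification of what HiveServer2 does.

```python
import xml.etree.ElementTree as ET

# Sample hive-site.xml fragment with impersonation turned off, as the
# reply above recommends. Only the property name comes from Hive; the
# rest of this script is an illustrative sketch.
HIVE_SITE = """
<configuration>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
</configuration>
""".strip()


def read_property(xml_text, name, default=None):
    """Return the value of a Hadoop-style <property> entry, or default."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return default


def execution_user(xml_text, authenticated_user, service_user="hive"):
    """Simplified view of which identity the backend work runs as."""
    # Assumption: treat a missing property as "true" (impersonation on).
    do_as = read_property(xml_text, "hive.server2.enable.doAs", "true")
    if do_as.strip().lower() == "true":
        return authenticated_user
    return service_user


if __name__ == "__main__":
    # With doAs set to false, the query runs as the service user.
    print(execution_user(HIVE_SITE, "ryu"))  # prints "hive"
```

With the value switched to `true`, the same call would return the authenticated user instead, which is the "run as end user" case the question asks about.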

Contributor

Thanks @tusharkathpal for the response.

But if the user is authenticated as themselves and the job runs as the "hive" user, does that user also get the same permissions as the hive user?

Expert Contributor

Hello @ryu

Well, you can make that user similar to the hive user. The hive user is mapped to the hadoop group, and you can make changes so that a normal user behaves like the hive user. But as I mentioned earlier, you will have to spend time managing it, and you will eventually spend more time troubleshooting if things break.

Remember, Hadoop is a complex setup with multiple components talking to each other. 🙂

Contributor

Thanks @tusharkathpal

So if all users' Hive jobs run as the "hive" user, is there a way to differentiate permissions between users?

For example, if you and I each ran a Hive job, would it be possible to have separate permissions so that I could not delete any tables you created, and vice versa?

We currently have Ranger installed on our Hadoop cluster, and my understanding is that "run as end user" needs to be set to "false". Would Ranger know which user is running the job, given that all jobs run as the "hive" user?

I just want to make sure I am setting this up correctly.

 

Thanks,

Expert Contributor (accepted solution)

Hello @ryu

The purpose of Ranger is to provide the authorization users need to access tables and databases. If you grant a user access to a particular table or database, that user can perform the permitted actions on it, and any user without such a grant will not be able to drop the table.

Let's say there are two users, test1 and test2. If I give test1 access to table t1 and test2 access to table t2, test1 will not be able to see table t2 and test2 will not be able to see table t1.

You can also add finer granularity to control which user can perform which actions on a table. This authorization is checked by the Ranger hook that runs inside HiveServer2. Because the check uses the authenticated end user, it applies even when the job itself runs as hive.

Let me know if this answers your questions.
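
To make the two-user example concrete, here is a toy Python model of the idea, assuming impersonation is off: the job executes as `hive`, but the authorization decision is made against the authenticated end user. This is not Ranger's API or policy format; `POLICIES`, `authorize`, and `run_query` are hypothetical names used only for illustration, and the specific actions granted to each user are made up.

```python
# Toy policy table: (user, table) -> set of allowed actions.
# Assumed grants for illustration only.
POLICIES = {
    ("test1", "t1"): {"select", "update", "drop"},
    ("test2", "t2"): {"select"},
}

# Identity the job runs as when impersonation is off.
SERVICE_USER = "hive"


def authorize(authenticated_user, table, action):
    """Return True if the authenticated user may perform action on table."""
    allowed = POLICIES.get((authenticated_user, table), set())
    return action in allowed


def run_query(authenticated_user, table, action):
    """Check the end user's permissions first, then 'execute' as hive."""
    if not authorize(authenticated_user, table, action):
        raise PermissionError(
            f"{authenticated_user} may not {action} {table}"
        )
    return (
        f"{action} on {table} executed as {SERVICE_USER} "
        f"for {authenticated_user}"
    )


if __name__ == "__main__":
    # test1 has a grant on t1, so the drop is allowed.
    print(run_query("test1", "t1", "drop"))
    # test2 has no grant on t1, so the same request is rejected.
    try:
        run_query("test2", "t1", "drop")
    except PermissionError as err:
        print("denied:", err)
```

The point of the sketch is that the check and the execution use different identities: permissions follow the person who logged in, while the files and jobs in the backend belong to the service user.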

Moderator

You can read more about the "pros and cons to having hive jobs run as the hive user or the end user" and how this relates to Ranger in our public Cloudera documentation for CDP: Enabling or disabling impersonation (doas)


Ferenc Erdelyi, Technical Solutions Manager
