Our setup has Sentry enabled, and we recently added Hive on Spark as well. Since then it has been very difficult to identify the application owner (always hive) and the application query (always Hive on Spark).
This makes identifying and troubleshooting applications difficult.
Any help on this, or any alternative?
Is Sentry deployed in the cluster? If it is, then you have turned off user impersonation in Hive. In that case you need to look at the original user in the job configuration (Cloudera Manager shows it correctly in the UI). Look for the hive.sentry.subject.name property in the job configuration that is stored as part of the job history.
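For MapReduce jobs, one way to read that property without clicking through the UI is the JobHistory Server REST API, whose `/ws/v1/history/mapreduce/jobs/{jobid}/conf` resource returns the job configuration as a list of name/value properties. The sketch below is a minimal illustration, assuming an unsecured JobHistory Server at a hypothetical address; a Kerberized cluster would also need SPNEGO authentication, which plain `urllib` does not provide. It does not cover Hive on Spark applications.

```python
import json
import urllib.request
from typing import Optional

# Assumption: hypothetical JobHistory Server address (19888 is the usual default web port).
HISTORY_SERVER = "http://historyserver.example.com:19888"


def fetch_job_conf(job_id: str, base_url: str = HISTORY_SERVER) -> dict:
    """Download a finished MapReduce job's configuration from the JobHistory Server REST API."""
    url = f"{base_url}/ws/v1/history/mapreduce/jobs/{job_id}/conf"
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def conf_value(conf_json: dict, key: str = "hive.sentry.subject.name") -> Optional[str]:
    """Return the value of `key` from a /conf response, or None when the property is missing."""
    for prop in conf_json.get("conf", {}).get("property", []):
        if prop.get("name") == key:
            return prop.get("value")
    return None
```

Usage: `conf_value(fetch_job_conf(job_id))` for a job ID taken from the ResourceManager or JobHistory UI. Comparing it with `conf_value(conf, "mapreduce.job.user.name")`, which would typically show hive when impersonation is off, makes the difference between the submitting user and the Sentry subject visible.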
If you do not have Sentry, you should have user impersonation turned on in Hive, in which case the application runs as the end user.
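HiveServer2 impersonation is controlled by the `hive.server2.enable.doAs` property, which Hive treats as `true` when it is not set. A small sketch for checking it in a copy of `hive-site.xml` is shown below; the file you read is an assumption on your side, and on a Cloudera Manager-managed cluster the client copy may not match the configuration HiveServer2 actually runs with.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def hive_property(hive_site_xml: str, name: str) -> Optional[str]:
    """Return the <value> of the named <property> in hive-site.xml text, or None if absent."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def impersonation_enabled(hive_site_xml: str) -> bool:
    """True if HiveServer2 runs queries as the end user (hive.server2.enable.doAs)."""
    value = hive_property(hive_site_xml, "hive.server2.enable.doAs")
    # Hive's built-in default for this property is true, so a missing entry means enabled.
    return value is None or value.lower() == "true"
```

With Sentry deployed this would be expected to return `False`, which is why the applications show up as owned by hive.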
@Wilfred Thanks for the reply. First, yes, we have Sentry enabled. I have already explored the Hive Sentry Subject Name attribute in CM (we are on Cloudera 5.8.3), but for most jobs that attribute has no value. I am not sure where it gets the user name from, or under which conditions it does not.
The difficulty exists with MapReduce as well, but it increased with Hive on Spark. Hive on MR at least shows the query details (select * ....), whereas in HoS the owner is hive and the application name is Hive on Spark for every job, which is difficult to troubleshoot when you have hundreds or thousands of these jobs.
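When there are hundreds of identical-looking HoS applications, a compact list from the ResourceManager REST API (`/ws/v1/cluster/apps`, filtered with its `user` and `applicationTypes` query parameters) can at least narrow down which ones to inspect by queue, start time, and state. This is a sketch assuming an unsecured ResourceManager at a hypothetical address; it does not reveal the end user, since these applications all run as hive.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Assumption: hypothetical ResourceManager web address (8088 is the usual default port).
RM_URL = "http://resourcemanager.example.com:8088"


def fetch_apps(base_url: str = RM_URL, user: str = "hive", app_type: str = "SPARK") -> dict:
    """Query the ResourceManager REST API for applications of one user and application type."""
    query = urllib.parse.urlencode({"user": user, "applicationTypes": app_type})
    request = urllib.request.Request(
        f"{base_url}/ws/v1/cluster/apps?{query}", headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def hive_on_spark_apps(apps_json: dict) -> List[Tuple[str, str, int, str]]:
    """Return (id, queue, startedTime, state) for apps named Hive on Spark, oldest first."""
    # The API returns {"apps": null} when nothing matches, so guard against None.
    apps = (apps_json.get("apps") or {}).get("app") or []
    return sorted(
        (
            (app["id"], app.get("queue", ""), app.get("startedTime", 0), app.get("state", ""))
            for app in apps
            if "hive on spark" in app.get("name", "").lower()
        ),
        key=lambda row: row[2],
    )
```

Usage: `for row in hive_on_spark_apps(fetch_apps()): print(*row)`.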
I agree that Sentry is a very poor security management component in Cloudera. With Sentry enabled, Hive cannot impersonate the actual user who ran the job, so administrators looking at the Resource Manager UI have difficulty figuring out who ran it. Apache Ranger is a better product than Sentry.
If you are using Cloudera Manager, try the "Cluster --> YARN --> Applications" option; that UI shows the actual user who ran the job.
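The data behind that page can also be pulled from the Cloudera Manager REST API's `yarnApplications` endpoint, which is handy for scripting over many jobs. In this sketch the CM address, API version, cluster and service names, and especially the attribute key `hive_sentry_subject_name` are assumptions to check against your CM version's API documentation; as noted earlier in the thread, that attribute may be empty for many jobs.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

# Assumptions: hypothetical CM address and API version, cluster name, and YARN service name.
CM_URL = "http://cm.example.com:7180/api/v13"
CLUSTER, YARN_SERVICE = "Cluster 1", "yarn"


def fetch_yarn_apps(user: str, password: str, limit: int = 100) -> dict:
    """Fetch recent YARN applications from the Cloudera Manager yarnApplications endpoint."""
    path = f"/clusters/{urllib.parse.quote(CLUSTER)}/services/{YARN_SERVICE}/yarnApplications"
    request = urllib.request.Request(f"{CM_URL}{path}?limit={limit}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")  # CM API uses HTTP basic auth
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def app_owners(
    response_json: dict, subject_attr: str = "hive_sentry_subject_name"  # assumed attribute key
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (applicationId, user, Sentry subject or None) for each application."""
    rows = []
    for app in response_json.get("applications", []):
        attributes = app.get("attributes") or {}
        rows.append((app.get("applicationId", ""), app.get("user", ""), attributes.get(subject_attr)))
    return rows
```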