Pig is not enforcing authorization on HCatalog databases and tables

Hi,

We are verifying our product's use cases in a Ranger-enabled HDP environment.

Our product is launched as User A (an LDAP user). User A does not have access to any databases or tables.

We have another LDAP user, User B, who has access to marketingDb.saletable.

When we log in to our product, use marketingDb.saletable, and submit a job, the job succeeds and the JobTracker shows User A as the user.

Question: If the job is launched as User A, and User A does not have access to any HCatalog table, how did the job complete successfully?

To debug this further, we launched a Pig job with User A's keytab in the session, and that Pig job also completed successfully.

Could you please answer these questions? Is this happening because of a misconfiguration?

Please guide us.

Accepted Solution

Expert Contributor
@Piyush Jhawar

The Ranger Hive plugin protects Hive data when it is accessed via HiveServer2. When you access these tables using HCatalog in Pig, you are not going through HiveServer2; instead, Pig reads the files directly from HDFS (in this case HCatalog is only used to map the table metadata to the HDFS files).
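As a rough illustration of why User A's Pig job succeeded, here is a toy Python model of the two access paths. It is not Ranger's actual evaluation logic: the class and method names are hypothetical, paths are matched exactly rather than by prefix, and the `world_readable` set stands in for the assumption that, when no Ranger HDFS policy matches, native HDFS permissions decide (which, in this scenario, presumably allowed User A to read the table's files).

```python
from dataclasses import dataclass, field


# Toy model only: hypothetical names, exact-match paths, no groups or deny rules.
@dataclass
class ToyCluster:
    hive_policies: set = field(default_factory=set)   # (user, "db.table") allowed by Ranger Hive policies
    hdfs_policies: set = field(default_factory=set)   # (user, path) allowed by Ranger HDFS policies
    world_readable: set = field(default_factory=set)  # paths whose native HDFS permissions allow any reader

    def read_via_hiveserver2(self, user: str, table: str) -> bool:
        # JDBC/Beeline queries go through HiveServer2, where the Ranger Hive plugin runs
        return (user, table) in self.hive_policies

    def read_via_pig_hcatalog(self, user: str, path: str) -> bool:
        # HCatalog only maps the table to its HDFS location; Pig then reads
        # the files directly from HDFS, so only HDFS authorization applies.
        if (user, path) in self.hdfs_policies:
            return True
        # Assumed fallback: no matching Ranger HDFS policy -> native HDFS permissions decide
        return path in self.world_readable


table = "marketingdb.saletable"
path = "/apps/hive/warehouse/marketingdb.db/saletable"  # hypothetical warehouse location

cluster = ToyCluster(hive_policies={("userB", table)}, world_readable={path})
print(cluster.read_via_hiveserver2("userA", table))  # False: blocked at HiveServer2
print(cluster.read_via_pig_hcatalog("userA", path))  # True: the Hive policy is never consulted
```

The point of the model is that the Hive policy for User B never enters the HCatalog/Pig path at all; only HDFS-level rules do.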

To protect this data, you should also define a Ranger HDFS policy on the underlying HDFS directory that stores the marketingDb.saletable data.
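To find which directory that policy should cover, one option is to read the `Location:` row that Hive prints for `DESCRIBE FORMATTED marketingDb.saletable` (run from the Hive CLI or Beeline). The sketch below parses that row. The sample output and the NameNode hostname are made up for illustration, and the parser assumes the location is the first token after `Location:` on that line.

```python
from urllib.parse import urlparse


def table_hdfs_dir(describe_formatted: str) -> str:
    """Return the bare HDFS path from the 'Location:' row of DESCRIBE FORMATTED output."""
    for raw in describe_formatted.splitlines():
        # Beeline wraps columns in '|' separators; the Hive CLI uses tabs
        line = raw.replace("|", " ").strip()
        if line.startswith("Location:"):
            uri = line[len("Location:"):].split()[0]
            # Drop the scheme and NameNode authority, keeping only the path
            return urlparse(uri).path
    raise ValueError("no 'Location:' row found")


# Made-up sample in Hive CLI style (tab-separated)
sample = (
    "# Detailed Table Information\n"
    "Database:           \tmarketingdb\n"
    "Location:           \thdfs://nn.example.com:8020/apps/hive/warehouse/marketingdb.db/saletable\n"
    "Table Type:         \tMANAGED_TABLE\n"
)
print(table_hdfs_dir(sample))  # /apps/hive/warehouse/marketingdb.db/saletable
```

The path this returns is what goes into the HDFS policy's resource field.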

To clarify:

  • Ranger Hive Plugin - Used to protect Hive data when accessed via HiveServer2 (e.g., a user connecting to Hive via JDBC)
  • Ranger HDFS Plugin - Used to protect HDFS files and directories (suitable when users need to access the data outside HiveServer2, e.g. from Pig or Spark)
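For the HDFS plugin route, the policy can be created in the Ranger Admin UI or, assuming a Ranger release that exposes the public v2 REST API, by POSTing JSON to `/service/public/v2/api/policy`. The sketch below builds such a body. The service name `hdp_hadoop`, the policy name, the path, and the Ranger host are hypothetical placeholders, and the accepted fields can vary between Ranger versions, so check the API documentation for your release.

```python
import base64
import json
import urllib.request


def hdfs_policy(service, name, path, users, accesses=("read", "execute")):
    # Body shaped after Ranger's public v2 policy model; "path" is the HDFS resource.
    # read + execute lets the users list the directory and read its files.
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "resources": {
            "path": {"values": [path], "isRecursive": True, "isExcludes": False}
        },
        "policyItems": [
            {
                "users": list(users),
                "accesses": [{"type": a, "isAllowed": True} for a in accesses],
            }
        ],
    }


def create_policy(ranger_url, user, password, policy):
    # POST the policy with HTTP basic auth (not executed in this sketch)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(
        ranger_url.rstrip("/") + "/service/public/v2/api/policy",
        data=json.dumps(policy).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


policy = hdfs_policy(
    service="hdp_hadoop",               # hypothetical HDFS service name in Ranger
    name="marketingdb_saletable_read",  # hypothetical policy name
    path="/apps/hive/warehouse/marketingdb.db/saletable",  # hypothetical location
    users=["userB"],
)
print(json.dumps(policy, indent=2))
# create_policy("http://ranger.example.com:6080", "admin", "<password>", policy)
```

Keep in mind that a policy like this only grants access. If the directory's native HDFS permissions already let User A read it, the Ranger HDFS plugin's fallback to those permissions (controlled, as far as I know, by `xasecure.add-hadoop-authorization`) will still allow the read, so the directory's owner and mode usually need tightening as well.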

Replies

@Laurence Da Luz

Thank you very much for the detailed answer. My doubt has been cleared now.

Super Guru

@Piyush Jhawar

If @Laurence Da Luz answered your question, please accept the answer to help others in the community.

Yes, I should do that. :)

Done now. Thanks!