Confusion about Hive & Impala behaviour for access to HDFS file with only HDFS user permissions

Explorer

I need some clarity on my chosen solution.

I have a CDH 5.3.9 cluster. After assigning roles, I wanted to add some custom UDFs to Hive and Impala. The UDF .jar is placed in /user/hdfs in HDFS, and /user/hdfs has mode 700 with owner hdfs:supergroup.

For Hive (the Hive CLI, not HiveServer2), version 0.13:

  • Log in to the Hive CLI as the hdfs user
  • Execute the CREATE FUNCTION statement

It works: Hive can access the .jar and create the function, and I can test the UDF.

For Impala, version 2.1.7:

  • Log in to the Impala CLI as the hdfs user
  • Execute the CREATE FUNCTION statement

It doesn't work, because Impala doesn't have permission to access /user/hdfs.

It does work if I add the impala user to supergroup in Linux, since impala then belongs to the HDFS superuser group

OR

if I give execute permission to other users on /user/hdfs.
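The behaviour described above can be illustrated with a simplified Python model of a POSIX-style permission check on the directory. This is only a sketch, not HDFS code: the `Dir` class and `can_traverse` function are hypothetical, it assumes that "supergroup" is the configured HDFS superuser group whose members bypass permission checks, and it ignores the permissions on the .jar file itself.

```python
from dataclasses import dataclass

# Assumption: "supergroup" is the HDFS superuser group, so its members
# are never denied by a permission check.
SUPERUSER_GROUP = "supergroup"


@dataclass
class Dir:
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o700


def can_traverse(d: Dir, user: str, groups: set) -> bool:
    """Return True if `user` may traverse (execute) directory `d`."""
    if SUPERUSER_GROUP in groups:
        return True  # superuser group members bypass the check
    if user == d.owner:
        return bool(d.mode & 0o100)  # owner execute bit
    if d.group in groups:
        return bool(d.mode & 0o010)  # group execute bit
    return bool(d.mode & 0o001)      # other execute bit


user_hdfs_dir = Dir(owner="hdfs", group="supergroup", mode=0o700)

# As configured in the post: the owner passes, impala is denied.
hdfs_ok = can_traverse(user_hdfs_dir, "hdfs", {"hdfs"})
impala_ok = can_traverse(user_hdfs_dir, "impala", {"impala"})

# Workaround 1: impala is added to the superuser group.
impala_in_supergroup_ok = can_traverse(
    user_hdfs_dir, "impala", {"impala", "supergroup"}
)

# Workaround 2: execute permission for other users (mode 701).
impala_other_x_ok = can_traverse(
    Dir(owner="hdfs", group="supergroup", mode=0o701), "impala", {"impala"}
)

print(hdfs_ok, impala_ok, impala_in_supergroup_ok, impala_other_x_ok)
```

In this model the first workaround succeeds because of the superuser bypass, not because of the group bits, which are all zero under mode 700.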

 

If I run ps aux to see how the CLI is handled in the Hive and Impala cases, I can see the CLI running as the hdfs user (since I logged in as hdfs), so I assumed it would have access to /user/hdfs for Impala as well. But it looks like that is not sufficient for Impala, though it somehow works for Hive.

Is it because for Hive I am using a plain client, which has access to /user/hdfs because the login user is hdfs, whereas Impala has to go through impalad, which runs as the impala user and doesn't have access to /user/hdfs?

Can someone please clarify what is going on here?

1 ACCEPTED SOLUTION

Champion

The impalad daemon is the one that cannot access the jar for query processing, since you have set the HDFS permissions to 700. Your assumption is right, and that is what I was referring to in my previous post when I said that Impala does not support HDFS-level user impersonation.
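The explanation above can be summarised in a small Python sketch of which identity ends up touching HDFS in each case. It is a hypothetical model, not Hive or Impala code: the function name and client labels are made up for illustration, and it assumes impalad runs as the "impala" service user and that the Hive CLI reads HDFS in-process as the user who launched it.

```python
# Assumption: the impalad daemon runs as the "impala" OS user.
IMPALAD_SERVICE_USER = "impala"


def effective_hdfs_user(client: str, login_user: str) -> str:
    """Return the user whose identity is used to read the jar from HDFS."""
    if client == "hive-cli":
        # The Hive CLI talks to HDFS itself, as the user who started it.
        return login_user
    if client == "impala-shell":
        # impala-shell only submits the statement; impalad reads HDFS
        # under its own identity and does not impersonate the shell user.
        return IMPALAD_SERVICE_USER
    raise ValueError(f"unknown client: {client}")


print(effective_hdfs_user("hive-cli", "hdfs"))      # hdfs
print(effective_hdfs_user("impala-shell", "hdfs"))  # impala
```

So logging in as hdfs only changes the outcome for the Hive CLI; for Impala the permission check is made against the impala user regardless of who runs the shell.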


3 REPLIES

Champion

When you run impala-shell, it does not run as "impala"; it runs as the current user. Impala does not support HDFS-level user impersonation. If you need granular authorization / user permissions, you might want to use Sentry.

Please refer to this link:

https://www.cloudera.com/documentation/enterprise/5-2-x/topics/cm_sg_sentry_service.html

Explorer
Actually, that is exactly the missing piece I am trying to figure out. I am aware that impala-shell runs as whatever user I log in as, the hdfs user in my case. However, that is not sufficient for impala-shell to access a jar in HDFS with 700 HDFS permissions, whereas the Hive client shell, which similarly runs as the hdfs user in my case, is able to access it. So I am assuming the impalad daemon running as the impala user is the cause of this? Authorization is not what I am looking for.
