10-11-2016 04:06 AM
I need some clarity on my chosen solution.
I have a CDH 5.3.9 cluster. After assigning roles, I wanted to add some custom UDFs to Hive and Impala. The UDF .jar is placed in /user/hdfs in HDFS. /user/hdfs has mode 700 and is owned by hdfs:supergroup.
For Hive (NOT Hiveserver2), // Version 0.13
It works. It can access the .jar and create the function. I can test the UDF etc.
For Impala, // Version 2.1.7
It doesn't work, since Impala doesn't have permission to access /user/hdfs.
If I add the impala user to supergroup in Linux, it works, since impala then becomes a member of the HDFS superuser group.
If I give execute permission to other users on /user/hdfs, it also works.
If I run ps aux to see how the CLI is handled in both the Hive and Impala cases, I can see it running as the hdfs user (since I logged in as hdfs), so I assumed it would have access to /user/hdfs for Impala as well. But it looks like that is not sufficient for Impala, yet it somehow works for Hive.
Is it because for Hive I am using a plain client, which has access to /user/hdfs since the login user is hdfs?
Whereas Impala has to go through impalad, which runs as the impala user and doesn't have access to /user/hdfs?
Can someone please clarify what is going on here?
10-12-2016 07:22 AM - edited 10-12-2016 07:24 AM
When you run impala-shell, it does not run as "impala"; it runs as the current user. However, Impala does not support HDFS-level user impersonation. If you need granular authorization / user permissions, you might want to use Sentry.
Please refer to this link.
10-12-2016 10:45 AM
10-12-2016 09:07 PM
The impalad daemon is the one that cannot access the jar for query processing, since you have set the HDFS permission to 700. Your assumption is right, and that's what I was referring to in my previous post when I stated that Impala does not support HDFS-level user impersonation.
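To make the behaviour concrete, here is a small Python sketch of a POSIX-style permission check like the one HDFS applies. It is a simplified model, not HDFS code: the Inode class, the function names, and the jar's 644 mode are assumptions made up for illustration. It also assumes the other parent directories (/ and /user) are traversable and ignores ACLs.

```python
from dataclasses import dataclass

READ, EXECUTE = 4, 1  # rwx permission bits


@dataclass(frozen=True)
class Inode:
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o700


def has_permission(inode, user, groups, want,
                   superuser="hdfs", supergroup="supergroup"):
    """Simplified check: superusers bypass it, otherwise owner/group/other bits apply."""
    if user == superuser or supergroup in groups:
        return True
    if user == inode.owner:
        bits = (inode.mode >> 6) & 7
    elif inode.group in groups:
        bits = (inode.mode >> 3) & 7
    else:
        bits = inode.mode & 7
    return bits & want == want


def can_read_file(parent_dirs, file_inode, user, groups):
    """Reading a file needs execute on every parent directory and read on the file."""
    return (all(has_permission(d, user, groups, EXECUTE) for d in parent_dirs)
            and has_permission(file_inode, user, groups, READ))


# /user/hdfs as described in the question: mode 700, hdfs:supergroup
home_700 = Inode("hdfs", "supergroup", 0o700)
# Same directory after granting execute to "other"
home_701 = Inode("hdfs", "supergroup", 0o701)
# Hypothetical jar mode (assumption): world-readable file owned by hdfs
jar = Inode("hdfs", "supergroup", 0o644)

print(can_read_file([home_700], jar, "hdfs", {"hdfs"}))                  # Hive CLI run as hdfs
print(can_read_file([home_700], jar, "impala", {"impala"}))              # impalad as impala
print(can_read_file([home_700], jar, "impala", {"impala", "supergroup"}))  # impala in supergroup
print(can_read_file([home_701], jar, "impala", {"impala"}))              # execute for others
```

Under these assumptions only the second call returns False, which matches what was observed: the Hive CLI reads the jar as hdfs, while impalad reads it as impala and is blocked by the 700 directory until impala joins supergroup or others get execute permission.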