I find that jobs in LLAP fail when I enable the property to run as the end user instead of the hive user. I get the error below.
[Code: 2, SQL State: 08S01] Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1507231420401_0045_1_00, diagnostics=[Vertex vertex_1507231420401_0045_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
... 25 more
Caused by: java.lang.IllegalArgumentException: No running LLAP daemons! Please check LLAP service status and zookeeper configuration
Does LLAP have any issue with this property being enabled? I can run the query on LLAP if I run it as the hive user.
LLAP doesn't have any issue with it; it's simply ignored. So you can run your batch Hive instances in RunAs mode, while your Hive interactive (LLAP) server runs your jobs as the hive user.
Your issue seems to be: "No running LLAP daemons!".
In order to run a job, you should first bring up the LLAP daemons cleanly. If that fails, have a look at the LLAP daemon logs in YARN and check why the daemons are not coming up or are crashing.
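When reading a trace like the one above, the innermost "Caused by:" line is usually the one to act on. Here is a minimal Python sketch of pulling it out; the function name `root_cause` is hypothetical and the sample text is abbreviated from the error posted above.

```python
from typing import Optional

PREFIX = "Caused by:"

def root_cause(trace: str) -> Optional[str]:
    """Return the text of the last 'Caused by:' line, or None if there is none."""
    causes = [
        line.strip()[len(PREFIX):].strip()
        for line in trace.splitlines()
        if line.strip().startswith(PREFIX)
    ]
    return causes[-1] if causes else None

# Abbreviated copy of the trace from the question.
SAMPLE_TRACE = """\
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
... 25 more
Caused by: java.lang.IllegalArgumentException: No running LLAP daemons! Please check LLAP service status and zookeeper configuration
"""

print(root_cause(SAMPLE_TRACE))
```

For the trace in the question this prints the IllegalArgumentException line about no running LLAP daemons.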
Well, it does not seem to simply ignore it. The property hive.server2.enable.doAs appears in two places in Ambari: one on the config settings page of Hive and the other on the advanced page (which belongs to the Hive Interactive server, or LLAP). Yes, you are right that LLAP ignores the property if you set it for Hive, but if you set it for LLAP it definitely takes effect (I tested by creating a table through LLAP, and it was created as the end user).
I do not think the LLAP daemons are down or crashing. I can monitor all my LLAP daemons, and I also checked the status of LLAP; it's in the RUNNING_ALL state. If I just turn off this property, everything runs normally on LLAP. With the property set, we cannot run any query that starts mappers on LLAP.
I can connect to LLAP and run simple queries that don't start a mapper, like select * or create. But any query that requires a mapper fails with the above error.
I am assuming LLAP has a problem sharing its resources with multiple users. The error it shows seems to be misleading.
@Stefan Kupstaitis-Dunkler, can you please confirm whether you are able to run mappers under LLAP with this property set:
hive.server2.enable.doAs=true in the Advanced hive-interactive site section?
You shouldn't set "hive.server2.enable.doAs=true" in the hive-interactive section. This doesn't make sense from a resources point of view. However you can set it on the main config page. These two settings are independent, even though they have the same name.
Either way, you can access tables as the user you authenticated as against HiveServer2 and use fine-grained authorization on your tables. The only difference is which user owns the system processes running the queries. With hive.server2.enable.doAs=true the query runs in a container owned by the authenticated user, while "...=false" runs it as the hive user.
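To make that difference concrete, here is a minimal Python sketch of the rule just described. The function name `execution_user` and its parameters are hypothetical, chosen only for illustration; it models the behavior and is not Hive code.

```python
def execution_user(authenticated_user: str, do_as: bool,
                   service_user: str = "hive") -> str:
    """Return the user that would own the processes running the query.

    do_as mirrors hive.server2.enable.doAs; service_user is the account
    the server itself runs as (assumed to be 'hive' here).
    """
    return authenticated_user if do_as else service_user

print(execution_user("alice", do_as=True))   # alice
print(execution_user("alice", do_as=False))  # hive
```

In both cases the authenticated user ("alice" in this example) is still the identity that table authorization is checked against; only the process owner changes.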
If I do not set hive.server2.enable.doAs=true in the hive-interactive section (even if I set hive.server2.enable.doAs=true on the main Hive config page), all my queries run as the hive user in LLAP and as the end user in the Hive CLI. That means I cannot have impersonation enabled in LLAP unless I set hive.server2.enable.doAs=true in the hive-interactive section. And, as you said, setting it to true there won't make sense from a resources point of view. Thus there is clearly a conflict between running LLAP and impersonation. In other words, we cannot have impersonation in LLAP.
I never said I used the Hive CLI to connect to LLAP. I am connecting through beeline, where we can give a username when we connect to LLAP and have queries run as the (end) user we log in with. I have no issues connecting to HiveServer2 or LLAP.
I just want someone from the Hortonworks development team to acknowledge that we cannot have impersonation using LLAP.
The owner of the Hive LLAP daemon process has to be the same as the user executing the query. When Hive Server Interactive (aka HSI) is brought up by Ambari, the LLAP daemons are owned by the user hive. The executing user therefore also needs to be hive, which is what happens when hive.server2.enable.doAs is false.
The intended usage of HSI is with impersonation disabled (hive.server2.enable.doAs=false). End-user authorization is assumed to be handled by Ranger Auth or SQL Standard Auth.
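If you want to verify this on a cluster, one option is to inspect the configuration that the interactive server actually uses. The following is a minimal Python sketch, assuming you have that configuration as Hadoop-style XML (a configuration element containing property entries with name and value). The function names and the sample XML are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DO_AS = "hive.server2.enable.doAs"

def read_do_as(site_xml: str) -> Optional[bool]:
    """Return the doAs setting found in the XML, or None if it is not set."""
    root = ET.fromstring(site_xml.strip())
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == DO_AS:
            value = prop.findtext("value", default="").strip().lower()
            return value == "true"
    return None

def check_interactive_site(site_xml: str) -> str:
    """Describe whether the setting matches the intended HSI usage."""
    do_as = read_do_as(site_xml)
    if do_as is None:
        return "not set: check the effective value on the server"
    if do_as:
        return "impersonation enabled: queries will not run as the LLAP daemon owner"
    return "ok: queries run as the hive user"

# Hypothetical sample resembling a setup with impersonation switched on.
SAMPLE_XML = """
<configuration>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>
</configuration>
"""

print(check_interactive_site(SAMPLE_XML))
```

A result starting with "impersonation enabled" corresponds to the failing setup in this thread; setting the value to false in the hive-interactive section is the intended configuration.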