I tried to set up Impala to use YARN resource management. Among other things, this requires enabling the Linux Container Executor (LCE) on all hosts and
configuring YARN to use it.
The problem is that when I tried to run a Spark job under the root account, YARN refused to do so. First, there was an error message about the nobody user.
Since YARN is configured by default to run containers as this user, I set yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users to false
in the safety valve for yarn-site.xml.
According to the documentation, this should ensure that every action in the container is executed as the user who submitted the job.
I tried adding root to the whitelist of allowed users in YARN (allowed.system.users) and setting min.user.id to 1, but nothing helped.
YARN still refuses to start a job as root.
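YARN makes three checks on the requested user (see the container-executor source code): the user must not be named root, its UID must not be below min.user.id unless the user is whitelisted, and it must not be in the banned users list.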
For now, the only workaround I have found is to create a new user with UID and GID equal to 0, add that user's name to the whitelist, and set min.user.id to 0.
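
To show why the workaround gets through while root itself never does, here is a rough Python sketch of the user checks as I understand them from the container-executor source. It is a model for illustration, not the actual C code. The default values (min.user.id of 1000 and the banned list) are assumptions that should be verified against your Hadoop version, and the user name "rootalias" is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorConfig:
    """Model of the relevant container-executor.cfg settings (defaults are assumed)."""
    min_user_id: int = 1000
    allowed_system_users: frozenset = frozenset()
    banned_users: frozenset = frozenset({"hdfs", "yarn", "mapred", "bin"})


def check_user(name, uid, cfg):
    """Return None if the user may run containers, otherwise the reason for refusal."""
    # Check 1: the literal name "root" is rejected before any setting is consulted.
    if name == "root":
        return "running as root is not allowed"
    # Check 2: the UID must not be below the minimum, unless the name is whitelisted.
    if uid < cfg.min_user_id and name not in cfg.allowed_system_users:
        return "uid %d is below min.user.id %d and user is not whitelisted" % (
            uid, cfg.min_user_id)
    # Check 3: banned users are always rejected.
    if name in cfg.banned_users:
        return "user is banned"
    return None


# Whitelisting root and lowering min.user.id does not help: check 1 is by name.
root_cfg = ExecutorConfig(min_user_id=0, allowed_system_users=frozenset({"root"}))
print(check_user("root", 0, root_cfg))

# A second account with UID 0 but a different name passes all three checks.
alias_cfg = ExecutorConfig(min_user_id=0, allowed_system_users=frozenset({"rootalias"}))
print(check_user("rootalias", 0, alias_cfg))
```

In this model the first check compares only the user name, which is why no configuration change lets root through, while an alias account with UID 0 is accepted once the other two checks are satisfied.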
There is an important motivation for running as root: using distcp to make a backup to a target location that is an NFS filesystem or another shared filesystem mounted locally on the DataNode/worker node.
In that case, if you run the job as a normal user, it cannot change the owner of the files, so the distcp backup fails. Of course, if you run it as root, it fails too because of the hard-coded check.