Created on 03-31-2015 04:26 AM - edited 09-16-2022 02:25 AM
Hi,
I am having some problems with YARN, and this is not the first cluster where it has happened, so I don't understand what I am doing wrong. Every night I shut down the clusters (installed on AWS and SoftLayer) so I don't pay for them while I'm not working. Also, sooner or later I need bigger machines, so I change the AWS instance type (or the equivalent on SoftLayer). At some point, which I have not been able to pin down, after one particular restart YARN starts having problems with the NodeManager user cache directory (e.g. /bigdata1/yarn/nm/usercache/m.giusto), like in this case (https://community.cloudera.com/t5/Data-Ingestion-Integration/Sqoop-Error-Sqoop-2-not-working-through...), and I have to delete everything in all the user cache directories (which is acceptable), otherwise jobs cannot start.
The bigger problem, however, is that YARN also starts applying an unwanted rule: every user who submits a job is treated as not allowed, and YARN runs the job as "nobody" (the default value of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user). This happens both for a regular, non-superuser account such as "m.giusto" (UID above 1000) and for "hdfs" (UID below 500). I tried moving "hdfs" from "banned.users" to "allowed.system.users" and setting "min.user.id" to 0, with no change. Moreover, the "nobody" user cannot write to the real user's user cache folder (permission denied), so the job fails:
main : user is nobody
main : requested yarn user is m.giusto
Can't create directory /bigdata1/yarn/nm/usercache/m.giusto/appcache/application_1427799738120_0001 - Permission denied
Can't create directory /bigdata2/yarn/nm/usercache/m.giusto/appcache/application_1427799738120_0001 - Permission denied
Did not create any app directories
.Failing this attempt.. Failing the application.
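To make the behaviour described above concrete, here is a minimal Python sketch. It is a simplified model, based on an assumption about how the Hadoop 2.x LinuxContainerExecutor behaves in non-secure mode and not on the CDH source. The limit_users flag stands for yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users, a property not mentioned in this post whose presence and default in CDH 5.2/5.3 are assumed; the UIDs and the default banned users are hypothetical. In this model, banned.users, allowed.system.users and min.user.id only decide whether the already-chosen run-as account may start containers, while the substitution of "nobody" for the requested user happens before those checks.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Simplified model, NOT the Hadoop source: how a non-secure NodeManager using
# the LinuxContainerExecutor is assumed to pick the local account for a
# container, followed by the container-executor.cfg checks on that account.


@dataclass
class NodeManagerConf:
    security_enabled: bool = False  # Kerberos off, i.e. non-secure mode
    # Assumed property: ...nonsecure-mode.limit-users (default assumed true)
    limit_users: bool = True
    # ...nonsecure-mode.local-user, default "nobody" as described in the post
    local_user: str = "nobody"


@dataclass
class ContainerExecutorCfg:
    # Assumed defaults, for illustration only
    banned_users: Set[str] = field(
        default_factory=lambda: {"hdfs", "yarn", "mapred", "bin"})
    min_user_id: int = 1000
    allowed_system_users: Set[str] = field(default_factory=set)


def run_as_user(conf: NodeManagerConf, requested_user: str) -> str:
    """Local account the container runs as (cf. "main : user is ...")."""
    if conf.security_enabled or not conf.limit_users:
        return requested_user
    # Every submitter is mapped to the single configured local user.
    return conf.local_user


def check_user(cfg: ContainerExecutorCfg, user: str, uid: int) -> Optional[str]:
    """Return None if `user` passes the checks, else the rejection reason."""
    if user == "root":
        return "running as root is not allowed"
    if uid < cfg.min_user_id and user not in cfg.allowed_system_users:
        return f"{user} has id {uid}, below min.user.id {cfg.min_user_id}"
    if user in cfg.banned_users:
        return f"{user} is banned"
    return None


if __name__ == "__main__":
    uids = {"m.giusto": 1001, "hdfs": 494, "nobody": 99}  # hypothetical UIDs
    # The edits described in the post: hdfs un-banned and whitelisted,
    # min.user.id set to 0.
    cfg = ContainerExecutorCfg(banned_users={"yarn", "mapred", "bin"},
                               min_user_id=0,
                               allowed_system_users={"hdfs"})
    for limit in (True, False):
        conf = NodeManagerConf(limit_users=limit)
        for requested in ("m.giusto", "hdfs"):
            actual = run_as_user(conf, requested)
            verdict = check_user(cfg, actual, uids[actual]) or "ok"
            print(f"limit_users={limit}: {requested} -> {actual}, "
                  f"check: {verdict}")
```

Run as a script, it prints one line per submitter for each value of limit_users: with the flag on, both m.giusto and hdfs are mapped to nobody even though hdfs is whitelisted and min.user.id is 0, which mirrors the "no changes" observed above.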
What I don't understand is why the system starts applying these rules, and how to fix it. At the moment the only solution I have is to reinstall the cluster.
Some other info: the OS is CentOS 6.6, and the CDH versions I have tested are 5.2.1, 5.3.1 and 5.3.2.
Thanks,
Michele