Can't create directory /yarn/nm/usercache/urika/appcache/application_1 - Permission denied

Explorer

Trying to run a simple test, I get permission denied errors; I tried as both root and as the urika user.

I just enabled Kerberos...

 

[root@skipper4 cloudera-scm-server]# hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
Number of Maps = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
15/02/21 03:11:40 INFO client.RMProxy: Connecting to ResourceManager at skipper4/10.0.1.4:8032
15/02/21 03:11:40 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 4 for urika on 10.0.1.4:8020
15/02/21 03:11:40 INFO security.TokenCache: Got dt for hdfs://skipper4:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 10.0.1.4:8020, Ident: (HDFS_DELEGATION_TOKEN token 4 for urika)
15/02/21 03:11:41 INFO input.FileInputFormat: Total input paths to process : 10
15/02/21 03:11:41 INFO mapreduce.JobSubmitter: number of splits:10
15/02/21 03:11:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1424508393097_0004
15/02/21 03:11:41 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 10.0.1.4:8020, Ident: (HDFS_DELEGATION_TOKEN token 4 for urika)
15/02/21 03:11:41 INFO impl.YarnClientImpl: Submitted application application_1424508393097_0004
15/02/21 03:11:41 INFO mapreduce.Job: The url to track the job: http://skipper4:8088/proxy/application_1424508393097_0004/
15/02/21 03:11:41 INFO mapreduce.Job: Running job: job_1424508393097_0004
15/02/21 03:11:56 INFO mapreduce.Job: Job job_1424508393097_0004 running in uber mode : false
15/02/21 03:11:56 INFO mapreduce.Job: map 0% reduce 0%
15/02/21 03:11:56 INFO mapreduce.Job: Job job_1424508393097_0004 failed with state FAILED due to: Application application_1424508393097_0004 failed 2 times due to AM Container for appattempt_1424508393097_0004_000002 exited with exitCode: -1000 due to: Application application_1424508393097_0004 initialization failed (exitCode=255) with output: main : command provided 0
main : user is urika
main : requested yarn user is urika
Can't create directory /mnt/ssd/yarn/nm/usercache/urika/appcache/application_1424508393097_0004 - Permission denied
Did not create any app directories

.Failing this attempt.. Failing the application.
15/02/21 03:11:56 INFO mapreduce.Job: Counters: 0
Job Finished in 15.543 seconds
java.io.FileNotFoundException: File does not exist: hdfs://skipper4:8020/user/urika/QuasiMonteCarlo_1424509895729_44418568/out/reduce-out
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

1 ACCEPTED SOLUTION

Explorer

The fix was to remove (or move) the urika cache directory from all the nodes (the computes, in my case). It seems these directories get re-created during a run.

Looks like a bug when you go from simple auth to Kerberos auth: the cache directories will not work if they were created under simple auth.
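For anyone else hitting this, roughly what that cleanup looks like as commands; the path is the one from my log above, so substitute whatever yarn.nodemanager.local-dirs points at on your nodes, and the backup location is just an example:

# run on every NodeManager (compute) host
ls -ld /mnt/ssd/yarn/nm/usercache/urika                        # confirm the stale directory is there
mv /mnt/ssd/yarn/nm/usercache/urika /tmp/urika.usercache.bak   # or rm -rf it if you don't need a backup
# the directory gets re-created for that user the next time a container runs on the node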


7 REPLIES

Mentor
Could you post the output of the commands below, from the NodeManager machines where the attempt appears to have failed to run?

ls -ld /mnt/ssd/yarn/nm
ls -ld /mnt/ssd/yarn/nm/usercache
ls -ld /mnt/ssd/yarn/nm/usercache/urika
ls -l /mnt/ssd/yarn/nm/usercache/urika
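If it is easier, something along these lines collects that from several hosts at once (the hostnames here are placeholders; use your actual NodeManager hosts):

for h in nodemanager1 nodemanager2 nodemanager3; do
  echo "== $h =="
  ssh "$h" 'ls -ld /mnt/ssd/yarn/nm /mnt/ssd/yarn/nm/usercache /mnt/ssd/yarn/nm/usercache/urika; ls -l /mnt/ssd/yarn/nm/usercache/urika'
done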


New Contributor

Deleting the directory from datanodes did work for me, thanks!

New Contributor

You need to run this command:

rm -rf /dn/yarn/nm/usercache/*    (this path is from my configuration)

Please check your configuration under YARN (MR2 Included) > NodeManager Local Directories.

 

http://i.imgur.com/BHwhUnB.jpg

 

You need to apply this on the data nodes for which YARN reported the error.

Here is a sample from my case:

 

http://i.imgur.com/miNx454.jpg

 

The ApplicationMaster reported C90BFH04.localdomain:8042, which is data node no. 4, so I applied this only to the YARN directory on node no. 4.

After that, everything was OK!
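Put together, a rough sketch of the steps (again, /dn/yarn/nm is the value from my NodeManager Local Directories setting; yours will likely differ):

# on the node that YARN reported in the error
ls -ld /dn/yarn/nm/usercache/*    # see which per-user cache directories exist
rm -rf /dn/yarn/nm/usercache/*    # clear them; they are rebuilt on the next container launch
# if yarn.nodemanager.local-dirs lists more than one directory, repeat for each of them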

 

Contributor

I've been seeing the same problem in different environments.

The owner of the `/data/disk-$id/yarn/nm/usercache/<user>` directory is not `<user>` but rather `keytrustee`.

Subsequent containers that get assigned to this NodeManager then fail with a permission error: <user> doesn't have permission to write into a directory owned by `keytrustee`.

 

Any idea how YARN decides who owns this directory? Is it the process uid that runs the container?

 

It is mentioned above that certain configuration changes require those directories to be cleared / re-initialized. What configuration changes are those? Just moving from an unkerberized to a kerberized cluster?

 

Furthermore, is there a tool that can be used to clear those directories?

 

thanks!

Contributor
To add to that: does changing the container executor to org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor require clearing those directories too?

Contributor

@jbowles 

 

Yes, it is advisable to clean up the NM local directories when changing the LCE setting; please see https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cdh_sg_other_hadoop_security.html#to...

 

Important: Configuration changes to the Linux container executor could result in local NodeManager directories (such as usercache) being left with incorrect permissions. To avoid this, when making changes using either Cloudera Manager or the command line, first manually remove the existing NodeManager local directories from all configured local directories (yarn.nodemanager.local-dirs), and let the NodeManager recreate the directory structure.
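As a hedged sketch of that procedure on one NodeManager host (the directory shown is an example; use every path listed in yarn.nodemanager.local-dirs, and stop the NodeManager role first, for example from Cloudera Manager):

# with the NodeManager stopped
for d in /mnt/ssd/yarn/nm; do    # replace with the full yarn.nodemanager.local-dirs list
  rm -rf "$d"/usercache/*        # remove the per-user cache left over from the old executor setting
done
# start the NodeManager again; it recreates the directory structure with the correct ownership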