When a YARN/MR job is submitted, the client checks the ownership of the staging directory, and if the owner does not match the user submitting the job, it throws the exception below.
The staging directory path is taken from the YARN config [yarn.app.mapreduce.am.staging-dir = /tmp/hadoop-yarn/staging].
java.io.IOException: The ownership on the staging directory /tmp/hadoop-yarn/staging/hdfs/.staging is not as expected. It is owned by . The directory must be owned by the submitter hdfs or hdfs
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:152)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:151)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
Is there a YARN config that skips the ownership check for the staging directory?
I am facing this issue with OzoneFS, not with HDFS.
The ownership check happens in the file below: https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-...
Any workaround to bypass or skip the check?
The problem seems to be that the FileStatus returned by OzoneFileSystem does not have the owner field set, so it is empty. As a result, the ownership check fails.
One workaround I see is to delete the /tmp/hadoop-yarn/staging/hdfs/.staging directory before submitting the MapReduce job. The ownership check is then bypassed and the staging directory is created again.
But this means that you can't have more than one job using the /tmp/hadoop-yarn/staging/hdfs/.staging directory. So it's not a good workaround, although it is the only one available from what I can see (apart from a code change in MapReduce/Ozone).
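To make the reasoning above concrete, here is a minimal, self-contained sketch of the kind of check that produces this error. It is a simplified model, not the actual Hadoop source: the class StagingOwnerCheck, its methods, and the boolean "exists" parameter are all hypothetical stand-ins for the real FileSystem/FileStatus calls. It only shows why an empty owner fails the comparison and why a missing directory skips it.

```java
import java.io.IOException;

// Simplified, hypothetical model of the staging directory ownership check.
// This is NOT the Hadoop implementation; it uses no Hadoop classes.
class StagingOwnerCheck {

    // exists:        whether the staging directory is already present
    // reportedOwner: the owner string the file system reports for it
    // currentUser / realUser: the submitting user and the login user
    static void checkStagingDir(String path, boolean exists, String reportedOwner,
                                String currentUser, String realUser) throws IOException {
        if (!exists) {
            // Directory is missing: nothing to compare, it would simply be created.
            return;
        }
        // An empty owner string ("") never equals a real user name, so this fails.
        if (!(reportedOwner.equals(currentUser) || reportedOwner.equals(realUser))) {
            throw new IOException("The ownership on the staging directory " + path
                    + " is not as expected. It is owned by " + reportedOwner
                    + ". The directory must be owned by the submitter "
                    + currentUser + " or " + realUser);
        }
    }

    // Convenience wrapper: true if the check passes, false if it throws.
    static boolean passes(boolean exists, String reportedOwner,
                          String currentUser, String realUser) {
        try {
            checkStagingDir("/tmp/hadoop-yarn/staging/hdfs/.staging",
                    exists, reportedOwner, currentUser, realUser);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Owner reported correctly (the HDFS case).
        System.out.println("owner set:         " + passes(true, "hdfs", "hdfs", "hdfs"));
        // Owner reported as empty (the case described for OzoneFS).
        System.out.println("owner empty:       " + passes(true, "", "hdfs", "hdfs"));
        // Directory deleted before submission (the workaround).
        System.out.println("directory missing: " + passes(false, "", "hdfs", "hdfs"));
    }
}
```

In this model the workaround only helps until the directory exists again, which is consistent with the limitation that a second job submitted by the same user hits the check while the first is still using the directory.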
This workaround means that I will have to delete the staging directory before submitting every new job, and that a single user will be able to run only one job at a time.
Yes. In effect, a user can only run a single job at any moment. To run multiple jobs at the same time, they all need to be submitted as different users.
- You have to change the owner of the staging directory:
hadoop_amine@amine:/home/amine$ hadoop fs -chown -R hadoop_amine:hadoop_group /tmp/hadoop-yarn/staging/hadoop_amine/