Created on 08-01-2014 08:34 AM - edited 09-16-2022 02:04 AM
When I try to start the JobTracker using this command
service hadoop-0.20-mapreduce-jobtracker start
I can see this error
org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:149)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4891)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4873)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4847)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3192)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3156)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3137)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:669)
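To confirm it really is the permission check on / (and not something JobTracker-specific), here is a quick manual check I can run; the test path below is just an example of my own:

```
# Any mkdir directly under / performed as the mapred user hits the same
# WRITE check on inode "/" (hypothetical test path, remove it afterwards)
sudo -u mapred hdfs dfs -mkdir /permtest
sudo -u mapred hdfs dfs -rm -r /permtest   # clean up if the mkdir succeeded
```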
I found this blog post which tries to address this issue
http://blog.spryinc.com/2013/06/hdfs-permissions-overcoming-permission.html
I followed the steps here and did
groupadd supergroup
usermod -a -G supergroup mapred
usermod -a -G supergroup hdfs
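To verify the group change is actually visible to the NameNode, these checks can help (the refresh command is my assumption; depending on the version a NameNode restart may be needed instead):

```
# Verify local (OS) group membership on the NameNode host
id mapred
# Ask the NameNode which groups it resolves for the user
hdfs groups mapred
# Ask the NameNode to re-read user-to-group mappings without a restart
sudo -u hdfs hdfs dfsadmin -refreshUserToGroupsMappings
```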
but I still get this problem. The only difference between the blog entry and my setup is that for me the error is on the root dir "/", whereas for the blog it is on "/user".
Here is my mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jt1:8021</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/mapred/jt</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/tmp/mapred/system</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
  </property>
  <property>
    <name>mapred.job.tracker.persist.jobstatus.active</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.job.tracker.persist.jobstatus.hours</name>
    <value>24</value>
  </property>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>user.name</value>
  </property>
  <property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/etc/hadoop/conf/fair-scheduler.xml</value>
  </property>
  <property>
    <name>mapred.fairscheduler.allow.undeclared.pools</name>
    <value>true</value>
  </property>
</configuration>
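Since mapred.system.dir and mapreduce.jobtracker.staging.root.dir point into HDFS, my understanding is that those paths need to exist and be owned by mapred before the JobTracker starts. A sketch of pre-creating them (this is only a guess at what the JobTracker is trying to create when it fails on "/"):

```
# Pre-create the JobTracker's HDFS directories from mapred-site.xml above
# and hand them to the mapred user
sudo -u hdfs hdfs dfs -mkdir /tmp
sudo -u hdfs hdfs dfs -mkdir /tmp/mapred
sudo -u hdfs hdfs dfs -mkdir /tmp/mapred/system
sudo -u hdfs hdfs dfs -chown -R mapred:mapred /tmp/mapred
sudo -u hdfs hdfs dfs -mkdir /user
sudo -u hdfs hdfs dfs -chown mapred:mapred /user
```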
I also found this blog
I did
sudo -u hdfs hdfs dfs -mkdir /home
sudo -u hdfs hdfs dfs -chown mapred:mapred /home
sudo -u hdfs hdfs dfs -mkdir /home/mapred
sudo -u hdfs hdfs dfs -chown mapred /home/mapred
sudo -u hdfs hdfs dfs -chown hdfs:supergroup /
but the problem is still not resolved 😞 Please help.
I wonder why it is trying to write to the root dir: inode="/":hdfs:supergroup:drwxr-xr-x
Created on 08-04-2014 01:12 PM - edited 08-04-2014 01:12 PM
The error indicates that MapReduce wants to be able to write to /. You have the owner as hdfs with rwx, the group with r-x, and others set to r-x. Since you already added mapred to supergroup, and supergroup is the group for /, it is the group-level permissions that need to be modified.
To get it working you can do the following:
sudo -u hdfs hdfs dfs -chmod 775 /
This will change the permissions on / to drwxrwxr-x.
As for why MapReduce is trying to write to /: it may be trying to create /user and /tmp, which you have defined as the user space and the temporary space. If you don't have those directories, you could instead do the following:
sudo -u hdfs hdfs dfs -mkdir /user
sudo -u hdfs hdfs dfs -chown mapred:mapred /user
sudo -u hdfs hdfs dfs -mkdir /tmp
sudo -u hdfs hdfs dfs -chown mapred:mapred /tmp
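Either way, a quick sanity check afterwards is just a listing of the root: /user and /tmp should show mapred:mapred, or / should show drwxrwxr-x if you went the chmod route. Then start the JobTracker again:

```
# Confirm the new ownership/permissions took effect
sudo -u hdfs hdfs dfs -ls /
# Then retry the JobTracker
sudo service hadoop-0.20-mapreduce-jobtracker start
```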
Created 03-13-2017 02:05 PM
I have a strange issue going on with Spark jobs.
CDH 5.8.3.
This happens even with Hive on Spark jobs.
The job seems to run successfully. While the job is running I can go through the Resource Manager to the Application Master, which leads me to the Spark execution web UI.
But after the job finishes, even though the job is moved to the Job History Server, when I click on the history server web UI it doesn't take me to the Spark history web UI.
Instead, the job logs remain under /tmp/logs/<user>/logs/<application id>
eg. drwxrwx--- - bigdata hadoop 0 2017-03-13 15:26 /tmp/logs/bigdata/logs/application_1489248168306_0076
drwxrwxrwt+ - mapred hadoop 0 2017-03-09 17:39 /tmp/logs
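In the meantime, the aggregated logs under that path are still readable via the YARN CLI (using the application ID from the listing above; I run it as the job's user since the per-app directory is 770):

```
# Dump the aggregated container logs for the application shown above
sudo -u bigdata yarn logs -applicationId application_1489248168306_0076
```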
Permissions for /tmp are 1777
/user/bigdata is 755
drwxrwx---+ - mapred hadoop 0 2017-01-03 13:09 /user/history/done
drwxrwxrwt+ - mapred hadoop 0 2017-03-09 17:39 /user/history/done_intermediate
uid=489(mapred) gid=486(mapred) groups=486(mapred),493(hadoop)
uid=517(bigdata) gid=522(bigdata) groups=522(bigdata),528(hdpdev)
hadoop:x:493:hdfs,mapred,yarn
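As a permission probe (not a fix), I can also check that mapred, through its hadoop group membership, is able to read the aggregated log directory:

```
# mapred is in the hadoop group, so group r-x on the per-app log dir
# should be enough for the history server side to read it
sudo -u mapred hadoop fs -ls /tmp/logs/bigdata/logs/application_1489248168306_0076
```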
All of the below is done, following https://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_spark_history_server.html
$ sudo -u hdfs hadoop fs -mkdir /user/spark
$ sudo -u hdfs hadoop fs -mkdir /user/spark/applicationHistory
$ sudo -u hdfs hadoop fs -chown -R spark:spark /user/spark
$ sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory
spark.eventLog.dir=/user/spark/applicationHistory
spark.eventLog.enabled=true
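A simple way to confirm event logs are actually landing there (plain directory listing, nothing else assumed):

```
# Each completed application should leave an event log file here
sudo -u spark hadoop fs -ls /user/spark/applicationHistory
```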
Not sure what's going on here. Everything seems to be in order.
Surprisingly, for some of the jobs I was redirected from the Job History Server to the Spark History Server.
Created 03-15-2017 06:16 PM
Figured out the issue.
The issue was that we were passing a spark.conf file while submitting the Spark job, hoping the config changes would be merged with the default parameters from the default spark.conf.
It turns out it overrides the default Spark config file. Even if you pass a blank Spark conf, it will not consider the default spark.conf for the job.
We had to add the below 3 lines to the custom Spark conf file to enable log aggregation at the Spark History Server and make the URL at the Resource Manager point to the Spark History Server.
This has to be done for every Spark job. If a job is submitted without the below 3 params, it will not be available in the Spark History Server even if you restart anything.
```
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://nameservice1/user/spark/applicationHistory
spark.yarn.historyServer.address=http://sparkhist-dev.visibleworld.com:18088
```
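For context, a sketch of how such a custom properties file ends up replacing spark-defaults.conf; the file name, class, and jar are placeholders, and --properties-file is my assumption about how the conf was being passed:

```
# Submitting with an explicit properties file means spark-defaults.conf is
# ignored, so the three lines above must be present in my-spark.conf itself
spark-submit \
  --master yarn \
  --properties-file my-spark.conf \
  --class com.example.MyJob \
  my-job.jar
```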
Created 04-04-2017 01:48 AM
How did you solve it, Max?
Created 04-04-2017 01:50 AM