Created 12-02-2015 12:41 PM
Hi,
our Falcon installation abruptly stopped working and no feed could be created. It complained about file permissions:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=kefi, access=WRITE, inode="/apps/falcon-MiddleGate/staging/falcon/workflows/feed":middlegate_test1:falcon:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6515)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6497)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6449)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4251)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.ja
where 'kefi' is the user trying to create the feed and 'middlegate_test1' is another user who created a feed earlier.
The folders on HDFS looked like this:
bash-4.1$ hadoop fs -ls /apps/falcon-MiddleGate/staging/falcon/workflows/
Found 2 items
drwxr-xr-x   - middlegate_test1 falcon   0 2015-12-02 09:13 /apps/falcon-MiddleGate/staging/falcon/workflows/feed
drwxrwxrwx   - middlegate_test1 falcon   0 2015-12-02 09:13 /apps/falcon-MiddleGate/staging/falcon/workflows/process
I can think of two questions related to this:
1. Why is the permission of the 'feed' folder now 'drwxr-xr-x'?
2. Shouldn't Falcon itself create these folders with permissions that allow any user to write to them?
Thanks for any input,
Regards,
Pavel
Created 12-02-2015 02:28 PM
@Pavel Benes For Falcon to work, the user must create the staging and working directories specified in the cluster entity before submitting it. The staging directory must have permission 777 and the working directory must have 755.
From the exception, it looks like the staging directory is owned by user "middlegate_test1" and user "kefi" is not able to write to it. To solve this issue, try submitting the cluster entity as user "kefi" and then submit/schedule the feed/process entity.
For me, when I submit a feed/process entity, the feed and process directories are created with permission 755 under the staging directory, which has permission 777. I don't think Falcon changes the permissions of the feed/process directories under the staging directory, which usually contain the configuration XMLs, log files, jar files, etc. required for executing the feed/process entity.
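Assuming the cluster entity points at /apps/falcon-MiddleGate/staging and a sibling /apps/falcon-MiddleGate/working directory (the working path is an assumption based on the convention described later in this thread), the pre-creation step described above could look like this, run by an HDFS admin:

```shell
# Create the staging and working dirs referenced by the cluster entity
hadoop fs -mkdir -p /apps/falcon-MiddleGate/staging /apps/falcon-MiddleGate/working

# Staging must be world-writable (777), working must be 755
hadoop fs -chmod 777 /apps/falcon-MiddleGate/staging
hadoop fs -chmod 755 /apps/falcon-MiddleGate/working

# Ownership by the falcon service user is a common convention (assumption here)
hadoop fs -chown falcon:falcon /apps/falcon-MiddleGate/staging /apps/falcon-MiddleGate/working
```

After this, the cluster entity can be submitted and multiple users can schedule feeds/processes against it.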
Created 12-02-2015 02:35 PM
Thanks @peeyush
Created 12-02-2015 10:28 PM
The /apps/falcon-MiddleGate/staging/ and /apps/falcon-MiddleGate/working dirs are created when the cluster entity is submitted by the user. These dirs are used to store Falcon-specific information; the staging dir should have permissions 777 and the working dir should have permissions 755. Falcon expects that in real use cases a Falcon cluster entity is created by the admin, and feed/process entities are created by the users of the cluster.
1. Why is the permission of the 'feed' folder now 'drwxr-xr-x'? -- Falcon creates <staging_dir>/falcon/workflows/feed and <staging_dir>/falcon/workflows/process only when a feed/process entity is scheduled. The owner of these dirs is the user scheduling the entity, and the permissions are based on the default umask of the FS.
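As a quick sanity check of the umask explanation: with the common HDFS default umask of 022, a directory requested with mode 777 ends up as 755, i.e. exactly the drwxr-xr-x seen on the 'feed' folder:

```shell
# 777 masked by the default umask 022 yields 755 (drwxr-xr-x)
printf 'resulting mode: %o\n' $(( 0777 & ~0022 ))
# prints: resulting mode: 755
```

So the 755 on 'feed' is the expected outcome of a plain mkdir by whichever user scheduled the entity first, not a deliberate permission change by Falcon.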
2. I am inclined to agree with you. <staging_dir>/falcon/workflows/process and <staging_dir>/falcon/workflows/feed should be created when the cluster entity is submitted, and the ownership should belong to falcon, with perms 777. I created a Jira https://issues.apache.org/jira/browse/FALCON-1647 and will update/resolve it after discussing with the Falcon community.
The temporary workaround for this problem is to manually change the permissions of all dirs up to <staging_dir>/falcon/workflows/process and <staging_dir>/falcon/workflows/feed to 777, as @peeyush suggested.
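Using the staging path from this thread, the workaround could be applied like this, run as a user allowed to change these dirs (e.g. the HDFS superuser or 'middlegate_test1'):

```shell
# Open up every dir on the path down to the feed/process workflow dirs
hadoop fs -chmod 777 /apps/falcon-MiddleGate/staging
hadoop fs -chmod 777 /apps/falcon-MiddleGate/staging/falcon
hadoop fs -chmod 777 /apps/falcon-MiddleGate/staging/falcon/workflows
hadoop fs -chmod 777 /apps/falcon-MiddleGate/staging/falcon/workflows/feed
hadoop fs -chmod 777 /apps/falcon-MiddleGate/staging/falcon/workflows/process
```

A recursive `hadoop fs -chmod -R 777 /apps/falcon-MiddleGate/staging` would also work, but it opens up everything already staged under those dirs, not just the directories themselves.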
Created 12-03-2015 09:59 AM
Thanks for filing the issue. I understand that the immediate cause of the failure is insufficient HDFS permissions on the 'feed' folder. However, I am puzzled about what triggered this. We had been using the same Falcon installation (with both the 'kefi' and 'middlegate_test1' users) for several weeks without problems. At the same time we have experienced problems with cluster/YARN overload, since some processes were running with minute(1) frequency. But I am not sure whether this could be related.
Created 12-03-2015 03:49 PM
My understanding is that cluster/YARN overload and this issue are not related.