12-10-2018 06:34 PM
Hi Nitish,

Yes, we have a non-kerberized cluster, and your suggestion worked. I wonder why we are only starting to see this problem now; that is, why does the bulk load fail to fit some of the HFiles into a single region, forcing a split? There are no other writes to this table except the bulk load of the map reduce job's output, and as far as I know the map reduce job should produce HFiles that each fit within a single region. Is it because our regions are almost full, or do we have a configuration mismatch somewhere? I could try running a major compaction and see if that changes anything...

Thanks,
Michelle
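A hedged sketch of the checks discussed above, run from the HBase shell. The table name `sessiondb:threshold` is taken from the commands later in this thread; the `hbase:meta` scan is one way to see the current region boundaries so they can be compared against the key ranges of the generated HFiles, and the major compaction is the step proposed here. Only run these if they fit your environment.

```
hbase shell <<'EOF'
# Region boundaries for the table live in hbase:meta; row keys there are
# prefixed with the table name, so a prefix filter lists its regions.
scan 'hbase:meta', {FILTER => "PrefixFilter('sessiondb:threshold')"}

# Major compaction on the table, as proposed above:
major_compact 'sessiondb:threshold'
EOF
```

Note that `major_compact` only queues the compaction; progress can be watched in the RegionServer UI.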
12-07-2018 03:11 PM
We're using Hue to schedule workflows with Oozie. The workflow has a step that runs a map reduce job which outputs HFiles, then another step that calls LoadIncrementalHFiles as a shell action. The workflow ran fine for a while, until recently we started to see the error below. The error only occurs when there's a region split; if no region split happens, the load executes fine.
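For context, a minimal sketch of the shell-action setup described above. This is illustrative only: the action name (`bulk-load`), script name (`load_hfiles.sh`), and the `${jobTracker}`/`${nameNode}`/`${workflowRoot}` placeholders are assumptions, not taken from the actual workflow.

```xml
<!-- Hypothetical Oozie shell action wrapping the LoadIncrementalHFiles
     step; names and paths are illustrative only. -->
<action name="bulk-load">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>load_hfiles.sh</exec>
    <file>${workflowRoot}/load_hfiles.sh#load_hfiles.sh</file>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```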
18/12/06 02:42:58 INFO mapreduce.LoadIncrementalHFiles: HFile at hdfs://hdfs-hbaseprod/user/hbase/batch-output/f1/66022fd5b74c49cb8e6931e03ddc0625 no longer fits inside a single region. Splitting...
18/12/06 02:42:58 ERROR mapreduce.LoadIncrementalHFiles: IOException during splitting
java.util.concurrent.ExecutionException: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode="/user/hbase/batch-output/f1":hbase:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6268)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6220)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4087)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4057)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4030)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:787)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplitPhase(LoadIncrementalHFiles.java:446)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:301)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:896)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:902)
The HDFS directory is owned by the hbase user, which is the same user who logged in to Hue to configure the workflow. I'm able to execute the command manually:
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/batch-output/ 'sessiondb:threshold'
If I run the command manually as yarn, I get the same error as in the workflow:
sudo -u yarn hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/batch-output/ 'sessiondb:threshold'
My questions:
1. Why is this executed as yarn instead of hbase (the user who logged in to Hue)?
2. How do I fix it? I could try changing YARN's System User configuration from yarn to hbase, but I'm afraid of repercussions for other things that may be affected.
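One workaround sometimes used instead of changing YARN's System User (hedged; this relies on the cluster being non-kerberized, as confirmed later in this thread, and should be checked against your security policy): set HADOOP_USER_NAME inside the shell action's script so the Hadoop client calls run as hbase rather than yarn. The command itself is the one from this post.

```
# Inside the shell action's script. On a non-kerberized cluster,
# HADOOP_USER_NAME makes Hadoop clients act as the named user, so the
# split/rewrite of HFiles happens as hbase, not yarn.
export HADOOP_USER_NAME=hbase
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /user/hbase/batch-output/ 'sessiondb:threshold'
```

The alternative of loosening permissions on /user/hbase/batch-output (e.g. group write for yarn) would also avoid the error, at the cost of wider access to the staging directory.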