12-10-2018 06:34 PM
Hi Nitish,

Yes, we have a non-kerberized cluster, and your suggestion worked. I wonder why we are only starting to see this problem now; that is, why does the bulk load fail to fit some of the HFiles into a single region, forcing a split? There are no other writes to this table except the bulk load of the map reduce job's output, and as far as I know the map reduce job should produce HFiles that each fit within a single region. Is it because our regions are almost full, or do we have a configuration mismatch somewhere? I could try running a major compaction and see if that changes anything...

Thanks,
Michelle
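A hedged sketch of the checks discussed above, run from the HBase shell. The table name `sessiondb:threshold` is taken from the commands later in this thread; the `hbase:meta` scan is one way to see the current region boundaries so they can be compared against the key ranges of the generated HFiles, and the major compaction is the step proposed here. Only run these if they fit your environment.

```
hbase shell <<'EOF'
# Region boundaries for the table live in hbase:meta; row keys there are
# prefixed with the table name, so a prefix filter lists its regions.
scan 'hbase:meta', {FILTER => "PrefixFilter('sessiondb:threshold')"}

# Major compaction on the table, as proposed above:
major_compact 'sessiondb:threshold'
EOF
```

Note that `major_compact` only queues the compaction; progress can be watched in the RegionServer UI.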
12-07-2018 03:11 PM
We're using Hue to schedule workflows with Oozie. The workflow has a step that runs a map reduce job which outputs HFiles, then another step that calls LoadIncrementalHFiles as a shell action. The workflow ran fine for a while, until recently we started to see the error below. The error only occurs when there's a region split; if no region split happens, the load executes fine.
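For context, a minimal sketch of the shell-action setup described above. This is illustrative only: the action name (`bulk-load`), script name (`load_hfiles.sh`), and the `${jobTracker}`/`${nameNode}`/`${workflowRoot}` placeholders are assumptions, not taken from the actual workflow.

```xml
<!-- Hypothetical Oozie shell action wrapping the LoadIncrementalHFiles
     step; names and paths are illustrative only. -->
<action name="bulk-load">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>load_hfiles.sh</exec>
    <file>${workflowRoot}/load_hfiles.sh#load_hfiles.sh</file>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```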
18/12/06 02:42:58 INFO mapreduce.LoadIncrementalHFiles: HFile at hdfs://hdfs-hbaseprod/user/hbase/batch-output/f1/66022fd5b74c49cb8e6931e03ddc0625 no longer fits inside a single region. Splitting...
18/12/06 02:42:58 ERROR mapreduce.LoadIncrementalHFiles: IOException during splitting
java.util.concurrent.ExecutionException: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode="/user/hbase/batch-output/f1":hbase:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6268)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6220)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4087)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4057)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4030)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:787)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplitPhase(LoadIncrementalHFiles.java:446)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:301)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:896)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:902)
The HDFS directory is owned by the hbase user, which is the same user who logged in to Hue to configure the workflow. I'm able to execute the command manually:
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/batch-output/ 'sessiondb:threshold'
If I run the command manually as yarn, I get the same error as in the workflow:
sudo -u yarn hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/batch-output/ 'sessiondb:threshold'
My questions:
1. Why is this executed as yarn instead of hbase (the user who logged in to Hue)?
2. How do I fix it? I could try changing YARN's System User configuration from yarn to hbase, but I'm afraid of repercussions for other things that may be affected.
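One workaround sometimes used instead of changing YARN's System User (hedged; this relies on the cluster being non-kerberized, as confirmed later in this thread, and should be checked against your security policy): set HADOOP_USER_NAME inside the shell action's script so the Hadoop client calls run as hbase rather than yarn. The command itself is the one from this post.

```
# Inside the shell action's script. On a non-kerberized cluster,
# HADOOP_USER_NAME makes Hadoop clients act as the named user, so the
# split/rewrite of HFiles happens as hbase, not yarn.
export HADOOP_USER_NAME=hbase
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /user/hbase/batch-output/ 'sessiondb:threshold'
```

The alternative of loosening permissions on /user/hbase/batch-output (e.g. group write for yarn) would also avoid the error, at the cost of wider access to the staging directory.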