Created on 08-22-2018 06:57 PM - edited 09-16-2022 06:37 AM
Hi,
Our HDFS and Hive replication schedules are failing with the error below.
18/08/22 19:40:02 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:PROXY) via hdfs/<FQDN>@realm (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file hdfs://nameservice9:8020/user/history/done_intermediate/hdfs/job_1534964122122_0006-1534978377784-hdfs-HdfsReplication-1534981192465-20-1-SUCCEEDED-root.users.hdfs-1534978383044.jhist
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:199)
at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:218)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:208)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:204)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:204)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getJobReport(HistoryClientService.java:236)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122)
at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2220)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2214)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file hdfs://nameservice9:8020/user/history/done_intermediate/hdfs/job_1534964122122_0006-1534978377784-hdfs-HdfsReplication-1534981192465-20-1-SUCCEEDED-root.users.hdfs-1534978383044.jhist
at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:341)
at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:101)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:479)
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:180)
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.access$000(CachedHistoryStorage.java:52)
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:103)
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:100)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
at com.google.common.cache.LocalCache$LocalManualCache.getUnchecked(LocalCache.java:4834)
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:193)
... 18 more
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=READ, inode="/user/history/done_intermediate/hdfs/job_1534964122122_0006-1534978377784-hdfs-HdfsReplication-1534981192465-20-1-SUCCEEDED-root.users.hdfs-1534978383044.jhist":hdfs:supergroup:-rwxrwx---
Created 08-23-2018 01:50 AM
This just says that the mapred user does not have permission to read that path. Without knowing more details, one possible solution would be to add the mapred user to the supergroup group on every worker node.
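To illustrate why the read is rejected, here is a minimal Python sketch of a POSIX-style permission check applied to the inode suffix from the exception (hdfs:supergroup:-rwxrwx---). It is not Hadoop's implementation: the function name can_read is hypothetical, ACLs and the superuser bypass are ignored, and the group list for mapred is an assumption (it is reported later in this thread to be in the hadoop group).

```python
def can_read(user, user_groups, owner, group, mode):
    """Simplified POSIX-style read check (no ACLs, no superuser bypass)."""
    # mode looks like "-rwxrwx---": a type character, then the
    # owner, group, and other permission triplets.
    owner_bits, group_bits, other_bits = mode[1:4], mode[4:7], mode[7:10]
    if user == owner:
        return owner_bits[0] == "r"
    if group in user_groups:
        return group_bits[0] == "r"
    return other_bits[0] == "r"


# Values taken from the exception: owner hdfs, group supergroup, -rwxrwx---
mode = "-rwxrwx---"
print(can_read("mapred", ["hadoop"], "hdfs", "supergroup", mode))                # False
print(can_read("mapred", ["hadoop", "supergroup"], "hdfs", "supergroup", mode))  # True
```

With only the hadoop group, mapred falls into the "other" class, which has no bits set, so the read is denied. Membership in supergroup would move it into the group class, which has read access.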
Created 08-23-2018 03:20 PM
Are you talking about the destination cluster? Do you know how to add the mapred user to supergroup? I don't see a supergroup entry at all in /etc/group, but I do see the mapred user in the hadoop group. I need to add the mapred user to supergroup because I am running BDR jobs as the hdfs user.
Created 08-23-2018 03:44 AM
Created 08-23-2018 03:43 PM
Since we run BDR jobs as the hdfs user, I think '/user/history' had been set to 'hdfs:supergroup'. I changed it to mapred:hadoop and ran the job again, but it still fails with the same error. I see that it creates another directory inside '/user/history/done_intermediate/hdfs', named after whoever runs a MapReduce job. It writes .jhist, .summary, and .xml files there with the permissions and ownership shown below.
-rwxrwx--- 1 hdfs supergroup
Can we change anything so that the files it creates in '/user/history/done_intermediate/hdfs' have sufficient read/write permissions?
Created 08-24-2018 12:30 AM
Created 08-24-2018 12:37 AM
As @GeKas said, enable ACLs on HDFS if you don't have them enabled already (dfs.namenode.acls.enabled should be checked).
Then you need to set the default group access for the parent directory, so every other subdirectory will be accessible by the mapred user (assuming mapred is in hadoop group):
hdfs dfs -setfacl -R -m default:group:hadoop:r-x /user/history
hdfs dfs -setfacl -R -m group:hadoop:r-x /user/history
And try it again.
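For anyone wondering why a named-group ACL entry helps here, the following Python sketch models a simplified ACL read check. It is an illustration under stated assumptions, not HDFS code: the function name acl_allows_read is hypothetical, named-user entries and the superuser bypass are left out, the mask is assumed to be rwx, and the group triplet in the mode string is treated as the owning group's permissions.

```python
def acl_allows_read(user, user_groups, owner, owning_group, mode,
                    named_groups, mask="rwx"):
    """Simplified ACL-style read check (named users and superuser omitted)."""
    owner_bits, group_bits, other_bits = mode[1:4], mode[4:7], mode[7:10]
    if user == owner:
        return "r" in owner_bits
    # Collect every group-class entry that applies to this user:
    # the owning group plus any named group the user belongs to.
    matched = []
    if owning_group in user_groups:
        matched.append(group_bits)
    matched += [perms for g, perms in named_groups.items() if g in user_groups]
    if matched:
        # Group-class entries are filtered by the mask before they grant access.
        return any("r" in perms and "r" in mask for perms in matched)
    return "r" in other_bits


mode = "-rwxrwx---"
# Without any ACL entry, mapred (in hadoop only) falls through to "other".
print(acl_allows_read("mapred", ["hadoop"], "hdfs", "supergroup", mode, {}))
# With group:hadoop:r-x, the named-group entry matches and grants read.
print(acl_allows_read("mapred", ["hadoop"], "hdfs", "supergroup", mode,
                      {"hadoop": "r-x"}))
```

The first call prints False and the second prints True. Note that a default ACL entry only affects files and directories created after it is set, which is why the second command above also applies the entry to the existing tree.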
Created 09-05-2018 01:57 PM
I triggered the HDFS BDR job as the hdfs user. It failed with the error message below.
INFO distcp.DistCp: map 96% reduce 0% files 99% bytes 80% throughput 24.7 (MB/s) remaining time 25 mins 27 secs running mappers 1
INFO distcp.DistCp: map 96% reduce 32% files 99% bytes 80% throughput 24.7 (MB/s) remaining time 25 mins 27 secs running mappers 1
INFO ipc.Client: Retrying connect to server: FQDN/IP:PORT. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
INFO ipc.Client: Retrying connect to server: FQDN/IP:PORT. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
INFO ipc.Client: Retrying connect to server: FQDN/IP:PORT. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:PROXY) via <hdfs_principal_name> (auth:KERBEROS) cause:java.io.IOException: Job status not available
ERROR util.DistCpUtils: Exception encountered
java.io.IOException: Job status not available
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:334)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:621)
at com.cloudera.enterprise.distcp.DistCp.checkProgress(DistCp.java:471)
at com.cloudera.enterprise.distcp.DistCp.execute(DistCp.java:461)
at com.cloudera.enterprise.distcp.DistCp$1.run(DistCp.java:151)
at com.cloudera.enterprise.distcp.DistCp$1.run(DistCp.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at com.cloudera.enterprise.distcp.DistCp.run(DistCp.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.cloudera.enterprise.distcp.DistCp.main(DistCp.java:843)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
18/09/05 16:32:09 INFO distcp.DistCp: Used diff: false
18/09/05 16:32:09 WARN distcp.DistCp: Killing submitted job job_1536157028747_0004
Created 09-06-2018 01:53 AM