HDFS or HIVE Replication
Labels: Apache Hive, Apache YARN, HDFS, Kerberos, MapReduce
Created on ‎08-22-2018 06:57 PM - edited ‎09-16-2022 06:37 AM
Hi,
The HDFS and Hive replication schedules are failing with the error below.
18/08/22 19:40:02 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:PROXY) via hdfs/<FQDN>@realm (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file hdfs://nameservice9:8020/user/history/done_intermediate/hdfs/job_1534964122122_0006-1534978377784-hdfs-HdfsReplication-1534981192465-20-1-SUCCEEDED-root.users.hdfs-1534978383044.jhist
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:199)
at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:218)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:208)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler$1.run(HistoryClientService.java:204)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:204)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getJobReport(HistoryClientService.java:236)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122)
at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2220)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2214)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file hdfs://nameservice9:8020/user/history/done_intermediate/hdfs/job_1534964122122_0006-1534978377784-hdfs-HdfsReplication-1534981192465-20-1-SUCCEEDED-root.users.hdfs-1534978383044.jhist
at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:341)
at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:101)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:479)
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:180)
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.access$000(CachedHistoryStorage.java:52)
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:103)
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:100)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
at com.google.common.cache.LocalCache$LocalManualCache.getUnchecked(LocalCache.java:4834)
at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:193)
... 18 more
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=READ, inode="/user/history/done_intermediate/hdfs/job_1534964122122_0006-1534978377784-hdfs-HdfsReplication-1534981192465-20-1-SUCCEEDED-root.users.hdfs-1534978383044.jhist":hdfs:supergroup:-rwxrwx---
Created ‎08-23-2018 01:50 AM
This just means that the mapred user does not have permission on that directory. Without knowing more details, one possible solution is to add the mapred user to the supergroup group on every worker node.
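For reference, a minimal sketch of what that could look like on each node, assuming the default HDFS supergroup name 'supergroup' (configurable via dfs.permissions.superusergroup) and the default shell-based group mapping:
# create the OS group if it does not exist yet, then add mapred to it
sudo groupadd supergroup
sudo usermod -a -G supergroup mapred
# verify the group membership as HDFS resolves it
hdfs groups mapred
With the default shell-based group mapping, HDFS resolves group membership on the NameNode host, so make sure the change is applied there as well.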
Created ‎08-23-2018 03:20 PM
Are you talking about the destination cluster? Do you know how to add the mapred user to supergroup? I don't see a supergroup entry in /etc/group at all, but I do see the mapred user in the hadoop group. I need to add the mapred user to supergroup, as I am running the BDR jobs as the hdfs user.
Created ‎08-23-2018 03:43 PM
Since we are running BDR jobs as the hdfs user, I think that is why '/user/history' has been set to 'hdfs:supergroup'. I changed it to mapred:hadoop and ran the job again, but it is still failing with the same error. I see that it creates another directory inside '/user/history/done_intermediate' named after whoever runs the MapReduce job (in this case 'hdfs'), and it writes .jhist, .summary, and .xml files there with the permissions and ownership shown below.
-rwxrwx--- 1 hdfs supergroup
Can we change anything so that the files it creates in '/user/history/done_intermediate/hdfs' have enough permissions to be read?
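For reference, a quick way to inspect what the job actually wrote and how HDFS resolves the JobHistoryServer user's groups (the path below is the one from the error message):
# check the per-user intermediate directory and the files inside it
hadoop fs -ls -d /user/history/done_intermediate/hdfs
hadoop fs -ls /user/history/done_intermediate/hdfs
# check which groups HDFS resolves for the mapred (JHS) user
hdfs groups mapred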
Created ‎08-24-2018 12:37 AM
As @GeKas said, enable ACLs on HDFS if you don't have them already (dfs.namenode.acls.enabled should be checked).
Then you need to set the default group access on the parent directory, so that every subdirectory will be accessible to the mapred user (assuming mapred is in the hadoop group):
hdfs dfs -setfacl -R -m default:group:hadoop:r-x /user/history
hdfs dfs -setfacl -R -m group:hadoop:r-x /user/history
And try it again.
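To confirm the ACLs were applied, a quick check using the same path:
hdfs dfs -getfacl /user/history
hdfs dfs -getfacl /user/history/done_intermediate/hdfs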
Created ‎09-05-2018 01:57 PM
I triggered the HDFS BDR job as the hdfs user. It failed with the error message below.
INFO distcp.DistCp: map 96% reduce 0% files 99% bytes 80% throughput 24.7 (MB/s) remaining time 25 mins 27 secs running mappers 1
INFO distcp.DistCp: map 96% reduce 32% files 99% bytes 80% throughput 24.7 (MB/s) remaining time 25 mins 27 secs running mappers 1
INFO ipc.Client: Retrying connect to server: FQDN/IP:PORT. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
INFO ipc.Client: Retrying connect to server: FQDN/IP:PORT. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
INFO ipc.Client: Retrying connect to server: FQDN/IP:PORT. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:PROXY) via <hdfs_principal_name> (auth:KERBEROS) cause:java.io.IOException: Job status not available
ERROR util.DistCpUtils: Exception encountered
java.io.IOException: Job status not available
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:334)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:621)
at com.cloudera.enterprise.distcp.DistCp.checkProgress(DistCp.java:471)
at com.cloudera.enterprise.distcp.DistCp.execute(DistCp.java:461)
at com.cloudera.enterprise.distcp.DistCp$1.run(DistCp.java:151)
at com.cloudera.enterprise.distcp.DistCp$1.run(DistCp.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at com.cloudera.enterprise.distcp.DistCp.run(DistCp.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.cloudera.enterprise.distcp.DistCp.main(DistCp.java:843)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
18/09/05 16:32:09 INFO distcp.DistCp: Used diff: false
18/09/05 16:32:09 WARN distcp.DistCp: Killing submitted job job_1536157028747_0004
Created ‎09-06-2018 01:53 AM
Please ensure/perform the following items so that the JobHistoryServer (JHS) and job completion checks work properly:
First, ensure that 'mapred' and 'yarn' are both part of the 'hadoop' group:
~> hdfs groups mapred
~> hdfs groups yarn
Both commands must include 'hadoop' in their output. If not, ensure the users are added to that group, as sketched below.
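A minimal sketch of adding them on the relevant hosts (this assumes local Unix groups; adjust accordingly if your group mapping comes from LDAP/SSSD):
~> sudo usermod -a -G hadoop mapred
~> sudo usermod -a -G hadoop yarn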
Second, ensure that all files and directories under the HDFS /tmp/logs aggregation directory (or wherever you have reconfigured it) and under /user/history/* have their group set to 'hadoop' and nothing else:
~> hadoop fs -chgrp -R hadoop /user/history /tmp/logs
~> hadoop fs -chmod -R g+rwx /user/history /tmp/logs
Note: the ACLs suggested earlier are not required to resolve this problem. In the default setup, the group on these directories is what matters, and the group arrangement described above is how the YARN and JHS daemon users share information and responsibilities with each other. You may remove any ACLs you set, or leave them in place, as they are still permissive.
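After running the commands above, a quick verification using the same default paths:
# confirm the group and permissions took effect
~> hadoop fs -ls -d /user/history /tmp/logs
~> hadoop fs -ls /user/history/done_intermediate
# confirm both daemon users now resolve to the hadoop group
~> hdfs groups mapred yarn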
