Created on 02-17-2020 11:08 PM - last edited on 02-17-2020 11:21 PM by VidyaSargur
Hello Team,
I have enabled MIT Kerberos and integrated it with my cluster, and initialized the principals for hdfs, hbase and yarn.
I am able to access HDFS and the HBase tables.
But when I try to run a sample MapReduce job it fails. Please find the error logs below.
==> yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen2
Logs:
WARN security.UserGroupInformation: PriviledgedActionException as:HTTP/hostname.org@FQDN.COM (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x
hostname.org:~:HADOOP QA]$ klist
Ticket cache: FILE:/tmp/krb5cc_251473
Default principal: HTTP/hostname.org@FQDN.COM
Valid starting Expires Service principal
02/18/20 01:55:32 02/19/20 01:55:32 krbtgt/FQDN.COM@FQDN.COM
renew until 02/23/20 01:55:32
Can someone please check the issue and help us?
Thanks & Regards,
Vinod
Created 02-17-2020 11:55 PM
The klist result shows you are submitting the job as the HTTP user:
hostname.org:~:HADOOP QA]$ klist
Ticket cache: FILE:/tmp/krb5cc_251473
Default principal: HTTP/hostname.org@FQDN.COM
WARN security.UserGroupInformation: PriviledgedActionException as:HTTP/hostname.org@FQDN.COM (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x
The above error simply means the HTTP user does not have write permission on the /user directory. So you can either grant write permission for "others" on /user in HDFS so that the HTTP user can write, or run the job after you kinit as the user mcaf, which has write permission.
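For example, either of the following would work (a minimal sketch; the principal and path come from your output above, and note that opening /user to everyone is generally not advisable):
# Option 1 (preferred): get a ticket for mcaf and submit as mcaf
kdestroy
kinit mcaf
klist
# Option 2: allow "others" to write to /user (run as the hdfs superuser)
hadoop fs -chmod o+w /user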
Created 02-18-2020 02:33 AM
Hello @venkatsambath
Thank you for your response...!!
Actually we use mcaf as the user to execute the jobs, so why is the HTTP user coming into the picture?
hostname.com:~:HADOOP QA]$ groups
mcaf supergroup
hostname.com:~:HADOOP QA]$ users
mcaf
hostname.com:~:HADOOP QA]$ hadoop fs -ls /
Found 4 items
drwx------ - hbase supergroup 0 2020-02-18 02:46 /hbase
drwxr-xr-x - hdfs supergroup 0 2015-02-04 11:44 /system
drwxrwxrwt - hdfs supergroup 0 2020-02-17 05:07 /tmp
drwxr-xr-x - mcaf supergroup 0 2019-03-28 03:12 /user
hostname.com:~:HADOOP QA]$ getent group supergroup
supergroup:x:25290:hbase,mcaf,zookeeper,hdfs
hostname.com:~:HADOOP QA]$ getent group hadoop
hadoop:x:497:mapred,yarn,hdfs
Can you please have a look and suggest what to do?
Note: I am trying to enable Kerberos first, and once it is running without any interruption or issues, we are planning to integrate with AD.
Thanks,
Vinod
Created 02-18-2020 07:42 PM
Actually we use mcaf as the user to execute the jobs, so why is the HTTP user coming into the picture?
--> By this do you mean you switch to the mcaf unix user [su - mcaf] and then run the job? If yes, that is not enough. After enabling Kerberos, HDFS and YARN identify the user by the TGT, not by the unix user id. So even if you su to mcaf but hold a TGT for a different user (say HTTP), YARN/HDFS will recognise you as that TGT user.
Can you kinit as mcaf, then run klist (to ensure you have the mcaf TGT) and submit the job?
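Something like this (a minimal sketch; the jar path is from your earlier command and the output directory is just an example):
kdestroy   # drop the current HTTP ticket
kinit mcaf # obtain a TGT for mcaf (prompts for mcaf's password)
klist      # confirm "Default principal: mcaf@FQDN.COM"
yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen3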
Created 02-18-2020 09:55 PM
First I verified whether I was able to access HDFS before doing "kinit mcaf", and the access failed.
Then I did kinit mcaf, verified HDFS access, and I was able to list files and create directories.
Next I triggered the sample YARN job:
hostname.com:~:HADOOP QA]$ yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen4
20/02/19 00:46:30 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager/IP_ADDRESS:8032
20/02/19 00:46:30 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 8 for mcaf on ha-hdfs:nameservice1
20/02/19 00:46:30 INFO security.TokenCache: Got dt for hdfs://nameservice1; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (HDFS_DELEGATION_TOKEN token 8 for mcaf)
20/02/19 00:46:31 INFO terasort.TeraSort: Generating 500000000 using 2
20/02/19 00:46:31 INFO mapreduce.JobSubmitter: number of splits:2
20/02/19 00:46:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1582090413480_0002
20/02/19 00:46:31 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (HDFS_DELEGATION_TOKEN token 8 for mcaf)
20/02/19 00:46:32 INFO impl.YarnClientImpl: Submitted application application_1582090413480_0002
20/02/19 00:46:32 INFO mapreduce.Job: The url to track the job: http://resourcemanager:8088/proxy/application_1582090413480_0002/
20/02/19 00:46:32 INFO mapreduce.Job: Running job: job_1582090413480_0002
20/02/19 00:46:34 INFO mapreduce.Job: Job job_1582090413480_0002 running in uber mode : false
20/02/19 00:46:34 INFO mapreduce.Job: map 0% reduce 0%
20/02/19 00:46:34 INFO mapreduce.Job: Job job_1582090413480_0002 failed with state FAILED due to: Application application_1582090413480_0002 failed 2 times due to AM Container for appattempt_1582090413480_0002_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://resourcemanager:8088/proxy/application_1582090413480_0002/Then, click on links to logs of each attempt.
Diagnostics: Application application_1582090413480_0002 initialization failed (exitCode=255) with output: Requested user mcaf is not whitelisted and has id 779,which is below the minimum allowed 1000
Failing this attempt. Failing the application.
20/02/19 00:46:34 INFO mapreduce.Job: Counters: 0
Can you please check it and let me know?
Regards,
Vinod
Created 02-18-2020 10:07 PM
Hello @venkatsambath ,
FYI...
Created 02-18-2020 10:12 PM
Yes, you are in the right direction. You can set min.user.id to a lower value like 500 and then re-submit the job.
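In Cloudera Manager this is the NodeManager setting min.user.id (search for "min.user.id" on the YARN configuration page); if you manage container-executor.cfg yourself, the relevant entries look roughly like this (a sketch, the group and banned users shown are only illustrative):
yarn.nodemanager.linux-container-executor.group=yarn
banned.users=hdfs,yarn,mapred,bin
min.user.id=500
allowed.system.users=
Alternatively, the "whitelisted" part of the error refers to allowed.system.users, so adding mcaf there would also work without lowering min.user.id. Restart the NodeManagers after the change.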
Created 02-18-2020 11:08 PM
Thank you @venkatsambath
After changing the min.user.id value to 500 I am able to run the sample MapReduce job, and I can see it under YARN applications in Cloudera Manager.
Now I have tried my regular job on the same cluster, but it is failing. Please find the error messages below:
ERROR 2020Feb19 02:01:21,086 main com.client.engineering.group.JOB.main.JOBMain: org.apache.hadoop.hbase.client.RetriesExhaustedException thrown: Can't get the location
org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:308) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:149) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:57) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:293) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888) ~[JOB-0.0.31.jar:0.0.31]
at com.client.engineering.group.JOB.main.JOBMain.hasStagingData(JOBMain.java:304) [JOB-0.0.31.jar:0.0.31]
at com.client.engineering.group.JOB.main.JOBMain.main(JOBMain.java:375) [JOB-0.0.31.jar:0.0.31]
Caused by: java.io.IOException: Broken pipe
ERROR 2020Feb19 02:01:30,198 main com.client.engineering.group.job.main.jobMain: _v.1.0.0a_ org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException thrown: Failed 1 action: IOException: 1 time,
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time,
NOTE: I executed kinit mcaf before running my job.
Do we need to execute 'kinit mcaf' every time before submitting the job?
And how can we configure scheduled jobs?
Please help me to understand.
Best Regards,
Vinod
Created 02-19-2020 01:01 AM
ERROR 2020Feb19 02:01:21,086 main com.client.engineering.group.JOB.main.JOBMain: org.apache.hadoop.hbase.client.RetriesExhaustedException thrown: Can't get the location
Which particular table is this application trying to access? Did you validate whether the user mcaf has permission to access that table? (https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_sg_hbase_authorization.html#top... has the commands.) If the user does not have the permission, grant the required privileges.
If you find that the privileges required for mcaf are already in place, then checking the HBase Master logs during the issue timeframe would give further clues.
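For reference, checking and granting from the hbase shell could look like this, run as a user with admin rights (the table name here is just a placeholder for whichever table your job reads):
user_permission 'staging_table'       # list who currently has access to the table
grant 'mcaf', 'RWX', 'staging_table'  # give mcaf read/write/execute on it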
Qn: Do we need to execute 'kinit mcaf' every time before submitting the job? And how can we configure scheduled jobs?
Ans: Yes, you need a valid TGT whenever you submit a job. How are you scheduling the jobs? If it is a shell script, then you can include a kinit command with mcaf's keytab, which avoids prompting for a password.
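A rough sketch of such a wrapper script (the keytab path and file name are assumptions; you would need a keytab generated for the mcaf principal itself, not one of the service keytabs):
#!/bin/bash
# Obtain a TGT for mcaf non-interactively, then submit the job.
kinit -kt /home/mcaf/mcaf.keytab mcaf@FQDN.COM
yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen_scheduled
Such a script can then be called from cron or any other scheduler without interactive authentication.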
Created 03-05-2020 12:29 AM
Hi @venkatsambath,
As you suggested, I have put the kinit commands as the first step in my scripts, so whenever we execute the script the kinit commands also run. But I am still facing the same issue, and this time I can see zookeeper as the user.
The commands I am using:
kinit -kt /home/mcaf/hdfs.keytab hdfs/hostname@Domain.ORG
kinit -kt /home/mcaf/hdfs.keytab HTTP/hostname@Domain.ORG
kinit -kt /home/mcaf/hbase.keytab hbase/hostname@Domain.ORG
kinit -kt /home/mcaf/yarn.keytab HTTP/hostname@Domain.ORG
kinit -kt /home/mcaf/yarn.keytab yarn/hostname@Domain.ORG
kinit -kt /home/mcaf/zookeeper.keytab zookeeper/hostname@Domain.org
Error logs:
20/03/04 02:00:42 WARN security.UserGroupInformation: PriviledgedActionException as:zookeeper/hostname@Domain.ORG (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=zookeeper, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6599)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6581)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6533)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4337)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4307)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4280)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:321)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:601)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
org.apache.hadoop.security.AccessControlException: Permission denied: user=zookeeper, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x
org.apache.hadoop.security.AccessControlException: Permission denied: user=zookeeper, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6599)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6581)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6533)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4337)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4307)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4280)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:321)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:601)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
Can you please help me on this issue?
Best Regards,
Vinod