
Yarn jobs are failing after enabling MIT-Kerberos

Rising Star

Hello Team,

 

I have enabled MIT Kerberos, integrated it with my cluster, and initialized the principals for hdfs, hbase, and yarn.

I am able to access HDFS and the HBase tables.

But when I try to run a sample MapReduce job, it fails. The error logs are below.

 

==> yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen2

Logs:

WARN security.UserGroupInformation: PriviledgedActionException as:HTTP/hostname.org@FQDN.COM (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x

 

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x

 

 

hostname.org:~:HADOOP QA]$ klist
Ticket cache: FILE:/tmp/krb5cc_251473
Default principal: HTTP/hostname.org@FQDN.COM

Valid starting Expires Service principal
02/18/20 01:55:32 02/19/20 01:55:32 krbtgt/FQDN.COM@FQDN.COM
renew until 02/23/20 01:55:32

 

 

Can someone please look into this issue and help us?

 

Thanks & Regards,

Vinod

31 Replies

Master Collaborator

The klist result shows you are submitting the job as the HTTP user:

 

hostname.org:~:HADOOP QA]$ klist
Ticket cache: FILE:/tmp/krb5cc_251473
Default principal: HTTP/hostname.org@FQDN.COM

 

WARN security.UserGroupInformation: PriviledgedActionException as:HTTP/hostname.org@FQDN.COM (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x

 

The above error simply means the HTTP user does not have write permission on the /user directory. So you can either grant write permission for "others" on /user in HDFS so that the HTTP user can write, or run the job after you kinit as the user mcaf, which already has write permission.
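For example, something along these lines (the chmod must be run by an HDFS superuser, and opening /user to everyone is fairly broad, so treat this as an illustration rather than a recommendation):

# Option 1: allow "others" to write under /user
hdfs dfs -chmod o+w /user

# Option 2: get a ticket as mcaf (the owner of /user) and resubmit the job
kinit mcaf
yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen2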

Rising Star

Hello @venkatsambath 

 

Thank you for your response!

We actually use mcaf as the user to execute the jobs, so why is the HTTP user coming into the picture?

 

hostname.com:~:HADOOP QA]$ groups
mcaf supergroup
hostname.com:~:HADOOP QA]$ users
mcaf

hostname.com:~:HADOOP QA]$ hadoop fs -ls /
Found 4 items
drwx------ - hbase supergroup 0 2020-02-18 02:46 /hbase
drwxr-xr-x - hdfs supergroup 0 2015-02-04 11:44 /system
drwxrwxrwt - hdfs supergroup 0 2020-02-17 05:07 /tmp
drwxr-xr-x - mcaf supergroup 0 2019-03-28 03:12 /user

 

hostname.com:~:HADOOP QA]$ getent group supergroup
supergroup:x:25290:hbase,mcaf,zookeeper,hdfs

 

hostname.com:~:HADOOP QA]$ getent group hadoop
hadoop:x:497:mapred,yarn,hdfs

 

 

Can you please have a look and suggest what I should do?
Note: I am trying to enable Kerberos first, and once it is running without any interruptions or issues, we are planning to integrate with AD.

 

Thanks,

Vinod

Master Collaborator

We actually use mcaf as the user to execute the jobs, so why is the HTTP user coming into the picture?

--> By this do you mean that you switch to the mcaf unix user [su - mcaf] and then run the job? If yes, that is not enough. After enabling Kerberos, HDFS and YARN recognise the user by the TGT, not by the unix user id. So even if you su to mcaf but hold a TGT for a different user [say HTTP], yarn/hdfs will recognise you as that TGT user.

 

Can you kinit as mcaf, then run klist [to confirm you have the mcaf TGT] and submit the job?
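Something like the following, assuming a mcaf principal exists in your KDC (the realm and the output path are only examples taken from earlier in this thread):

kinit mcaf@FQDN.COM   # or simply: kinit mcaf
klist                 # confirm the default principal is now mcaf, not HTTP/...
yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen3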

Rising Star

Hi @venkatsambath 

 

First I verified whether I could access HDFS before doing "kinit mcaf", and the access failed.

Then I did kinit mcaf, verified HDFS access, and I am able to list files and create directories.

Next I triggered the sample yarn job:

 

hostname.com:~:HADOOP QA]$ yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen4

20/02/19 00:46:30 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager/IP_ADDRESS:8032
20/02/19 00:46:30 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 8 for mcaf on ha-hdfs:nameservice1
20/02/19 00:46:30 INFO security.TokenCache: Got dt for hdfs://nameservice1; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (HDFS_DELEGATION_TOKEN token 8 for mcaf)
20/02/19 00:46:31 INFO terasort.TeraSort: Generating 500000000 using 2
20/02/19 00:46:31 INFO mapreduce.JobSubmitter: number of splits:2
20/02/19 00:46:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1582090413480_0002
20/02/19 00:46:31 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (HDFS_DELEGATION_TOKEN token 8 for mcaf)
20/02/19 00:46:32 INFO impl.YarnClientImpl: Submitted application application_1582090413480_0002
20/02/19 00:46:32 INFO mapreduce.Job: The url to track the job: http://resourcemanager:8088/proxy/application_1582090413480_0002/
20/02/19 00:46:32 INFO mapreduce.Job: Running job: job_1582090413480_0002
20/02/19 00:46:34 INFO mapreduce.Job: Job job_1582090413480_0002 running in uber mode : false
20/02/19 00:46:34 INFO mapreduce.Job: map 0% reduce 0%
20/02/19 00:46:34 INFO mapreduce.Job: Job job_1582090413480_0002 failed with state FAILED due to: Application application_1582090413480_0002 failed 2 times due to AM Container for appattempt_1582090413480_0002_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://resourcemanager:8088/proxy/application_1582090413480_0002/Then, click on links to logs of each attempt.
Diagnostics: Application application_1582090413480_0002 initialization failed (exitCode=255) with output: Requested user mcaf is not whitelisted and has id 779,which is below the minimum allowed 1000

Failing this attempt. Failing the application.
20/02/19 00:46:34 INFO mapreduce.Job: Counters: 0

 

Can you please check this and let me know?

 

Regards,

Vinod

 

Rising Star

Hello @venkatsambath ,

 

FYI...

min.user.id is set to 1000 in my Yarn configurations.
allowed.system.users is set to impala,nobody,llama,hive in my Yarn configurations.
 
Thanks,
Vinod

Master Collaborator

Yes, you are heading in the right direction. You can set min.user.id to a lower value, such as 500, and then re-submit the job.
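For reference, these settings end up in the NodeManager's container-executor.cfg (in a Cloudera Manager cluster you change them through the YARN configuration rather than editing the file by hand); a minimal sketch with the values being discussed here:

# container-executor.cfg (illustrative values; min.user.id is the lowest uid
# allowed to launch containers, allowed.system.users are exempt from that check)
min.user.id=500
allowed.system.users=impala,nobody,llama,hive

An alternative, if you prefer not to lower the threshold cluster-wide, would be to add mcaf to allowed.system.users instead.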

Rising Star

Thank you @venkatsambath 

After modifying the min.user.id value to 500 I am able to run the sample MapReduce job, and I can see it under YARN applications in Cloudera Manager.

 

Now I have tried my regular job on the same cluster, but it is failing. The error messages are below:

 

ERROR 2020Feb19 02:01:21,086 main com.client.engineering.group.JOB.main.JOBMain: org.apache.hadoop.hbase.client.RetriesExhaustedException thrown: Can't get the location
org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:308) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:149) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:57) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:293) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135) ~[JOB-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888) ~[JOB-0.0.31.jar:0.0.31]
at com.client.engineering.group.JOB.main.JOBMain.hasStagingData(JOBMain.java:304) [JOB-0.0.31.jar:0.0.31]
at com.client.engineering.group.JOB.main.JOBMain.main(JOBMain.java:375) [JOB-0.0.31.jar:0.0.31]
Caused by: java.io.IOException: Broken pipe

 

ERROR 2020Feb19 02:01:30,198 main com.client.engineering.group.job.main.jobMain: _v.1.0.0a_ org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException thrown: Failed 1 action: IOException: 1 time,
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time,

 

NOTE: I executed kinit mcaf before running my job.

And do we need to execute 'kinit mcaf' every time before submitting the job?

And how can we configure scheduled jobs?

 

Please help me to understand.

 

Best Regards,

Vinod

Master Collaborator
ERROR 2020Feb19 02:01:21,086 main com.client.engineering.group.JOB.main.JOBMain: org.apache.hadoop.hbase.client.RetriesExhaustedException thrown: Can't get the location

In this application, which particular table are you trying to access? Did you validate that the user mcaf has permission to access the table in question? (https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_sg_hbase_authorization.html#top... has the commands.) If the user does not have permission, grant them the required privileges.
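For example, from an hbase shell session run as an HBase superuser (the table name below is just a placeholder for whichever table the job reads):

user_permission 'your_table'         # list who currently has access to the table
grant 'mcaf', 'RWXC', 'your_table'   # grant read/write/execute/create to mcaf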

 

If you find that the privileges required for mcaf are already in place, then checking the HBase master logs during the issue timeframe would give further clues.

 

Qn: And do we need to execute 'kinit mcaf' every time before submitting the job? And how can we configure scheduled jobs?

Ans: Yes. And how are you scheduling the jobs? If it is a shell script, you can include a kinit command that uses mcaf's keytab, which avoids prompting for a password.
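For example, a scheduled wrapper script might start with something like this (the keytab path and principal are assumptions; ideally use a keytab issued for the mcaf principal itself rather than a service keytab):

#!/bin/bash
# renew the Kerberos ticket non-interactively before the job starts
kinit -kt /home/mcaf/mcaf.keytab mcaf@FQDN.COM
# ...then launch the actual job
yarn jar /path/to/your-job.jar ...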

 

 

 

Rising Star

Hi @venkatsambath,

 

As you suggested, I have put the kinit commands as the first step in my scripts, so whenever we execute the scripts the kinit runs as well. But I am still facing the same issue, and this time I can see zookeeper as the user.

The commands I am using:

kinit -kt /home/mcaf/hdfs.keytab hdfs/hostname@Domain.ORG
kinit -kt /home/mcaf/hdfs.keytab HTTP/hostname@Domain.ORG

kinit -kt /home/mcaf/hbase.keytab hbase/hostname@Domain.ORG

kinit -kt /home/mcaf/yarn.keytab HTTP/hostname@Domain.ORG
kinit -kt /home/mcaf/yarn.keytab yarn/hostname@Domain.ORG

kinit -kt /home/mcaf/zookeeper.keytab zookeeper/hostname@Domain.org

 

 

Error logs:

 

20/03/04 02:00:42 WARN security.UserGroupInformation: PriviledgedActionException as:zookeeper/hostname@Domain.ORG (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=zookeeper, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6599)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6581)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6533)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4337)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4307)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4280)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:321)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:601)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

org.apache.hadoop.security.AccessControlException: Permission denied: user=zookeeper, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x
org.apache.hadoop.security.AccessControlException: Permission denied: user=zookeeper, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6599)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6581)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6533)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4337)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4307)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4280)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:321)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:601)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)

 

 

Can you please help me with this issue?

 

Best Regards,

Vinod