
Yarn jobs are failing after enabling MIT-Kerberos

Rising Star

Hello Team,

 

I have enabled MIT Kerberos and integrated it with my cluster, and initialized the principals for hdfs, hbase, and yarn.

I am able to access HDFS and the HBase tables.

But when I try to run a sample MapReduce job it fails. Please find the error logs below.

 

==> yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen2

Logs:

WARN security.UserGroupInformation: PriviledgedActionException as:HTTP/hostname.org@FQDN.COM (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x

 

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x

 

 

hostname.org:~:HADOOP QA]$ klist
Ticket cache: FILE:/tmp/krb5cc_251473
Default principal: HTTP/hostname.org@FQDN.COM

Valid starting Expires Service principal
02/18/20 01:55:32 02/19/20 01:55:32 krbtgt/FQDN.COM@FQDN.COM
renew until 02/23/20 01:55:32

 

 

Can someone please check the issue and help us?

 

Thanks & Regards,

Vinod

31 REPLIES

The commands I am using:
kinit -kt /home/mcaf/hdfs.keytab hdfs/hostname@Domain.ORG
kinit -kt /home/mcaf/hdfs.keytab HTTP/hostname@Domain.ORG
kinit -kt /home/mcaf/hbase.keytab hbase/hostname@Domain.ORG
kinit -kt /home/mcaf/yarn.keytab HTTP/hostname@Domain.ORG
kinit -kt /home/mcaf/yarn.keytab yarn/hostname@Domain.ORG
kinit -kt /home/mcaf/zookeeper.keytab zookeeper/hostname@Domain.org

You have to kinit as the user you want to access the data as. In the commands above you are running kinit as hdfs, HTTP, hbase, yarn, and zookeeper sequentially. When you run

kinit -kt /home/mcaf/hdfs.keytab hdfs/hostname@Domain.ORG

It writes a TGT to the location set by KRB5CCNAME (the default is /tmp/krb5cc_[uid]). When you run the next kinit, for hbase, the TGT acquired by the previous command gets overwritten. In your case you ran multiple kinits and the last one was for the zookeeper principal, so only the zookeeper TGT remains and all the earlier ones are overwritten. So use a single kinit with the principal intended for that application.
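
As a minimal sketch of that behaviour (the cache file names below are just examples), you can give each principal its own credential cache by pointing KRB5CCNAME at a separate file:

# Each kinit overwrites whichever cache KRB5CCNAME points to (default /tmp/krb5cc_<uid>),
# so give each principal its own cache file if you really need several tickets at once.
KRB5CCNAME=/tmp/krb5cc_mcaf_hdfs  kinit -kt /home/mcaf/hdfs.keytab  hdfs/hostname@Domain.ORG
KRB5CCNAME=/tmp/krb5cc_mcaf_hbase kinit -kt /home/mcaf/hbase.keytab hbase/hostname@Domain.ORG

# Inspect each cache independently - neither kinit clobbered the other.
KRB5CCNAME=/tmp/krb5cc_mcaf_hdfs klist
KRB5CCNAME=/tmp/krb5cc_mcaf_hbase klist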

Rising Star

@venkatsambath Sorry for the late response.

 

Thank you for your valuable response; I understand now where I was making a mistake.

What I want is to create a keytab file for one user so that this user can access all the services, such as HDFS, HBase, and the other services running in the cluster.

 

I have tried the following steps; please share your inputs.

 

sudo ktutil

ktutil:  addent -password -p mcaf@Domain.ORG -k 1 -e RC4-HMAC

Password for mcaf@Domain.ORG:

ktutil:  wkt mcaf.keytab

ktutil:  q

 

 

klist -kt mcaf.keytab
Keytab name: FILE:mcaf.keytab
KVNO Timestamp Principal
---- ----------------- --------------------------------------------------------
1 03/23/20 11:58:38 mcaf@Domain.ORG
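
A hedged aside: the keytab above holds only an RC4-HMAC key (klist -ekt mcaf.keytab shows the encryption types). If the KDC holds AES keys for mcaf@Domain.ORG, kinit against this keytab may fail with a key or enctype mismatch; in that case the other enctypes can be added in the same ktutil session, keeping the kvno in line with what the KDC has, for example:

ktutil:  addent -password -p mcaf@Domain.ORG -k 1 -e aes256-cts-hmac-sha1-96
ktutil:  addent -password -p mcaf@Domain.ORG -k 1 -e aes128-cts-hmac-sha1-96
ktutil:  addent -password -p mcaf@Domain.ORG -k 1 -e arcfour-hmac
ktutil:  wkt mcaf.keytab
ktutil:  q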

 

sudo kinit -kt mcaf.keytab mcaf@Domain.ORG

 

 

And I am able to access HDFS using:

hadoop fs -ls /

 

But in HBase, I am not able to see the tables:

hbase(main):001:0> list
TABLE
0 row(s) in 0.4090 seconds

=> []

 

 

When I copied the latest keytab from the process directory for the hbase-master role,

 

dayrhemwkq001:~:HADOOP QA]$ kinit -kt hbase.keytab hbase/dayrhemwkq001.enterprisenet.org@MWKRBCDH.ORG

 

I am able to see the tables.

 

 

My question is: I want to give access to one user so that this user can access HBase, HDFS, and the other services running in the cluster.

 

Please share your inputs.

 

Best Regards,

Vinod

Master Mentor

@kvinod 

Your issue can be resolved by merging the keytabs in question.

Merge keytab files
If you have multiple keytab files that need to be in one place, you can merge the keys with the ktutil command.

 

The process differs depending on whether you are using MIT or Heimdal Kerberos; to merge keytab files using MIT Kerberos:

In the example below I am merging [mcaf.keytab], [hbase.keytab], and [zk.keytab] into mcafmerged.keytab. You can merge any number of keytabs, but you must ensure the user executing ktutil has the correct permissions; it can be a good idea to copy the keytabs into the user's home directory and merge them there.

$ ktutil
  ktutil: read_kt mcaf.keytab
  ktutil: read_kt hbase.keytab
  ktutil: read_kt zk.keytab
  ktutil: write_kt mcafmerged.keytab
  ktutil: quit

To verify the merge, use:

$ klist -k mcafmerged.keytab

Now to access hbase

$ sudo kinit -kt mcafmerged.keytab mcaf@Domain.ORG

The keytab file is independent of the computer it's created on, its filename, and its location in the file system. Once it's created, you can rename it or move it to another location on the same computer.
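
If you do this merge regularly, here is a small sketch (assuming MIT ktutil and that the invoking user can read all of the source keytabs) that feeds the same read_kt/write_kt commands to ktutil non-interactively:

# Merge an arbitrary list of keytabs into one file, then verify it.
MERGED=mcafmerged.keytab
{
  for kt in mcaf.keytab hbase.keytab zk.keytab; do
    echo "read_kt $kt"
  done
  echo "write_kt $MERGED"
  echo "quit"
} | ktutil

klist -k "$MERGED"   # every expected principal should be listed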

Rising Star

Hello @Shelton @venkatsambath ,

 

As you mentioned above, I have merged the keytab files (mcaf.keytab, yarn.keytab, and the other service keytabs).

I created mcafmerged.keytab and ran kinit -kt mcafmerged.keytab mcaf@Domain.ORG.

 

After the above process I am able to access HDFS, to see the HBase tables from the hbase shell, and to see the running applications with yarn application -list.

 

But when I run the sample yarn job below,

 

yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen44

 

I get the errors below:

Can't create directory /disk1/yarn/nm/usercache/mcaf/appcache/application_1585026002165_0001 - Permission denied
Can't create directory /disk2/yarn/nm/usercache/mcaf/appcache/application_1585026002165_0001 - Permission denied
Can't create directory /disk3/yarn/nm/usercache/mcaf/appcache/application_1585026002165_0001 - Permission denied
Can't create directory /disk4/yarn/nm/usercache/mcaf/appcache/application_1585026002165_0001 - Permission denied
Can't create directory /disk5/yarn/nm/usercache/mcaf/appcache/application_1585026002165_0001 - Permission denied
Did not create any app directories.

 

And I gave my application job a trial run, and it is also failing with the errors below:

 

org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:308) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:149) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:57) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:293) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888) ~[DMXSLoader-0.0.31.jar:0.0.31]
at com.class.name.dmxsloader.main.DMXSLoaderMain.hasStagingData(DMXSLoaderMain.java:304) [DMXSLoader-0.0.31.jar:0.0.31]
at com.class.name.dmxsloader.main.DMXSLoaderMain.main(DMXSLoaderMain.java:375) [DMXSLoader-0.0.31.jar:0.0.31]
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:1.7.0_67]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[?:1.7.0_67]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:1.7.0_67]
at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:1.7.0_67]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487) ~[?:1.7.0_67]
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63) ~[hadoop-common-2.6.0-cdh5.4.7.jar:?]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) ~[hadoop-common-2.6.0-cdh5.4.7.jar:?]
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[hadoop-common-2.6.0-cdh5.4.7.jar:?]
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117) ~[hadoop-common-2.6.0-cdh5.4.7.jar:?]

at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.7.0_67]
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.7.0_67]
at java.io.DataOutputStream.flush(DataOutputStream.java:123) ~[?:1.7.0_67]
at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:246) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:234) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:895) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:850) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1184) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:31865) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1580) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1294) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1126) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:299) ~[DMXSLoader-0.0.31.jar:0.0.31]
... 10 more

 

 

NOTE: I have put the command below in the first line of my application script, before launching the job:

kinit -kt mcafmerged.keytab  mcaf@MWKRBCDH.ORG
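
For what it's worth, a slightly more defensive version of that preamble (the keytab path here is hypothetical) fails fast when no valid ticket could be obtained:

#!/bin/bash
# Sketch of a launcher preamble: obtain a ticket and abort early if it is missing.
kinit -kt /home/mcaf/mcafmerged.keytab mcaf@MWKRBCDH.ORG
if ! klist -s; then
    echo "No valid Kerberos ticket for mcaf - aborting job submission" >&2
    exit 1
fi
# ... job submission follows here ...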

 

 

Please let me know what I am missing here.

 

Thanks & Regards,

Vinod

 

Master Mentor

@kvinod 

It's great that the initial issue was resolved with the keytab merge, but may I ask why you merged all the keytabs into mcafmerged.keytab? It would have been enough to merge only the HBase keytab and your mcaf keytab. That said, your subsequent error is a permission issue on the directory

/disk1/yarn/nm/usercache/mcaf. 

Can you share the output of 

$ ls  /disk1/yarn/nm/usercache

and  

$ ls /disk1/yarn/nm/usercache/mcaf

Can you try changing the ownership to the correct group for user mcaf, i.e., as the root user:

# chown -R mcaf:{group}  /disk1/yarn/nm/usercache/mcaf
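
Since your error output lists /disk1 through /disk5, here is a sketch of the same ownership change on every NodeManager local dir (run as root on each NodeManager host; the group value is an assumption):

GROUP=yarn   # assumption - substitute the correct group for user mcaf
for d in /disk{1,2,3,4,5}/yarn/nm/usercache/mcaf; do
  chown -R "mcaf:${GROUP}" "$d"
done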

Then rerun the teragen command; that should work.

 

Keep me posted

Rising Star

Hello @Shelton ,

 

Thanks for your immediate response.

Please find the outputs below:

 

HOSTNAME]$ ls /disk1/yarn/nm/usercache
mcaf
HOSTNAME]$ ls /disk1/yarn/nm/usercache/mcaf
appcache filecache
HOSTNAME]$ ls -lrt /disk1/yarn/nm/usercache/mcaf
total 20
drwx--x--- 397 yarn yarn 16384 Mar 4 01:18 filecache
drwx--x--- 2 yarn yarn 4096 Mar 4 02:22 appcache
HOSTNAME]$ ls -lrt /disk1/yarn/nm/usercache
total 4
drwxr-s--- 4 mcaf yarn 4096 Feb 24 01:26 mcaf

 

Q1: If we enable Kerberos, do we need to modify the permissions of the above directory?

Also, mcaf has sudo access.

 

Q2: We are using two edge nodes. Can I use the above merged keytab on the other edge node?

Or do I need to generate it again, as I did on the current edge node?

 

Best Regards,

Vinod

 

Master Mentor

@kvinod 

 

I can see the setgid bit (drwxr-s---) is set, which alters the standard behavior so that files created inside the directory get the group of the parent directory itself rather than the group of the user who created them:

$ ls -lrt /disk1/yarn/nm/usercache
total 4
drwxr-s--- 4 mcaf yarn 4096 Feb 24 01:26 mcaf

Can you remove the setgid bit as the root user:

 

# chmod -s /disk1/yarn/nm/usercache/mcaf

Then rerun

 

Question 1.

You don't need to explicitly change file permissions when you enable Kerberos; it should work out of the box.

Question 2.

You don't need to regenerate mcafmerged.keytab; just copy it to your other edge node. It should work, since that edge node is also part of the cluster.
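
As a sketch only (host and path names below are hypothetical), copying the merged keytab to the second edge node and testing it there could look like:

scp mcafmerged.keytab mcaf@edgenode2:/home/mcaf/mcafmerged.keytab

# On the second edge node: restrict the file, then test it.
chmod 600 /home/mcaf/mcafmerged.keytab
kinit -kt /home/mcaf/mcafmerged.keytab mcaf@Domain.ORG
klist
hadoop fs -ls /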

 

Please revert

Rising Star

Hi @Shelton ,

 

I ran the command you shared above to remove the 's' permission from the directory.

I then triggered the same sample yarn job and I am facing the same issue.

 

ERROR:

20/03/24 12:29:24 INFO mapreduce.Job: Task Id : attempt_1585066027398_0003_m_000000_1, Status : FAILED
Application application_1585066027398_0003 initialization failed (exitCode=255) with output: main : command provided 0
main : user is mcaf
main : requested yarn user is mcaf
Can't create directory /disk1/yarn/nm/usercache/mcaf/appcache/application_1585066027398_0003 - Permission denied
Can't create directory /disk2/yarn/nm/usercache/mcaf/appcache/application_1585066027398_0003 - Permission denied
Can't create directory /disk3/yarn/nm/usercache/mcaf/appcache/application_1585066027398_0003 - Permission denied
Can't create directory /disk4/yarn/nm/usercache/mcaf/appcache/application_1585066027398_0003 - Permission denied
Can't create directory /disk5/yarn/nm/usercache/mcaf/appcache/application_1585066027398_0003 - Permission denied
Did not create any app directories

 

 

Please find the directory structure below.

 

HOSTNAME]$ sudo ls -lrt /disk2/yarn/nm/usercache/mcaf/appcache
total 0
HOSTNAME]$ sudo ls -ld /disk2/yarn/nm/usercache/mcaf/appcache
drwx--x--- 2 yarn yarn 4096 Mar 4 02:22 /disk2/yarn/nm/usercache/mcaf/appcache
HOSTNAME]$ sudo ls -lrt /disk2/yarn/nm/usercache/mcaf
total 24
drwx--x--- 493 yarn yarn 20480 Mar 4 01:18 filecache
drwx--x--- 2 yarn yarn 4096 Mar 4 02:22 appcache
HOSTNAME]$ sudo ls -ld /disk2/yarn/nm/usercache/mcaf
drwxr-x--- 4 yarn yarn 4096 Feb 24 01:26 /disk2/yarn/nm/usercache/mcaf
HOSTNAME]$ sudo ls -ld /disk2/yarn/nm/usercache/
drwxr-xr-x 3 yarn yarn 4096 Feb 24 01:26 /disk2/yarn/nm/usercache/
HOSTNAME]$ sudo ls -lrt /disk2/yarn/nm/usercache
total 4
drwxr-x--- 4 yarn yarn 4096 Feb 24 01:26 mcaf

 

NOTE: I have modified those permissions on all the servers.

 

Best Regards,

Vinod

Rising Star

Hi @Shelton @venkatsambath 

 

Can someone please help me fix the issue?

 

Best Regards,

Vinod


These app cache directories get auto-generated upon job submission, so can you remove them from the NodeManagers [so that they get created fresh with the required ACLs]

 

/disk{1,2,3,4,5}/yarn/nm/usercache/mcaf

 

and then re-submit the job.
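
A sketch of that cleanup, assuming the local dirs shown in your error and running as root on every NodeManager host:

# Remove the stale per-user cache so YARN recreates it with the correct
# ownership and ACLs on the next container launch for user mcaf.
for d in /disk{1,2,3,4,5}/yarn/nm/usercache/mcaf; do
  rm -rf "$d"
done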