Created on 02-17-2020 11:08 PM - last edited on 02-17-2020 11:21 PM by VidyaSargur
Hello Team,
I have enabled MIT Kerberos and integrated it with my cluster, and initialized the principals for hdfs, hbase, and yarn.
I am able to access HDFS and the HBase tables.
But when I try to run a sample MapReduce job it fails. Find the error logs below.
==> yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen2
Logs:
WARN security.UserGroupInformation: PriviledgedActionException as:HTTP/hostname.org@FQDN.COM (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=HTTP, access=WRITE, inode="/user":mcaf:supergroup:drwxr-xr-x
hostname.org:~:HADOOP QA]$ klist
Ticket cache: FILE:/tmp/krb5cc_251473
Default principal: HTTP/hostname.org@FQDN.COM
Valid starting Expires Service principal
02/18/20 01:55:32 02/19/20 01:55:32 krbtgt/FQDN.COM@FQDN.COM
renew until 02/23/20 01:55:32
Can someone please look into the issue and help us?
Thanks & Regards,
Vinod
Created 03-05-2020 02:11 AM
The commands I am using:
kinit -kt /home/mcaf/hdfs.keytab hdfs/hostname@Domain.ORG
kinit -kt /home/mcaf/hdfs.keytab HTTP/hostname@Domain.ORG
kinit -kt /home/mcaf/hbase.keytab hbase/hostname@Domain.ORG
kinit -kt /home/mcaf/yarn.keytab HTTP/hostname@Domain.ORG
kinit -kt /home/mcaf/yarn.keytab yarn/hostname@Domain.ORG
kinit -kt /home/mcaf/zookeeper.keytab zookeeper/hostname@Domain.org
You have to kinit as the user with which you want to access the data. In the above commands I see you are running kinit as hdfs, HTTP, hbase, yarn, and zookeeper sequentially. When you run
kinit -kt /home/mcaf/hdfs.keytab hdfs/hostname@Domain.ORG
It will write a TGT to the location set by KRB5CCNAME (the default is /tmp/krb5cc_[uid]). When you run the next kinit, say for hbase, the TGT acquired by the previous command is overwritten. In your case you are running multiple kinit commands, and the last one was for the zookeeper principal, so the cache ends up holding only the zookeeper TGT and every TGT acquired before it is gone. So use a single kinit command with the user ID intended for that application.
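For illustration, a minimal sketch of that overwrite behaviour (keytab paths and principals are taken from your commands above; the output noted in the comments is illustrative):
kinit -kt /home/mcaf/hdfs.keytab hdfs/hostname@Domain.ORG
klist    # Default principal: hdfs/hostname@Domain.ORG
kinit -kt /home/mcaf/zookeeper.keytab zookeeper/hostname@Domain.org
klist    # Default principal: zookeeper/hostname@Domain.org - the hdfs TGT has been replaced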
Created 03-23-2020 09:34 AM
@venkatsambath Sorry for the late response.
Thank you for your valuable response; I understand now where I was making the mistake.
What I want is to create a keytab file for a single user so that this user can access all the services running in the cluster, such as HDFS, HBase, and the rest.
I have tried the following steps; please share your inputs.
sudo ktutil
ktutil: addent -password -p mcaf@Domain.ORG -k 1 -e RC4-HMAC
Password for mcaf@Domain.ORG:
ktutil: wkt mcaf.keytab
ktutil: q
klist -kt mcaf.keytab
Keytab name: FILE:mcaf.keytab
KVNO Timestamp Principal
---- ----------------- --------------------------------------------------------
1 03/23/20 11:58:38 mcaf@Domain.ORG
sudo kinit -kt mcaf.keytab mcaf@Domain.ORG
And I am able to access HDFS using:
hadoop fs -ls /
But when it comes to HBase, I am not able to see the tables.
hbase(main):001:0> list
TABLE
0 row(s) in 0.4090 seconds
=> []
When I copied the latest keytab from the hbase-master process directory and ran,
dayrhemwkq001:~:HADOOP QA]$ kinit -kt hbase.keytab hbase/dayrhemwkq001.enterprisenet.org@MWKRBCDH.ORG
I am able to see the tables.
My question is: I want to give access to a single user so that this user can access HBase, HDFS, and the other services running in the cluster.
Please suggest how to do this.
Best Regards,
Vinod
Created 03-23-2020 12:14 PM
Your issue can be resolved by merging the keytabs in question.
Merge keytab files
If you have multiple keytab files that need to be in one place, you can merge the keys with the ktutil command.
The process differs depending on whether you are using MIT or Heimdal Kerberos. To merge keytab files using MIT Kerberos, use the steps below.
In this example I am merging [mcaf.keytab], [hbase.keytab], and [zk.keytab] into mcafmerged.keytab. You can merge any number of keytabs, but you must ensure that the user executing the commands has the correct permissions; it can be a good idea to copy the keytabs and merge them from the user's home directory.
$ ktutil
ktutil: read_kt mcaf.keytab
ktutil: read_kt hbase.keytab
ktutil: read_kt zk.keytab
ktutil: write_kt mcafmerged.keytab
ktutil: quit
To verify the merge, use:
$ klist -k mcafmerged.keytab
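For reference, the same merge can also be scripted non-interactively; this is a minimal sketch, assuming the three keytabs are readable in the current directory (MIT ktutil reads its commands from standard input):
$ ktutil <<'EOF'
read_kt mcaf.keytab
read_kt hbase.keytab
read_kt zk.keytab
write_kt mcafmerged.keytab
quit
EOF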
Now, to access HBase:
$ sudo kinit -kt mcafmerged.keytab mcaf@Domain.ORG
The keytab file is independent of the computer it's created on, its filename, and its location in the file system. Once it's created, you can rename it or move it to another location on the same computer.
Created on 03-23-2020 11:27 PM - edited 03-24-2020 05:22 AM
Hello @Shelton @venkatsambath ,
As you mentioned above, I have merged the keytab files: mcaf.keytab, yarn.keytab, and the other service keytabs.
I created mcafmerged.keytab and ran kinit -kt mcafmerged.keytab mcaf@Domain.ORG
After the above process I am able to access HDFS, see the HBase tables from the hbase shell, and see the output of yarn application -list.
But when I run the sample YARN job below,
yarn jar /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/hadoop-examples.jar teragen 500000000 /tmp/teragen44
I get the errors below:
Can't create directory /disk1/yarn/nm/usercache/mcaf/appcache/application_1585026002165_0001 - Permission denied
Can't create directory /disk2/yarn/nm/usercache/mcaf/appcache/application_1585026002165_0001 - Permission denied
Can't create directory /disk3/yarn/nm/usercache/mcaf/appcache/application_1585026002165_0001 - Permission denied
Can't create directory /disk4/yarn/nm/usercache/mcaf/appcache/application_1585026002165_0001 - Permission denied
Can't create directory /disk5/yarn/nm/usercache/mcaf/appcache/application_1585026002165_0001 - Permission denied
Did not create any app directories.
And I did a trial run of my application job; that is also failing, with the errors below:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:308) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:149) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:57) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:293) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888) ~[DMXSLoader-0.0.31.jar:0.0.31]
at com.class.name.dmxsloader.main.DMXSLoaderMain.hasStagingData(DMXSLoaderMain.java:304) [DMXSLoader-0.0.31.jar:0.0.31]
at com.class.name.dmxsloader.main.DMXSLoaderMain.main(DMXSLoaderMain.java:375) [DMXSLoader-0.0.31.jar:0.0.31]
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[?:1.7.0_67]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[?:1.7.0_67]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[?:1.7.0_67]
at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[?:1.7.0_67]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487) ~[?:1.7.0_67]
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63) ~[hadoop-common-2.6.0-cdh5.4.7.jar:?]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) ~[hadoop-common-2.6.0-cdh5.4.7.jar:?]
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[hadoop-common-2.6.0-cdh5.4.7.jar:?]
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117) ~[hadoop-common-2.6.0-cdh5.4.7.jar:?]
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.7.0_67]
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.7.0_67]
at java.io.DataOutputStream.flush(DataOutputStream.java:123) ~[?:1.7.0_67]
at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:246) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:234) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:895) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:850) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1184) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:31865) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1580) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1294) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1126) ~[DMXSLoader-0.0.31.jar:0.0.31]
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:299) ~[DMXSLoader-0.0.31.jar:0.0.31]
... 10 more
NOTE: I have kept the command below on the first line of my application script, before launching the job:
kinit -kt mcafmerged.keytab mcaf@MWKRBCDH.ORG
Please let me know what I am missing here.
Thanks & Regards,
Vinod
Created 03-24-2020 03:10 AM
That's great that the initial issue was resolved with the keytab merge. If I may ask, why did you merge all the keytabs into mcafmerged.keytab? It would have been enough to merge only the HBase keytab and your mcaf keytab. That said, your subsequent error is a permission issue on the directory
/disk1/yarn/nm/usercache/mcaf.
Can you share the output of
$ ls /disk1/yarn/nm/usercache
and
$ ls /disk1/yarn/nm/usercache/mcaf
Can you try changing the permissions, with the correct group for the user mcaf, i.e. as the root user:
# chown -R mcaf:{group} /disk1/yarn/nm/usercache/mcaf
Then rerun the teragen command; that should work.
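Since the errors list /disk1 through /disk5, it may also be worth comparing the ownership of the usercache directory on every disk. A quick read-only check, assuming the same layout on each disk:
$ for d in /disk1 /disk2 /disk3 /disk4 /disk5; do ls -ld "$d/yarn/nm/usercache/mcaf"; done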
Keep me posted
Created 03-24-2020 05:30 AM
Hello @Shelton ,
Thanks for your immediate response.
Find the outputs below:
HOSTNAME]$ ls /disk1/yarn/nm/usercache
mcaf
HOSTNAME]$ ls /disk1/yarn/nm/usercache/mcaf
appcache filecache
HOSTNAME]$ ls -lrt /disk1/yarn/nm/usercache/mcaf
total 20
drwx--x--- 397 yarn yarn 16384 Mar 4 01:18 filecache
drwx--x--- 2 yarn yarn 4096 Mar 4 02:22 appcache
HOSTNAME]$ ls -lrt /disk1/yarn/nm/usercache
total 4
drwxr-s--- 4 mcaf yarn 4096 Feb 24 01:26 mcaf
Q1. If we enable Kerberos, do we need to modify the permissions on the above directory?
Also, mcaf has sudo access.
Q2. We are using two edge nodes. Can I use the above merged keytab on the other edge node?
Or do I need to generate it there the same way I did on the current edge node?
Best Regards,
Vinod
Created 03-24-2020 07:33 AM
I can see the setgid bit (drwxr-s---) is set, which alters the standard behaviour so that the group of files created inside that directory will not be that of the user who created them, but that of the parent directory itself.
$ ls -lrt /disk1/yarn/nm/usercache
total 4
drwxr-s--- 4 mcaf yarn 4096 Feb 24 01:26 mcaf
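For illustration, here is a quick local demonstration of that group-inheritance behaviour; it uses a scratch directory only and assumes a local yarn group exists, with the commands run as root:
# mkdir /tmp/setgid-demo
# chgrp yarn /tmp/setgid-demo
# chmod g+s /tmp/setgid-demo
# touch /tmp/setgid-demo/newfile
# ls -l /tmp/setgid-demo/newfile
# rm -rf /tmp/setgid-demo
The new file's group shows as yarn, inherited from the directory rather than taken from the creating user's primary group, which mirrors the drwxr-s--- entry shown above.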
Can you remove the setgid bit as the root user:
# chmod -s /disk1/yarn/nm/usercache/mcaf
Then rerun
Question 1.
You don't need to explicitly change file permissions when you enable Kerberos; it should work out of the box.
Question 2.
You don't need to regenerate a new mcafmerged.keytab. Just copy it to your other edge node; it should work, as that edge node is also part of the cluster.
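For example, a minimal sketch of that copy; the hostname and the /home/mcaf path are placeholders, so adjust them to your environment:
$ scp /home/mcaf/mcafmerged.keytab mcaf@<other-edgenode>:/home/mcaf/
$ ssh mcaf@<other-edgenode> 'kinit -kt /home/mcaf/mcafmerged.keytab mcaf@Domain.ORG && klist'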
Please revert
Created 03-24-2020 09:31 AM
Hi @Shelton ,
I ran the command you shared above to remove the 's' permission on the directory.
I then triggered the same sample YARN job and am facing the same issue.
ERROR:
20/03/24 12:29:24 INFO mapreduce.Job: Task Id : attempt_1585066027398_0003_m_000000_1, Status : FAILED
Application application_1585066027398_0003 initialization failed (exitCode=255) with output: main : command provided 0
main : user is mcaf
main : requested yarn user is mcaf
Can't create directory /disk1/yarn/nm/usercache/mcaf/appcache/application_1585066027398_0003 - Permission denied
Can't create directory /disk2/yarn/nm/usercache/mcaf/appcache/application_1585066027398_0003 - Permission denied
Can't create directory /disk3/yarn/nm/usercache/mcaf/appcache/application_1585066027398_0003 - Permission denied
Can't create directory /disk4/yarn/nm/usercache/mcaf/appcache/application_1585066027398_0003 - Permission denied
Can't create directory /disk5/yarn/nm/usercache/mcaf/appcache/application_1585066027398_0003 - Permission denied
Did not create any app directories
Find the directory structure below.
HOSTNAME]$ sudo ls -lrt /disk2/yarn/nm/usercache/mcaf/appcache
total 0
HOSTNAME]$ sudo ls -ld /disk2/yarn/nm/usercache/mcaf/appcache
drwx--x--- 2 yarn yarn 4096 Mar 4 02:22 /disk2/yarn/nm/usercache/mcaf/appcache
HOSTNAME]$ sudo ls -lrt /disk2/yarn/nm/usercache/mcaf
total 24
drwx--x--- 493 yarn yarn 20480 Mar 4 01:18 filecache
drwx--x--- 2 yarn yarn 4096 Mar 4 02:22 appcache
HOSTNAME]$ sudo ls -ld /disk2/yarn/nm/usercache/mcaf
drwxr-x--- 4 yarn yarn 4096 Feb 24 01:26 /disk2/yarn/nm/usercache/mcaf
HOSTNAME]$ sudo ls -ld /disk2/yarn/nm/usercache/
drwxr-xr-x 3 yarn yarn 4096 Feb 24 01:26 /disk2/yarn/nm/usercache/
HOSTNAME]$ sudo ls -lrt /disk2/yarn/nm/usercache
total 4
drwxr-x--- 4 yarn yarn 4096 Feb 24 01:26 mcaf
NOTE: I have modified those permissions on all the servers.
Best Regards,
Vinod
Created 03-25-2020 11:06 PM
These appcache directories get auto-generated upon job submission, so can you remove them from the NodeManagers (so that they get created fresh with the required ACLs):
/disk{1,2,3,4,5}/yarn/nm/usercache/mcaf
and then re-submit the job.
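For example, as root on each NodeManager host (paths taken from the errors above; the directories are recreated automatically at the next job submission):
rm -rf /disk{1,2,3,4,5}/yarn/nm/usercache/mcaf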