04-09-2018
06:24 PM
We had a strange error in our HBase cluster that we have not been able to debug yet. It may have caused a lot of latency spikes in our system. We see the following in our region server logs, after which the server's IPC.QueueSize increased:

18:51:12,756 WARN [DataStreamer for file /apps/hbase/data/WALs/hbase-dn-131,16020,1517942847659/hbase-dn-131%2C16020%2C1517942847659.default.1523243925996 block BP-1872413417-101.331.253.88-1458393583173:blk_1136880676_63140431] hdfs.DFSClient: Error Recovery for block BP-1872413417-101.331.253.88-1458393583173:blk_1136880676_63140431 in pipeline DatanodeInfoWithStorage[101.341.1.246:50010,DS-d0254124-f206-4315-b337-8867eeb53375,DISK], DatanodeInfoWithStorage[101.321.11.107:50010,DS-4af2f0fd-69d2-4d7b-bb33-beb380c8fdcc,DISK], DatanodeInfoWithStorage[101.321.73.234:50010,DS-42df372f-dc80-46f1-b3eb-71794a509749,DISK]: bad datanode DatanodeInfoWithStorage[101.321.73.234:50010,DS-42df372f-dc80-46f1-b3eb-71794a509749,DISK]
18:51:12,327 INFO [DataStreamer for file /apps/hbase/data/WALs/hbase-dn-131,16020,1517942847659/hbase-dn-131%2C16020%2C1517942847659.default.1523243925996 block BP-1872413417-101.331.253.88-1458393583173:blk_1136880676_63140431] hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Got error, status message , ack with firstBadLink as 101.331.85.8:50010
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1369)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1193)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:909)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:412)

At the same time, we see the following error in the logs of the DataNode where the pipeline is trying to create this replica:

18:51:13,492 INFO impl.FsDatasetImpl (FsDatasetImpl.java:recoverRbw(1322)) - Recover RBW replica BP-1872413417-101.331.253.88-1458393583173:blk_1136880676_63140431
18:51:13,492 INFO datanode.DataNode (DataXceiver.java:writeBlock(837)) - opWriteBlock BP-1872413417-101.331.253.88-1458393583173:blk_1136880676_63140431 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Cannot append to a non-existent replica BP-1872413417-101.331.253.88-1458393583173:1136880676
18:51:13,492 ERROR datanode.DataNode (DataXceiver.java:run(278)) - hbase-dn-370:50010:DataXceiver error processing WRITE_BLOCK operation src: /101.321.11.107:45232 dst: /101.321.202.112:50010
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Cannot append to a non-existent replica BP-1872413417-101.331.253.88-1458393583173:1136880676
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getReplicaInfo(FsDatasetImpl.java:766)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverRbw(FsDatasetImpl.java:1324)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:195)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:677)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
at java.lang.Thread.run(Thread.java:745)

This replica creation failed on all the DataNodes. We are not able to find out why exactly this happened or what can be done about it. Looking at the Hadoop code, I see the exception is raised from the following method (FsDatasetImpl.getReplicaInfo):

/**
 * Get the meta info of a block stored in volumeMap. Block is looked up
 * without matching the generation stamp.
 * @param bpid block pool Id
 * @param blkid block Id
 * @return the meta replica information; null if block was not found
 * @throws ReplicaNotFoundException if no entry is in the map or
 *         there is a generation stamp mismatch
 */
private ReplicaInfo getReplicaInfo(String bpid, long blkid)
    throws ReplicaNotFoundException {
  ReplicaInfo info = volumeMap.get(bpid, blkid);
  if (info == null) {
    throw new ReplicaNotFoundException(
        ReplicaNotFoundException.NON_EXISTENT_REPLICA + bpid + ":" + blkid);
  }
  return info;
}
This means HDFS was not able to find the block in the volume map. I don't understand who updates this volume map, why it was not updated in this case, or whether something went down a wrong code path.
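To check my understanding of the contract, here is a minimal self-contained sketch of how I read the volumeMap semantics (the names are my own simplification, not the real Hadoop classes; the actual map is ReplicaMap in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl): an entry exists only if some earlier step registered the replica, and the lookup simply throws when no entry is there.

import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the DataNode's volumeMap. Hypothetical names;
// only the lookup-miss behaviour mirrors the quoted getReplicaInfo().
public class ReplicaMapSketch {

  static class ReplicaInfo {
    final long blockId;
    final long genStamp;
    ReplicaInfo(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // block pool id -> (block id -> replica metadata)
  private final Map<String, Map<Long, ReplicaInfo>> map = new HashMap<>();

  // As far as I can tell from the code, entries are added when a replica
  // is created on this DataNode (write pipeline, append) and when the
  // DataNode scans its disks at startup. If none of those paths ran for
  // a block, the lookup below has nothing to find.
  void add(String bpid, ReplicaInfo info) {
    map.computeIfAbsent(bpid, k -> new HashMap<>()).put(info.blockId, info);
  }

  // Mirrors getReplicaInfo(bpid, blkid): a miss means this DataNode never
  // had (or no longer has) the replica, hence "Cannot append to a
  // non-existent replica" during pipeline recovery.
  ReplicaInfo get(String bpid, long blkid) {
    Map<Long, ReplicaInfo> pool = map.get(bpid);
    ReplicaInfo info = (pool == null) ? null : pool.get(blkid);
    if (info == null) {
      throw new IllegalStateException(
          "Cannot append to a non-existent replica " + bpid + ":" + blkid);
    }
    return info;
  }

  public static void main(String[] args) {
    ReplicaMapSketch volumeMap = new ReplicaMapSketch();
    volumeMap.add("BP-1", new ReplicaInfo(1001L, 1L));
    System.out.println(volumeMap.get("BP-1", 1001L).blockId); // found: 1001
    try {
      volumeMap.get("BP-1", 9999L); // never registered -> throws
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}

If that reading is right, the real question becomes why none of the DataNodes in the pipeline had an entry for blk_1136880676 at recovery time.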
Labels:
- Apache Hadoop
- Apache HBase
04-08-2018
06:08 PM
I have a use case where I have enabled the NFS gateway for my Hadoop system, following this nice guide. I have mounted it on another machine via:

sudo mount -v -t nfs -o vers=3,proto=tcp,nolock,noacl $ip:/dataDir /mountDir
Now there is a use case where I need to run the chown command on a file in the dataDir folder, so I run the following:

chown user2 /mountDir/sample.txt
But this gives an error:

chown: changing ownership of `/mountDir/sample.txt': Permission denied

and I get the following in the NFS gateway logs:

18/04/05 23:54:25 WARN nfs3.RpcProgramNfs3: Exception
org.apache.hadoop.security.AccessControlException: Non-super user cannot change owner
at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:83)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1669)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:703)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:464)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
I also tried adding the following to the /etc/nfs.map file, as mentioned in the docs (an error I faced while doing this is detailed here):

uid 0 594903

where 0 is the UID of root on the other machine, and 594903 is the UID of hdfs, which is the superuser on the DataNode machine where the NFS gateway is running.
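For reference, the static mapping file format described in the NFS gateway docs is [uid|gid] [remote id] [local id], one mapping per line, so a minimal /etc/nfs.map along those lines would look like this (the gid line is only an illustration of the syntax, not something I have set; the UIDs are the ones from my environment above):

# Static ID mapping for the HDFS NFS gateway
# Format: [uid|gid] [remote id] [local id]
uid 0 594903
# gid mappings use the same syntax, e.g.:
# gid 0 594903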
But I still get this error:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied. user=root is not the owner of inode=sample3.txt
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:250)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1724)
at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:80)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1669)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:703)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:464)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
Any idea how to get this done?
Labels:
- Apache Hadoop