Created 06-23-2016 02:01 PM
Hi
I see the exception below, and it brings the DataNode down. From the errors, can anyone suggest which HDFS configuration parameter I should look at and try to tune? I guess it is caused by a lack of resources when writing into HDFS.
2016-06-23 08:25:39,553 INFO datanode.DataNode (DataNode.java:transferBlock(1959)) - DatanodeRegistration(10.107.107.150:50010, datanodeUuid=c0f91520-d7ca-4fa3-b618-0832721376ad, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-9561e6ec-bc63-4bb6-934c-e89019a53c39;nsid=1984339524;c=0) Starting thread to transfer BP-1415030235-10.107.107.100-1452778704087:blk_1077927121_4186297 to 10.107.107.152:50010
2016-06-23 08:25:39,554 WARN datanode.DataNode (BPServiceActor.java:run(851)) - Unexpected exception in block pool Block pool BP-1415030235-10.107.107.100-1452778704087 (Datanode Uuid c0f91520-d7ca-4fa3-b618-0832721376ad) service to /10.107.107.100:8020
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1962)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:1971)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:877)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:684)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:843)
        at java.lang.Thread.run(Thread.java:745)
2016-06-23 08:25:39,554 WARN datanode.DataNode (BPServiceActor.java:run(854)) - Ending block pool service for: Block pool BP-1415030235-10.107.107.100-1452778704087 (Datanode Uuid c0f91520-d7ca-4fa3-b618-0832721376ad) service to 10.107.107.100:8020
2016-06-23 08:25:39,657 INFO datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool BP-1415030235-10.107.107.100-1452778704087 (Datanode Uuid c0f91520-d7ca-4fa3-b618-0832721376ad)
2016-06-23 08:25:39,658 INFO impl.FsDatasetImpl (FsDatasetImpl.java:shutdownBlockPool(2511)) - Removing block pool BP-1415030235-10.107.107.100-1452778704087
2016-06-23 08:25:39,800 INFO datanode.DataNode (BlockReceiver.java:run(1405)) - PacketResponder: BP-1415030235-10.107.107.100-1452778704087:blk_1077927235_4186411, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-06-23 08:25:40,337 INFO datanode.DataNode (BlockReceiver.java:run(1405)) - PacketResponder: BP-1415030235-10.107.107.100-1452778704087:blk_1077927234_4186410, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-06-23 08:25:41,078 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(934)) - Exception for BP-1415030235-10.107.107.100-1452778704087:blk_1077927238_4186414
2016-06-23 08:25:41,089 INFO datanode.DataNode (BlockReceiver.java:run(1369)) - PacketResponder: BP-1415030235-10.107.107.100-1452778704087:blk_1077927237_4186413, type=HAS_DOWNSTREAM_IN_PIPELINE: Thread is interrupted.
2016-06-23 08:25:41,089 INFO datanode.DataNode (BlockReceiver.java:run(1405)) - PacketResponder: BP-1415030235-10.107.107.100-1452778704087:blk_1077927237_4186413, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2016-06-23 08:25:41,089 INFO datanode.DataNode (DataXceiver.java:writeBlock(840)) - opWriteBlock BP-1415030235-10.107.107.100-1452778704087:blk_1077927237_4186413 received exception java.io.IOException: Premature EOF from inputStream
2016-06-23 08:25:41,089 ERROR datanode.DataNode (DataXceiver.java:run(278)) - :50010:DataXceiver error processing WRITE_BLOCK operation src: /10.107.107.150:62004 dst: /10.107.107.150:50010
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:895)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:807)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
        at java.lang.Thread.run(Thread.java:745)
2016-06-23 08:25:41,671 WARN datanode.DataNode (DataNode.java:secureMain(2540)) - Exiting Datanode
2016-06-23 08:25:41,673 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2016-06-23 08:25:41,677 INFO datanode.DataNode (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at 10.107.107.150
************************************************************/
Created 06-23-2016 02:09 PM
This seems to be an OOM issue with the DataNode; please increase the heap size of the DataNode process and see if that resolves the issue.
(BPServiceActor.java:run(851)) - Unexpected exception in block pool Block pool BP-1415030235-10.107.107.100-1452778704087 (Datanode Uuid c0f91520-d7ca-4fa3-b618-0832721376ad) service to /10.107.107.100:8020 java.lang.OutOfMemoryError: unable to create new native thread
Also check that the ulimit values are sufficient on the datanode machines:
bash-4.1$ ulimit -a
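If you want to confirm the limits that the running DataNode process actually picked up (rather than your current shell's), one way on Linux, assuming the daemon runs as the hdfs user, is to read /proc/<pid>/limits, e.g.:

# Find the DataNode PID and dump the limits applied to that process
# (assumes the process runs as the hdfs user; adjust the user if needed)
DN_PID=$(pgrep -u hdfs -f 'org.apache.hadoop.hdfs.server.datanode.DataNode')
cat /proc/$DN_PID/limits

# Compare "Max processes" and "Max open files" here against the ulimit output;
# "unable to create new native thread" is often a process/thread limit or
# overall memory pressure rather than Java heap alone.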
Created 06-23-2016 02:11 PM
@Jitendra Yadav, thanks for your response. What is the property, and what is the recommended size? We have 256 GB of RAM per machine.
Created 06-23-2016 02:18 PM
Since you have 256 GB of RAM per machine, I would suggest keeping the DataNode heap size between 6 and 8 GB.
You can change the heap size from the Ambari UI, i.e. HDFS -> Configs.
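If you are not using Ambari (or want to see what Ambari writes out), the setting ends up as JVM options in hadoop-env.sh. A minimal sketch, assuming a 6 GB heap and the typical HDP config path (your path may differ):

# /etc/hadoop/conf/hadoop-env.sh  (path is an assumption; adjust for your install)
# Set both initial and maximum heap for the DataNode JVM to 6 GB
export HADOOP_DATANODE_OPTS="-Xms6g -Xmx6g ${HADOOP_DATANODE_OPTS}"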
See the attached screenshot.
Created 06-23-2016 02:22 PM
This is the ulimit output:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1029927
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
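In case the nproc or nofile limits ever need to be raised for the hdfs user, the usual place on Linux is /etc/security/limits.conf or a drop-in under /etc/security/limits.d/. The values below are illustrative only, not a recommendation:

# /etc/security/limits.conf (or a file under /etc/security/limits.d/)
# "-" applies to both soft and hard limits; a re-login / service restart is needed to take effect
hdfs  -  nofile  128000
hdfs  -  nproc   65536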
Created 06-23-2016 02:30 PM
Yes, please increase the DataNode heap size to 6 GB and restart the DataNode service on all the hosts.
Created 06-23-2016 02:11 PM
Check the heap size and the ulimits (for the hdfs user).
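A quick way to check both from the shell (a sketch; assumes the daemon runs as the hdfs user and that jps from the JDK is on the PATH):

# Show JVM arguments of running Java processes; look for -Xmx on the DataNode line
sudo -u hdfs jps -v | grep DataNode

# Show the effective ulimits for the hdfs user
sudo -u hdfs bash -c 'ulimit -a'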
Created 06-23-2016 02:26 PM
@Jitendra Yadav, @Yogeshprabhu, the DataNode heap size is just 1 GB; it is the default set during installation. Maybe I need to change that.
Created 06-23-2016 02:33 PM
@ARUNKUMAR RAMASAMY Yes, change it. You might need to restart HDFS and other services, as Ambari suggests.
Created 06-23-2016 02:42 PM
Go with the changes above, as @Jitendra Yadav has recommended.