
Data node down

Hi

I see the exception below, and it brings the DataNode down. From the errors, can anyone suggest which HDFS configuration parameter I should look at and try to tune? I guess it is caused by a lack of resources when writing into HDFS.

2016-06-23 08:25:39,553 INFO datanode.DataNode (DataNode.java:transferBlock(1959)) - DatanodeRegistration(10.107.107.150:50010, datanodeUuid=c0f91520-d7ca-4fa3-b618-0832721376ad, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-9561e6ec-bc63-4bb6-934c-e89019a53c39;nsid=1984339524;c=0) Starting thread to transfer BP-1415030235-10.107.107.100-1452778704087:blk_1077927121_4186297 to 10.107.107.152:50010
2016-06-23 08:25:39,554 WARN datanode.DataNode (BPServiceActor.java:run(851)) - Unexpected exception in block pool Block pool BP-1415030235-10.107.107.100-1452778704087 (Datanode Uuid c0f91520-d7ca-4fa3-b618-0832721376ad) service to /10.107.107.100:8020
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlock(DataNode.java:1962)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.transferBlocks(DataNode.java:1971)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:657)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:615)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:877)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:684)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:843)
    at java.lang.Thread.run(Thread.java:745)
2016-06-23 08:25:39,554 WARN datanode.DataNode (BPServiceActor.java:run(854)) - Ending block pool service for: Block pool BP-1415030235-10.107.107.100-1452778704087 (Datanode Uuid c0f91520-d7ca-4fa3-b618-0832721376ad) service to 10.107.107.100:8020
2016-06-23 08:25:39,657 INFO datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool BP-1415030235-10.107.107.100-1452778704087 (Datanode Uuid c0f91520-d7ca-4fa3-b618-0832721376ad)
2016-06-23 08:25:39,658 INFO impl.FsDatasetImpl (FsDatasetImpl.java:shutdownBlockPool(2511)) - Removing block pool BP-1415030235-10.107.107.100-1452778704087
2016-06-23 08:25:39,800 INFO datanode.DataNode (BlockReceiver.java:run(1405)) - PacketResponder: BP-1415030235-10.107.107.100-1452778704087:blk_1077927235_4186411, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-06-23 08:25:40,337 INFO datanode.DataNode (BlockReceiver.java:run(1405)) - PacketResponder: BP-1415030235-10.107.107.100-1452778704087:blk_1077927234_4186410, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2016-06-23 08:25:41,078 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(934)) - Exception for BP-1415030235-10.107.107.100-1452778704087:blk_1077927238_4186414
2016-06-23 08:25:41,089 INFO datanode.DataNode (BlockReceiver.java:run(1369)) - PacketResponder: BP-1415030235-10.107.107.100-1452778704087:blk_1077927237_4186413, type=HAS_DOWNSTREAM_IN_PIPELINE: Thread is interrupted.
2016-06-23 08:25:41,089 INFO datanode.DataNode (BlockReceiver.java:run(1405)) - PacketResponder: BP-1415030235-10.107.107.100-1452778704087:blk_1077927237_4186413, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2016-06-23 08:25:41,089 INFO datanode.DataNode (DataXceiver.java:writeBlock(840)) - opWriteBlock BP-1415030235-10.107.107.100-1452778704087:blk_1077927237_4186413 received exception java.io.IOException: Premature EOF from inputStream
2016-06-23 08:25:41,089 ERROR datanode.DataNode (DataXceiver.java:run(278)) - :50010:DataXceiver error processing WRITE_BLOCK operation src: /10.107.107.150:62004 dst: /10.107.107.150:50010
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:895)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:807)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
    at java.lang.Thread.run(Thread.java:745)
2016-06-23 08:25:41,671 WARN datanode.DataNode (DataNode.java:secureMain(2540)) - Exiting Datanode
2016-06-23 08:25:41,673 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2016-06-23 08:25:41,677 INFO datanode.DataNode (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at 10.107.107.150
************************************************************/

1 ACCEPTED SOLUTION

This seems to be an OOM issue with the DataNode. Please increase the heap size of the DataNode process and see if that resolves the issue.

(BPServiceActor.java:run(851)) - Unexpected exception in block pool Block pool BP-1415030235-10.107.107.100-1452778704087 (Datanode Uuid c0f91520-d7ca-4fa3-b618-0832721376ad) service to /10.107.107.100:8020 java.lang.OutOfMemoryError: unable to create 

Also check that the ulimit settings are sufficient on the DataNode machines:

bash-4.1$ ulimit -a
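For what it's worth, "unable to create new native thread" usually points at OS-level process/thread limits rather than the Java heap itself, so it is worth comparing the per-user limit against the DataNode's current thread count. A minimal sketch (the `proc_datanode` pgrep pattern is an assumption; adjust it for your installation):

```shell
# Sketch: see how close the DataNode is to the per-user thread limit.
ulimit -u                                    # max user processes/threads

# The pgrep pattern is an assumption; adjust for your distribution.
DN_PID=$(pgrep -f 'proc_datanode' | head -n 1)
if [ -n "$DN_PID" ]; then
  # Each Java thread appears as a task directory under /proc/<pid>/task
  echo "DataNode threads: $(ls /proc/"$DN_PID"/task | wc -l)"
fi
```

Run it as the user that owns the DataNode process (typically hdfs), since `ulimit -u` reports the limit for the current user.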




@Jitendra Yadav, thanks for your response. Which property is it, and what is the recommended size? We have 256 GB of RAM per machine.

Since your machines have 256 GB of RAM, I would suggest keeping the DataNode heap size between 6 and 8 GB.

You can change the heap size from the Ambari UI, i.e. HDFS -> Configs.
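For reference, outside the Ambari UI the same setting lives in hadoop-env.sh (Ambari exposes it under HDFS -> Configs -> Advanced hadoop-env). A hedged sketch; the flag values are illustrative, not recommendations beyond the 6-8 GB suggested above:

```shell
# Illustrative fragment for hadoop-env.sh: pin the DataNode JVM heap.
# -Xms/-Xmx values here are examples only; tune for your cluster.
export HADOOP_DATANODE_OPTS="-Xms6g -Xmx6g ${HADOOP_DATANODE_OPTS}"
```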

See the screenshot: screen-shot-2016-06-23-at-31802-pm.png

@Jitendra Yadav,

This is the ulimit output:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1029927
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
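Should any of these limits ever need raising for the hdfs user, they can be set via a pam_limits drop-in file; a sketch with illustrative values only (a re-login or service restart is needed before the new limits take effect):

```shell
# /etc/security/limits.d/hdfs.conf -- illustrative values, not a recommendation
hdfs  soft  nproc   65536
hdfs  hard  nproc   65536
hdfs  soft  nofile  128000
hdfs  hard  nofile  128000
```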

Yes, please increase the DataNode heap size to 6 GB and restart the DataNode service on all the hosts.

Rising Star
@ARUNKUMAR RAMASAMY

Check the heap size and the ulimits (for the hdfs user).

@Jitendra Yadav, @Yogeshprabhu, the DataNode heap size is just 1 GB; it is the default set during installation. Maybe I need to change that.

Rising Star

@ARUNKUMAR RAMASAMY Yes, change it. You might need to restart HDFS and other services, as Ambari suggests.

Rising Star

Go with the recommendations above from @Jitendra Yadav.