DataNode status is always flapping between live and dead

Hi Experts,

I have the following symptoms and I can't pin down the root cause.

1- A DataNode is always flapping between live and dead status.

2- The NameNode is always flapping in and out of safe mode, which I believe is a consequence of the first point.

3- The Ambari readings are inconsistent: one of the gauges shows that x DataNode(s) are dead, although when I check the DataNode service on the node it is up and running.

4- The NameNode sometimes reports no corrupt blocks and sometimes reports some, but I believe not the correct number, because when I run

hadoop fsck / I get a tremendous number of lines like the following:

/IngestedData/Twitter/1059754915585828.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528828
/IngestedData/Twitter/1059758585906613.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515660
/IngestedData/Twitter/1059774195895653.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515666
/IngestedData/Twitter/1059779491501811.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1079102428
/IngestedData/Twitter/1059781413941185.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515667
/IngestedData/Twitter/1059784483341630.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528829
/IngestedData/Twitter/1059805700858061.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1079102444
/IngestedData/Twitter/1059808752032375.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528831
/IngestedData/Twitter/1059820909082031.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515672
/IngestedData/Twitter/1059839159043687.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515671
/IngestedData/Twitter/1059850922921177.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528834
/IngestedData/Twitter/1059879302310190.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528833
/IngestedData/Twitter/1059890677914985.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515674
/IngestedData/Twitter/1059892998401161.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1079102500
/IngestedData/Twitter/1059896117885101.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515677
/IngestedData/Twitter/1059897143123345.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515679
/IngestedData/Twitter/1059898881036386.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528836
/IngestedData/Twitter/10598b7b-f59b-4555-8abd-bd1dab564016: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075454299
/IngestedData/Twitter/1059908594543601.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515680
/IngestedData/Twitter/1059927449709253.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1079102554
/IngestedData/Twitter/1059933212287937.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528835
/IngestedData/Twitter/1059960457592976.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528832
/IngestedData/Twitter/1059966377443792.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515686
/IngestedData/Twitter/1059995695617270.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528838
/IngestedData/Twitter/1059996598183700.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515688
/IngestedData/Twitter/1059997598475029.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515692
/IngestedData/Twitter/1059a55f-6524-4e50-9fe7-aaf51f65228d: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074407507
/IngestedData/Twitter/1059b560-4f1a-447f-ae9e-678d8ec1f703: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074964486
/IngestedData/Twitter/1059c72e-9e3f-4443-98d5-1e6adf8c52e4: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075160859
/IngestedData/Twitter/1059ddcf-1f2a-4a3a-aebd-890f1e7913a7: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078522616
/IngestedData/Twitter/1059e3e1-9625-4d65-b23a-5492331c0071: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1077122861
/IngestedData/Twitter/105a157b-8ccc-476e-b22a-2e57dbb5f65b: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1077085406
/IngestedData/Twitter/105a366a-091b-413a-a4a0-e557d42a13ad: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076102477
/IngestedData/Twitter/105a5d2e-1bd6-479c-9180-566db0969704: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076820612
/IngestedData/Twitter/105a64e8-ba92-4e27-9964-7eef208e1691: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076368081
/IngestedData/Twitter/105a68bc-ac38-4173-a359-9c2561276595: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075099264
/IngestedData/Twitter/105a6a48-d537-4f6c-988d-973b1008dafe: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076152856
/IngestedData/Twitter/105a74ec-3c4b-4cd9-afc0-9c41083add0a: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076926793
/IngestedData/Twitter/105abecf-8a82-4487-8551-e5997abbd875: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074256803
/IngestedData/Twitter/105ac257-6d77-41b4-8f92-abfb3343965b: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075162810
/IngestedData/Twitter/105ad037-126c-4291-acc1-3cd8b279563f: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074333885
/IngestedData/Twitter/105ae089-39dc-495f-9dfa-f25cbf08724b: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074150937
/IngestedData/Twitter/105b15cb-809c-42c0-81a7-243089531313: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074462469
/IngestedData/Twitter/105b1ef6-5cbe-4f64-a474-0511ffe0c91a: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076979847
/IngestedData/Twitter/105b77c4-b7df-4cf0-9af6-3759e1c8c6df: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075160268
/IngestedData/Twitter/105bcdbb-aef2-4de7-bffe-3c20bddb0b2b: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075193566
/IngestedData/Twitter/105be7b6-0378-4875-9f7c-1b2bd5283a0a: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075994394
/IngestedData/Twitter/105be8c6-d987-4fde-8d52-eefdbcaec6b0: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075880255
/IngestedData/Twitter/105bee56-f8b5-43d6-b6a3-f43978bbb41b: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074479933
/IngestedData/Twitter/105bf24d-e2c5-4ef7-b07a-7183cbdae465: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1077247200
/IngestedData/Twitter/105bf47e-861a-4cb7-9004-5dc8aa50636c: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076781808
/IngestedData/Twitter/105c4037-1dd1-4ad1-a050-146004640da0: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1073992314
/IngestedData/Twitter/105c5fbd-dc88-4eac-ac2c-c05f8f887720: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075906961
/IngestedData/Twitter/105c60d4-b72e-4b42-b4c3-db95d99c4acb: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076062497
/IngestedData/Twitter/105c9cdc-86d0-43b9-b2e9-6a0f410033cc: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076099595
/IngestedData/Twitter/105cbe7a-8ff5-4d3d-a843-130e594c9dde: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076785563
/IngestedData/Twitter/105cc235-eabb-4af0-87a3-0dca6e9652e8: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076780810
/IngestedData/Twitter/105cc45b-0024-4c61-91a1-d393a6c5fdeb: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075745100
/IngestedData/Twitter/105cca1c-be9d-4102-9c5f-86b3d285284f: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074283876
/IngestedData/Twitter/105cd1e3-20d2-4b2f-9736-a695589c92f4: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075694890
/IngestedData/Twitter/105cfe17-88f0-44df-8ccb-1d0b582efb97: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075797197
/IngestedData/Twitter/105d25f6-a256-4df9-9039-34e800361017: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076429149
/IngestedData/Twitter/105d4922-a659-41c8-b43a-f74c9414b08a: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074114968
/IngestedData/Twitter/105d4be3-3520-4092-aae4-cbeddc7fe7ce: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074912637
/IngestedData/Twitter/105d8fda-80c7-4183-923d-4648ac2088b4: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075035916
/IngestedData/Twitter/105d9125-b581-4563-9791-25e0ca0dcc73: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074241287
/IngestedData/Twitter/105dbcfd-e282-45de-a2f5-6bf30e547573: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074275127
/IngestedData/Twitter/105dcd87-516c-4b27-b40c-be96d27e6b25: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076830058
/IngestedData/Twitter/105ddb32-449a-404b-8a5b-f21f0c2367a9: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075460187
/IngestedData/Twitter/105de3fd-4cc3-46a9-9b01-6a7f24f923da: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074560807
/IngestedData/Twitter/105e18b3-3ce9-416a-80c6-f7f6eb08eb63: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074186202
/IngestedData/Twitter/105e8915-0dc9-47a4-8974-765e58ce8110: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074956035
/IngestedData/Twitter/105eb1a4-c9f1-479c-a16c-2d40311d6628: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075525558
/IngestedData/Twitter/105ed063-0638-403a-b289-8a9256bae645: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075118223
/IngestedData/Twitter/105f12a6-0a74-44ff-b980-8fd3b533af94: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074916454
/IngestedData/Twitter/105f415a-7826-41fc-94be-82698f57fba2: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074565217
/IngestedData/Twitter/105f419c-badc-4e80-a6bb-7e4b2ef3b5e6: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1076582046
/IngestedData/Twitter/105f531d-4ef0-476d-8bb5-cbcb48b6d12e: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074551783
/IngestedData/Twitter/105f58fb-098f-47b9-ae21-c50fcc513c8e: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075741758
/IngestedData/Twitter/105f6909-0aac-4168-bf18-660c00cbe3a1: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074485613
/IngestedData/Twitter/105f8884-1d4f-4bb1-a396-dd12050b089e: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1075875256
/IngestedData/Twitter/105feef4-bee7-4664-9c65-ee71409a5366: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1077194235
/IngestedData/Twitter/1060008631294209.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515691
/IngestedData/Twitter/1060016964676972.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515695
/IngestedData/Twitter/1060018157584619.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1079102588
/IngestedData/Twitter/1060026634560925.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528840
/IngestedData/Twitter/1060044229368338.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515698
/IngestedData/Twitter/1060052334177564.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515699
/IngestedData/Twitter/1060057482442954.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528837
/IngestedData/Twitter/1060065781777604.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515704
/IngestedData/Twitter/1060076302831162.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1079102634
/IngestedData/Twitter/1060077900904663.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515700
/IngestedData/Twitter/1060079574221834.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528841
/IngestedData/Twitter/1060098043879957.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515708
/IngestedData/Twitter/1060107186767007.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1079102611
/IngestedData/Twitter/1060111225066699.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528843
/IngestedData/Twitter/1060117531851953.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515714
/IngestedData/Twitter/1060118553509265.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515709
/IngestedData/Twitter/1060125933153193.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515713
/IngestedData/Twitter/1060136947663525.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515710
/IngestedData/Twitter/1060140694299629.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1079102644
/IngestedData/Twitter/1060140950581234.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515716
/IngestedData/Twitter/1060146058284806.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528842
/IngestedData/Twitter/1060147016572668.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515717
/IngestedData/Twitter/1060157326209609.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515718
/IngestedData/Twitter/1060165924032053.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528839
/IngestedData/Twitter/1060169454822207.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515719
/IngestedData/Twitter/1060180540358023.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515721
/IngestedData/Twitter/1060186551784435.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515723
/IngestedData/Twitter/1060188552544100.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515720
/IngestedData/Twitter/106018e6-0595-4bc2-a233-f293bc81d43a: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1074526580
/IngestedData/Twitter/1060201573749970.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515726
/IngestedData/Twitter/1060202628871437.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515725
/IngestedData/Twitter/1060206989880022.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1080528844
/IngestedData/Twitter/1060214708009548.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1078515731
/IngestedData/Twitter/1060231377022537.json: CORRUPT blockpool BP-1252659232-192.168.1.64-1523785142923 block blk_1079102692
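
To compare what fsck prints against the corrupt-block count shown by the NameNode or Ambari, it may help to tally the listing instead of reading it by eye. The following is only a minimal sketch: it assumes every corrupt entry has the same "path: CORRUPT blockpool ... block ..." shape as the lines above, and the function name `summarize_corrupt` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like the fsck output above (assumed format):
#   <path>: CORRUPT blockpool <pool-id> block <blk_id>
CORRUPT_RE = re.compile(
    r"^(?P<path>.+?): CORRUPT blockpool (?P<pool>\S+) block (?P<block>blk_-?\d+)"
)


def summarize_corrupt(lines):
    """Count distinct corrupt files and blocks in fsck output lines."""
    files = set()
    blocks = set()
    lines_per_dir = Counter()
    for line in lines:
        match = CORRUPT_RE.match(line.strip())
        if not match:
            continue  # skip summary lines and anything not in the assumed shape
        path = match.group("path")
        files.add(path)
        blocks.add((match.group("pool"), match.group("block")))
        # Parent directory of the file; "/" for files directly under the root.
        lines_per_dir[path.rsplit("/", 1)[0] or "/"] += 1
    return {
        "files": len(files),
        "blocks": len(blocks),
        "lines_per_dir": dict(lines_per_dir),
    }
```

Saving the fsck output to a file and passing the open file object to `summarize_corrupt` gives one number for distinct files and one for distinct blocks, which can then be checked against the gauge.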


Does anyone have any clue what I should do?

2 REPLIES

Did you check the logs of the DataNode that is switching between live and dead?

Yes @Nikhil Silsarma, I believe I can see nothing other than an EOFException.

Below is a sample of the repeated pattern:

2018-05-15 13:21:09,414 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1490)) - src: /192.168.1.65:45618, dest: /192.168.1.66:50010, bytes: 295, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_1052889643_113, offset: 0, srvID: a27109bc-7ef8-472d-8a16-cd4eaf9baf68, blockid: BP-1252659232-192.168.1.64-1523785142923:blk_1081336515_7597054, duration: 41856095
2018-05-15 13:21:09,414 INFO  datanode.DataNode (BlockReceiver.java:run(1463)) - PacketResponder: BP-1252659232-192.168.1.64-1523785142923:blk_1081336515_7597054, type=LAST_IN_PIPELINE terminating
2018-05-15 13:21:09,424 INFO  datanode.DataNode (BPServiceActor.java:blockReport(395)) - Unsuccessfully sent block report 0x2477e5b94b6fd918,  containing 1 storage report(s), of which we sent 0. The reports had 6169093 total blocks and used 0 RPC(s). This took 330 msec to generate and 646 msecs for RPC and NN processing. Got back no commands.
2018-05-15 13:21:09,424 WARN  datanode.DataNode (BPServiceActor.java:offerService(681)) - IOException in offerService
java.io.EOFException: End of File Exception between local host is: "fp-dev-03.fpconsultancy.com/192.168.1.66"; destination host is: "fp-dev-01.fpconsultancy.com":8020; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
	at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
	at org.apache.hadoop.ipc.Client.call(Client.java:1498)
	at org.apache.hadoop.ipc.Client.call(Client.java:1398)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at com.sun.proxy.$Proxy15.blockReport(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:216)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:377)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:653)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:793)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1119)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1014)
2018-05-15 13:21:09,455 INFO  DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1490)) - src: /192.168.1.65:45620, dest: /192.168.1.66:50010, bytes: 2957, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-403725899_122, offset: 0, srvID: a27109bc-7ef8-472d-8a16-cd4eaf9baf68, blockid: BP-1252659232-192.168.1.64-1523785142923:blk_1081336518_7597057, duration: 82257567
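
Since the pattern repeats, it may be useful to pull every block report record out of the DataNode log and see how often the report fails, how many blocks it carries, and how long the RPC took. This is a rough sketch rather than a finished tool: it assumes each log record sits on a single unwrapped line and follows the wording of the "Unsuccessfully sent block report" message above (the "Successfully" variant is assumed to have the same layout), and the function names are hypothetical.

```python
import re

# Pattern based on the block report message quoted above (assumed stable wording).
REPORT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+\w+\s+.*?"
    r"(?P<status>Successfully|Unsuccessfully) sent block report "
    r"(?P<report_id>0x[0-9a-f]+),"
    r"\s+containing (?P<storages>\d+) storage report\(s\), "
    r"of which we sent (?P<sent>\d+)\."
    r" The reports had (?P<blocks>\d+) total blocks "
    r"and used (?P<rpcs>\d+) RPC\(s\)\."
    r" This took (?P<gen_ms>\d+) msecs? to generate "
    r"and (?P<rpc_ms>\d+) msecs? for RPC"
)

INT_FIELDS = ("storages", "sent", "blocks", "rpcs", "gen_ms", "rpc_ms")


def parse_block_reports(lines):
    """Return one dict per block report record found in the log lines."""
    reports = []
    for line in lines:
        match = REPORT_RE.match(line)
        if not match:
            continue  # not a block report record
        record = match.groupdict()
        for key in INT_FIELDS:
            record[key] = int(record[key])
        # True only when the DataNode logged the report as successfully sent.
        record["ok"] = record.pop("status") == "Successfully"
        reports.append(record)
    return reports


def failed_reports(lines):
    """Keep only the block reports that the DataNode logged as unsuccessful."""
    return [r for r in parse_block_reports(lines) if not r["ok"]]
```

Running `failed_reports` over the whole DataNode log file shows the timestamps of each failed report, which can be lined up against the times the node is marked dead on the NameNode side.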