
Datanode Failures: DataXceiver error processing WRITE_BLOCK/READ_BLOCK operation


Expert Contributor

I have been experiencing failures with my datanodes; the errors are on WRITE_BLOCK and READ_BLOCK operations. I have checked the data handlers and I have dfs.datanode.max.transfer.threads set to 16384. I run HDP 2.4.3 with 11 nodes. Please see the errors below:

2017-03-24 10:09:59,749 ERROR datanode.DataNode (DataXceiver.java:run(278)) - dn:50010:DataXceiver error processing READ_BLOCK operation  src: /ip_address:49591 dst: /ip_address:50010

2017-03-24 11:02:18,750 ERROR datanode.DataNode (DataXceiver.java:run(278)) - dn:50010:DataXceiver error processing WRITE_BLOCK operation  src: /ip_address:43052 dst: /ip_address:50010
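For reference, the transfer-thread limit mentioned above is an hdfs-site.xml property (on HDP it is normally managed through Ambari under HDFS > Configs rather than edited by hand); this fragment simply mirrors the 16384 value reported in the post:

```xml
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>16384</value>
</property>
```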

Re: Datanode Failures: DataXceiver error processing WRITE_BLOCK/READ_BLOCK operation

Can you post the full stack trace? That might help to debug this. Thanks.


Re: Datanode Failures: DataXceiver error processing WRITE_BLOCK/READ_BLOCK operation

Expert Contributor
@Namit Maheshwari Please find attached the log.

Re: Datanode Failures: DataXceiver error processing WRITE_BLOCK/READ_BLOCK operation

In the posted stack trace, there are a lot of GC pauses.

Below is a good article explaining NameNode garbage-collection configuration best practices:

https://community.hortonworks.com/articles/14170/namenode-garbage-collection-configuration-best-pra....
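Although that article focuses on the NameNode, the same style of JVM tuning applies to the DataNode process that is pausing here. A minimal hadoop-env.sh sketch, assuming a fixed 4 GB DataNode heap and the CMS collector (the sizes and flags are illustrative assumptions, not values from this thread):

```shell
# Illustrative hadoop-env.sh entry: fixed 4 GB DataNode heap, CMS GC,
# and GC logging so long pauses show up with timestamps in the GC log.
# Heap size is an assumption; size it to your node's RAM and workload.
export HADOOP_DATANODE_OPTS="-Xms4g -Xmx4g \
  -XX:+UseConcMarkSweepGC \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  ${HADOOP_DATANODE_OPTS}"
```

Setting -Xms equal to -Xmx avoids heap-resizing pauses; after changing this, restart the DataNode for it to take effect.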


Re: Datanode Failures: DataXceiver error processing WRITE_BLOCK/READ_BLOCK operation

Expert Contributor

For the write issue, sharing more information would also help us understand it better.

1) Check whether the DataNode is listed in the Ambari UI.

2) If the DataNode itself is fine, it may be the JIRA below:

https://issues.apache.org/jira/browse/HDFS-770


Re: Datanode Failures: DataXceiver error processing WRITE_BLOCK/READ_BLOCK operation

Expert Contributor

@zkfs The failed datanode is still listed in the Ambari UI, but there is a connection-failed alert, and running the hdfs dfsadmin -report command shows the dead datanodes.
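As a sketch of reading that report: the summary lines in a Hadoop 2.x dfsadmin report can be pulled out with grep. The here-doc below is made-up sample text shaped like that report, not output from this cluster; on a live node you would pipe the real `hdfs dfsadmin -report` instead.

```shell
# Pull the dead-node summary line out of a dfsadmin-style report.
# Sample text is illustrative; on a real cluster run:
#   hdfs dfsadmin -report | grep 'Dead datanodes'
cat <<'EOF' | grep -o 'Dead datanodes ([0-9]*)'
Live datanodes (7):
Name: 10.0.0.7:50010 (dn7)
Dead datanodes (1):
Name: 10.0.0.8:50010 (dn8)
EOF
```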


Re: Datanode Failures: DataXceiver error processing WRITE_BLOCK/READ_BLOCK operation

Hi @Joshua Adeleke, how frequently do you see the errors? These are sometimes seen in busy clusters, and clients/HDFS usually recover from transient failures.

If there are no job or task failures around the time of the errors, I would just ignore them.

Edit: I took a look at your attached log file. There are a lot of GC pauses, as @Namit Maheshwari pointed out.

Try increasing the DataNode heap size and the PermGen/young-generation allocations until the GC pauses go away.

2017-03-25 10:10:18,219 WARN  util.JvmPauseMonitor (JvmPauseMonitor.java:run(192)) - Detected pause in JVM or host machine (eg GC): pause of approximately 44122ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=44419ms
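One quick way to gauge how often this is happening is to count the JvmPauseMonitor warnings in the DataNode log. The two-line sample log below is modeled on the line quoted above, and the real log path shown in the comment is an assumption for a typical HDP layout:

```shell
# Write a small sample log, then count long-pause warnings in it.
# On a real node, grep the actual DataNode log instead, e.g.
#   grep -c 'Detected pause' /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log
cat > /tmp/dn-sample.log <<'EOF'
2017-03-25 10:10:18,219 WARN  util.JvmPauseMonitor (JvmPauseMonitor.java:run(192)) - Detected pause in JVM or host machine (eg GC): pause of approximately 44122ms
2017-03-25 10:12:01,004 INFO  datanode.DataNode - heartbeat OK
EOF
grep -c 'Detected pause' /tmp/dn-sample.log   # prints 1
```

A count that keeps climbing alongside the DataXceiver errors is a strong hint that the pauses, not the transfer-thread limit, are what is killing the connections.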

Re: Datanode Failures: DataXceiver error processing WRITE_BLOCK/READ_BLOCK operation

Expert Contributor

Hello @Arpit Agarwal, the errors are quite frequent, and I just restarted 2 datanodes now. In fact, 4 out of 8 datanodes have been restarted in the last 6 hours.


Re: Datanode Failures: DataXceiver error processing WRITE_BLOCK/READ_BLOCK operation

Just curious - why did you restart the DataNodes? Did they crash?


Re: Datanode Failures: DataXceiver error processing WRITE_BLOCK/READ_BLOCK operation

Expert Contributor

Yes, the datanodes crashed @Arpit Agarwal. The hdfs dfsadmin -report command is more accurate in reporting the datanode failures.


ambari-dn-display.png