I am using Apache Hadoop 2.7.1 on CentOS 7, in an HA cluster that consists of two namenodes and six datanodes, and I keep seeing the following error in my datanode logs:
DataXceiver error processing WRITE_BLOCK operation src: /172.16.1.153:38360 dst: /172.16.1.153:50010 java.io.IOException: Connection reset by peer
so I updated the following property in hdfs-site.xml to increase the number of available threads on all datanodes:
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>16000</value>
</property>
and I increased the number of open files too, by adding the following to .bashrc:
ulimit -n 16384
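Note that a `ulimit` in `.bashrc` only applies to interactive shells, not to a DataNode started from an init script or service manager. A persistent limit is usually set in `/etc/security/limits.conf` instead; a minimal sketch, assuming the datanode runs as the `hdfs` user (adjust the username for your setup):

```
# /etc/security/limits.conf — raise the open-file limit for the hdfs user
hdfs  soft  nofile  16384
hdfs  hard  nofile  16384
```

After editing, the DataNode process must be restarted (from a fresh login) for the new limit to take effect.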
but I am still getting this error in my datanode logs. While sending write requests to the cluster, I ran a command on the datanodes to count the number of threads, and they never exceed 100 threads.
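For reference, one way to count the threads of a DataNode JVM is the `NLWP` (number of lightweight processes) column of `ps`; the `pgrep` pattern below is an assumption about how the process is named, so adjust it for your setup:

```shell
# Find the DataNode JVM (assumes its command line mentions the DataNode class)
DN_PID=$(pgrep -f 'org.apache.hadoop.hdfs.server.datanode.DataNode' | head -n 1)

# NLWP = number of lightweight processes (threads) in that process
ps -o nlwp= -p "$DN_PID"
```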
To check the number of open files I issued `sysctl fs.file-nr`, and it never exceeds 300 open files.
So why am I always getting this error in the datanode logs, and what is its effect on performance?
The error does not look related to the number of threads running on the datanode; it looks like a connection problem.
It would be really helpful if you could provide a more detailed stack trace.
My guess is that the next datanode in the pipeline (assigned by the namenode) is down, so the first datanode cannot connect to it and throws the exception above.
Since you have 6 datanodes, writes could still succeed on the remaining nodes in the cluster.
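One way to narrow this down is to tally which peer addresses appear in the DataXceiver errors: if the resets cluster around one `src` address, that node (or its network path) is the likely culprit. A minimal sketch; the regex and the sample lines are illustrative, not taken from your full logs:

```python
import re
from collections import Counter

def count_resets(log_lines):
    """Count 'Connection reset by peer' DataXceiver errors per source address."""
    pat = re.compile(
        r"DataXceiver error processing \S+ operation\s+"
        r"src: /([\d.]+):\d+.*Connection reset by peer"
    )
    counts = Counter()
    for line in log_lines:
        m = pat.search(line)
        if m:
            counts[m.group(1)] += 1  # key by the src IP only, ignoring the port
    return counts

sample = [
    "DataXceiver error processing WRITE_BLOCK operation src: /172.16.1.153:38360 "
    "dst: /172.16.1.153:50010 java.io.IOException: Connection reset by peer",
    "some unrelated log line",
]
print(count_resets(sample))
```

Running this over each datanode's log (e.g. the files under `$HADOOP_LOG_DIR`) shows whether the resets come from many clients or from one misbehaving node.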