I installed a Hadoop cluster with Ambari (HDP 2.3). It contains 1 client node, 2 master nodes, and 4 slave nodes. At first I had 3 slave nodes with no issues; after I added one more slave, the problem started. How can I fix "3/4 DataNodes Live"? Please see the attached screenshot.
Also, the DataNode information keeps changing between data03 and data04.
HDFS DataNode log (user: hdfs)
HDFS DataNode log (user: root)
We basically see two different errors here in the DataNode logs:
1. The following error indicates that the "dfs.datanode.data.dir" directory (/hadoop/hdfs/data) is not valid. There may be a mount issue, or the contents of the directory may be unreadable, corrupt, or have incorrect permissions due to some file system issue:
all directories in dfs.datanode.data.dir are invalid. directory not readable: /hadoop/hdfs/data
2. Port 50010 is already in use:
BindException: Address already in use BindException: Problem binding to [0.0.0.0:50010]
So please check that the directory is valid and readable by the user who is running the DataNode.
Also, before starting the DataNode, please find and kill any process that is already using port 50010, then try a fresh start of the DataNode.
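A quick check like the following can help here. This is a sketch assuming the HDP defaults: the DataNode runs as the hdfs user and dfs.datanode.data.dir is /hadoop/hdfs/data; substitute your actual values.

```shell
# HDP default data directory and service user; adjust to your setup.
DATA_DIR=/hadoop/hdfs/data

ls -ld "$DATA_DIR"                 # owner should typically be hdfs:hadoop
sudo -u hdfs test -r "$DATA_DIR" \
  && echo "readable by hdfs" \
  || echo "NOT readable by hdfs"
mount | grep -F "$DATA_DIR"        # confirm the underlying disk is actually mounted
```

If the `mount` line shows nothing, the data disk may have failed to mount and the DataNode is looking at an empty or unreadable mount point.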
# netstat -tnlpa | grep 50010
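The last column of that netstat output is "PID/Program name" (e.g. "1234/java"), so the offending PID can be extracted and killed, roughly like this (a sketch, not a definitive one-liner):

```shell
# Pull the PID out of netstat's "PID/Program name" column for port 50010
# and kill that process before restarting the DataNode.
pid=$(netstat -tnlp 2>/dev/null | grep ':50010 ' | awk '{print $7}' | cut -d/ -f1)
if [ -n "$pid" ]; then
    kill "$pid"      # escalate to kill -9 only if it refuses to exit
fi
```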
From the screenshots you sent, both errors (the port binding and the missing directory) seem to originate from the edge server, where, according to your first screenshots, no DataNode should be running.
Can you please double-check that you are not trying to run a DataNode on your edge node? If so, those logs may not be relevant.
From your other screenshot, there might be a problem with the interconnection between the DataNodes and the NameNode. Can you please check the NameNode log?
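On the NameNode side, something like the following can show which DataNodes are currently registered and when they last reported in. The hostname `namenode.example.com` and port 8020 (the HDP default fs.defaultFS RPC port) are assumptions; adjust them to your cluster.

```shell
# Run on the NameNode host as the hdfs user: list registered DataNodes
# and when each one last contacted the NameNode.
sudo -u hdfs hdfs dfsadmin -report | grep -E 'Name:|Hostname:|Last contact'

# From each DataNode, verify the NameNode RPC port is reachable
# (8020 is the HDP default; adjust if your fs.defaultFS differs).
nc -vz namenode.example.com 8020
```

If data03 and data04 alternate in the "Live" list, the `-report` output often reveals two nodes registering with the same storage ID or hostname, which matches the flapping you describe.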