My servers go down and stop responding when I run TestDFSIO with a total size over 300GB. No messages are recorded in Ambari's logs, the component logs (MapReduce, YARN), or the Linux system logs. Setup: RHEL 7.0, 12 SAS disks (7200 RPM), 96GB memory, 10G network, 6 servers (2 NameNodes, 4 DataNodes), CPU: E5-2660v3 (2.6GHz/10c) 9.6GT/25M L3 * 2. YARN's settings are OK.
I use this command (with TestDFSIO as the benchmark class):
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.jar TestDFSIO -write -nrFiles 300 -size 1024
When the job is about 40~80% complete, one of the 4 DataNodes goes down and stops responding; then the other three DataNodes go down one by one within 10 minutes. It's very strange. I guessed it might be a problem with the 10G network cards, but nothing changed when I replaced the network card driver and the card itself.
@Mon key, are you saying that the DataNodes are running when you start TestDFSIO, and then they appear to gradually shut down one by one while the job is running? If so, have you tried looking at the DataNode logs to try to determine why they shut down?
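When a DataNode dies silently with nothing in its own log, a common culprit is the kernel OOM killer or a hardware-level hang rather than a clean shutdown. As a rough diagnostic sketch (the log paths below assume a default HDP layout on RHEL and may differ on your cluster), you could check each DataNode host like this:

```shell
# Hedged sketch: paths assume a default HDP/RHEL layout; adjust as needed.

# 1) Did the kernel OOM killer terminate the DataNode JVM?
#    An OOM kill leaves a "Killed process" line in the kernel ring buffer
#    and /var/log/messages, but nothing in the Hadoop logs themselves.
dmesg | grep -i 'killed process'
grep -i 'out of memory' /var/log/messages

# 2) Did the DataNode log anything before it stopped?
#    (Filename pattern is an assumption; check your actual log directory.)
tail -n 200 /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log
```

If the OOM killer is responsible, the heavy write load from TestDFSIO plus the DataNode and NodeManager heap sizes may be overcommitting the 96GB of memory on each node.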