Created 04-11-2017 12:39 PM
After the cluster has been running for a while, a lot of TCP connections end up with data backed up in their queues.
On one of the machines, the YARN NodeManager process ID is 34675:
jps
34675 NodeManager

netstat -anp | grep 34675 | grep 50010
tcp 100799 0 ::ffff:xxx.xx.xx.153:57938 ::ffff:xxx.xx.xx.29:50010 ESTABLISHED 34675/java
tcp 76376 0 ::ffff:xxx.xx.xx.153:50020 ::ffff:xxx.xx.xx.206:50010 ESTABLISHED 34675/java
tcp 0 0 ::ffff:xxx.xx.xx.153:36182 ::ffff:xxx.xx.xx.161:50010 ESTABLISHED 34675/java
tcp 70584 0 ::ffff:xxx.xx.xx.153:33285 ::ffff:xxx.xx.xx.202:50010 ESTABLISHED 34675/java
tcp 1301872 0 ::ffff:xxx.xx.xx.153:50534 ::ffff:xxx.xx.xx.22:50010 ESTABLISHED 34675/java
tcp 73736 0 ::ffff:xxx.xx.xx.153:45629 ::ffff:xxx.xx.xx.130:50010 ESTABLISHED 34675/java
tcp 145406 0 ::ffff:xxx.xx.xx.153:56123 ::ffff:xxx.xx.xx.57:50010 ESTABLISHED 34675/java
tcp 165896 0 ::ffff:xxx.xx.xx.153:54038 ::ffff:xxx.xx.xx.36:50010 ESTABLISHED 34675/java
tcp 154952 0 ::ffff:xxx.xx.xx.153:55024 ::ffff:xxx.xx.xx.25:50010 ESTABLISHED 34675/java
tcp 1 0 ::ffff:xxx.xx.xx.153:39984 ::ffff:xxx.xx.xx.24:50010 CLOSE_WAIT 34675/java
tcp 1 0 ::ffff:xxx.xx.xx.153:42582 ::ffff:xxx.xx.xx.35:50010 CLOSE_WAIT 34675/java
tcp 93752 0 ::ffff:xxx.xx.xx.153:54546 ::ffff:xxx.xx.xx.125:50010 ESTABLISHED 34675/java
tcp 88472 0 ::ffff:xxx.xx.xx.153:53022 ::ffff:xxx.xx.xx.34:50010 ESTABLISHED 34675/java
tcp 72416 0 ::ffff:xxx.xx.xx.153:54486 ::ffff:xxx.xx.xx.123:50010 ESTABLISHED 34675/java
tcp 197752 0 ::ffff:xxx.xx.xx.153:51549 ::ffff:xxx.xx.xx.204:50010 ESTABLISHED 34675/java
tcp 1 0 ::ffff:xxx.xx.xx.153:60444 ::ffff:xxx.xx.xx.49:50010 CLOSE_WAIT 34675/java
tcp 1 0 ::ffff:xxx.xx.xx.153:50642 ::ffff:xxx.xx.xx.44:50010 CLOSE_WAIT 34675/java
tcp 1 0 ::ffff:xxx.xx.xx.153:49902 ::ffff:xxx.xx.xx.37:50010 CLOSE_WAIT 34675/java
tcp 71776 0 ::ffff:xxx.xx.xx.153:35512 ::ffff:xxx.xx.xx.29:50010 ESTABLISHED 34675/java
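As you can see, many of these connections have a large amount of data waiting in the receive queue (the second column), and several connections to port 50010 are already dead (CLOSE_WAIT).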
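To make the netstat output easier to compare between nodes, it can be summarized with a small script. This is only a minimal sketch: the function name `summarize_netstat` and the 65536-byte threshold are illustrative assumptions, and it expects the `netstat -anp` column order shown above (protocol, Recv-Q, Send-Q, local address, remote address, state).

```python
from collections import Counter


def summarize_netstat(lines, recv_q_threshold=65536):
    """Count TCP states and collect connections whose Recv-Q exceeds a threshold.

    recv_q_threshold is an arbitrary illustrative value, not a recommended limit.
    """
    states = Counter()
    backlogged = []
    for line in lines:
        fields = line.split()
        # Expected layout: proto, Recv-Q, Send-Q, local, remote, state, pid/program
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        recv_q = int(fields[1])
        remote, state = fields[4], fields[5]
        states[state] += 1
        if recv_q > recv_q_threshold:
            backlogged.append((remote, recv_q))
    # Largest backlog first
    backlogged.sort(key=lambda item: item[1], reverse=True)
    return states, backlogged
```

The lines can come from any source, for example `sys.stdin` when the netstat output is piped into the script. The returned counter shows how many connections are in each state, and the list shows which remote endpoints have the most unread data.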
The YARN NodeManager log contains many entries like these:
2017-04-11 19:30:02,040 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,042 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,046 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,047 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,059 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,059 ERROR mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(1200)) - Shuffle error:
java.io.IOException: Broken pipe
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
    at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:433)
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:565)
    at org.jboss.netty.channel.DefaultFileRegion.transferTo(DefaultFileRegion.java:68)
    at org.apache.hadoop.mapred.FadvisedFileRegion.transferTo(FadvisedFileRegion.java:81)
    at org.jboss.netty.channel.socket.nio.SocketSendBufferPool$FileSendBuffer.transferTo(SocketSendBufferPool.java:331)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:198)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromSelectorLoop(AbstractNioWorker.java:157)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:113)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2017-04-11 19:30:02,063 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,064 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,065 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,066 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,070 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,071 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,074 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
2017-04-11 19:30:02,074 INFO mapred.ShuffleHandler (ShuffleHandler.java:setResponseHeaders(1047)) - Setting connection close header...
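To see whether the shuffle errors grow over time as the cluster slows down, the ERROR entries can be counted per minute. This is a rough sketch that assumes the log keeps one entry per line with the timestamp layout shown above; the function name `count_shuffle_errors` is made up for illustration.

```python
import re
from collections import Counter

# Matches the start of an ERROR entry from mapred.ShuffleHandler and
# captures the timestamp truncated to the minute.
ERROR_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} ERROR mapred\.ShuffleHandler"
)


def count_shuffle_errors(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to the number of ERROR entries."""
    per_minute = Counter()
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            per_minute[match.group(1)] += 1
    return per_minute
```

Passing an open NodeManager log file to `count_shuffle_errors` gives one count per minute, which can then be compared against the times when jobs were slow.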
I have tried many changes, but the result is the same, and the cluster keeps getting slower and slower.
Changes I have made:
1. Set /proc/sys/net/core/somaxconn to 204800
2. Increased dfs.datanode.max.transfer.threads to 16384
3. Increased the NodeManager and ResourceManager heap size to 2G
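Since a value written directly under /proc/sys does not survive a reboot, it may be worth confirming that the kernel setting is still in effect on every node. The helper names below (`read_int_setting`, `check_setting`) are hypothetical; this is a minimal sketch that only reads the file and compares it with the expected value.

```python
from pathlib import Path


def read_int_setting(path):
    """Return the integer stored in a /proc/sys style file."""
    return int(Path(path).read_text().split()[0])


def check_setting(path, expected):
    """Return (matches, actual) for the setting at the given path."""
    actual = read_int_setting(path)
    return actual == expected, actual
```

For example, `check_setting("/proc/sys/net/core/somaxconn", 204800)` returns a pair whose first element is `False` if the node has fallen back to a different value.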
Please help. Let me know what other information I need to provide.
Created 04-13-2017 07:08 AM
Can anyone give me some advice? For now I am using tcpkill on the port as a workaround.