Posted 05-03-2019 11:29 AM
No Cloudera Manager installed. CDH version: 2.6.0-cdh5.15.1.

Failure:

2019-05-03 13:18:50,472 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:etl (auth:SIMPLE) cause:java.net.ConnectException: Call From clientMachine#14/clientMachine#14IP to NameNode:9000 failed on connection exception: java.net.ConnectException: Connection timed out; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2019-05-03 13:18:50,473 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From clientMachine#14/clientMachine#14IP to NameNode:9000 failed on connection exception: java.net.ConnectException: Connection timed out; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1508)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:268)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1324)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1311)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1299)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:315)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:280)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:267)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1630)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:339)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:335)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:335)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:784)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:356)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:760)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:648)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:744)
at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
at org.apache.hadoop.ipc.Client.call(Client.java:1480)
... 31 more

DataNode log on the failed client machine #14 around the failure time:

886389:2019-05-03 13:17:36,651 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: clientMachinej#14:50010:DataXceiver error processing WRITE_BLOCK operation src: /clientIP:36531 dst: /datanode#10IP:50010; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1669201578-10.202.33.149-1549049450582:blk_1155667125_83195343 already exists in state FINALIZED and thus cannot be created.
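For what it's worth, "Connection timed out" (as opposed to "connection refused") usually means packets to NameNode:9000 are being silently dropped, e.g. by a firewall or a routing problem, rather than the NameNode actively rejecting them. A quick probe from the failing client can confirm whether the RPC port is reachable at all; the hostname below is a placeholder for the actual NameNode host from the error message:

```shell
# Probe the NameNode RPC port using bash's built-in /dev/tcp redirection.
# "namenode.example.com" is a placeholder; substitute the real NameNode host.
if timeout 5 bash -c 'exec 3<>/dev/tcp/namenode.example.com/9000' 2>/dev/null; then
  echo "port reachable"
else
  echo "connection failed (firewall, routing, or NameNode down?)"
fi
```

If the probe succeeds from some task nodes but times out from machine #14 specifically, the problem is host-level networking on that machine rather than the NameNode itself.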
892139:2019-05-03 13:18:22,989 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(clientIP, datanodeUuid=3e35a5fc-4e9b-4548-8631-ab85b5885007, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-1f8081e6-d4ed-437d-8273-1f2a5edd4057;nsid=338758850;c=0):Exception writing BP-1669201578-10.202.33.149-1549049450582:blk_1155669671_83197947 to mirror datanode#10IP:50010
892140:java.io.IOException: Connection reset by peer
894608:java.io.EOFException: Premature EOF: no length prefix available
894613:2019-05-03 13:18:49,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1669201578-10.202.33.149-1549049450582:blk_1155671740_83200091
894614:java.io.IOException: Premature EOF from inputStream
894626:2019-05-03 13:18:49,212 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run():
894627:java.nio.channels.ClosedByInterruptException
894642:java.nio.channels.ClosedByInterruptException
894657:2019-05-03 13:18:49,212 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1669201578-10.202.33.149-1549049450582:blk_1155671740_83200091 received exception java.io.IOException: Premature EOF from inputStream
894659:java.io.IOException: Premature EOF from inputStream
894674:java.io.EOFException: Premature EOF: no length prefix available
894679:2019-05-03 13:18:49,227 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run():
894680:java.io.IOException: Connection reset by peer
894698:java.io.IOException: Connection reset by peer
894716:2019-05-03 13:18:49,228 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1669201578-10.202.33.149-1549049450582:blk_1155671508_83199858
894717:java.nio.channels.ClosedByInterruptException
894739:2019-05-03 13:18:49,228 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1669201578-10.202.33.149-1549049450582:blk_1155671508_83199858 received exception java.nio.channels.ClosedByInterruptException
894741:java.nio.channels.ClosedByInterruptException
894765:2019-05-03 13:18:49,369 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1669201578-10.202.33.149-1549049450582:blk_1155671603_83199953
894766:java.io.IOException: Premature EOF from inputStream
894780:2019-05-03 13:18:49,369 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1669201578-10.202.33.149-1549049450582:blk_1155671603_83199953 received exception java.io.IOException: Premature EOF from inputStream
894782:java.io.IOException: Premature EOF from inputStream
894796:2019-05-03 13:18:49,532 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1669201578-10.202.33.149-1549049450582:blk_1155671632_83199982
894797:java.io.IOException: Premature EOF from inputStream
894810:java.io.EOFException: Premature EOF: no length prefix available
894817:2019-05-03 13:18:49,532 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1669201578-10.202.33.149-1549049450582:blk_1155671632_83199982 received exception java.io.IOException: Premature EOF from inputStream
894819:java.io.IOException: Premature EOF from inputStream

There were some GC pauses on the NameNode (machine #9), but they were fixed by increasing the heap memory. The namespace holds 129,238,586 files and directories and 31,709,865 blocks = 160,948,451 total filesystem objects.
Heap memory used: 33.34 GB of 101.15 GB; max heap is 101.15 GB.

On the machine #9 DataNode around the same time:

2019-05-03 13:18:42,146 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1669201578-10.202.33.149-1549049450582:blk_1155671398_83199748, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2019-05-03 13:18:42,148 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1669201578-10.202.33.149-1549049450582:blk_1155671637_83199987, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2019-05-03 13:18:42,149 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1669201578-10.202.33.149-1549049450582:blk_1155671717_83200068 src: /machine#12:33498 dest: /machine#9:50010
2019-05-03 13:18:42,492 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 1096ms (threshold=300ms)
2019-05-03 13:18:42,521 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 1163ms (threshold=300ms)
2019-05-03 13:18:54,592 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 508ms (threshold=300ms)
2019-05-03 13:18:55,892 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 14349ms (threshold=300ms)
2019-05-03 13:18:55,734 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 338ms (threshold=300ms)
2019-05-03 13:18:55,404 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 585ms (threshold=300ms)
2019-05-03 13:18:55,352 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 13225ms (threshold=300ms)
2019-05-03 13:18:55,094 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 501ms (threshold=300ms)
2019-05-03 13:18:55,223 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 2067ms (threshold=300ms)
2019-05-03 13:18:56,730 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 4850ms
No GCs detected
2019-05-03 13:18:55,005 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 1025ms (threshold=300ms)
2019-05-03 13:18:54,957 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 850ms (threshold=300ms)
2019-05-03 13:18:54,796 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 12170ms (threshold=300ms)
2019-05-03 13:18:54,634 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 504ms (threshold=300ms)
2019-05-03 13:18:56,107 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 13561ms (threshold=300ms)
2019-05-03 13:18:57,012 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow PacketResponder send ack to upstream took 1512ms (threshold=300ms), PacketResponder: BP-1669201578-10.202.33.149-1549049450582:blk_1155670553_83198851, type=HAS_DOWNSTREAM_IN_PIPELINE, replyAck=seqno: 1704 reply: 0 reply: 0 downstreamAckTimeNanos: 9451006117
2019-05-03 13:18:57,036 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow PacketResponder send ack to upstream took 392ms (threshold=300ms), PacketResponder: BP-1669201578-10.202.33.149-1549049450582:blk_1155671880_83200232, type=LAST_IN_PIPELINE, downstreams=0:[], replyAck=seqno: 67 reply: 0 downstreamAckTimeNanos: 0
2019-05-03 13:18:57,068 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow PacketResponder send ack to upstream took 600ms (threshold=300ms), PacketResponder: BP-1669201578-10.202.33.149-1549049450582:blk_1155671512_83199862, type=HAS_DOWNSTREAM_IN_PIPELINE, replyAck=seqno: 19 reply: 0 reply: 0 reply: 0 downstreamAckTimeNanos: 11402140530
2019-05-03 13:18:57,066 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Slow PacketResponder send ack to upstream took 2563ms (threshold=300ms), PacketResponder: BP-1669201578-10.202.33.149-1549049450582:blk_1155671711_83200062, type=LAST_IN_PIPELINE, downstreams=0:[], replyAck=seqno: 153 reply: 0 downstreamAckTimeNanos: 0
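Those "Slow BlockReceiver" writes of 12-14 seconds, together with a ~4.8 s JvmPauseMonitor pause reported with "No GCs detected", suggest the stall is at the host level (disk I/O, swapping, or network) rather than in the JVM. To gauge how widespread the slow writes are, the durations can be pulled out of the DataNode log and ranked; the sample lines below stand in for the real log (in practice, pipe the actual DataNode log file through the same filter):

```shell
# Extract "took Nms" durations from slow-write log lines and rank them,
# largest first. The two printf lines are sample input only.
printf '%s\n' \
  '2019-05-03 13:18:42,492 INFO DataNode: Slow BlockReceiver write packet to mirror took 1096ms (threshold=300ms)' \
  '2019-05-03 13:18:55,892 INFO DataNode: Slow BlockReceiver write packet to mirror took 14349ms (threshold=300ms)' \
  | grep -o 'took [0-9]*ms' \
  | sort -k2 -rn
# → took 14349ms
#   took 1096ms
```

If the worst durations cluster on one DataNode (or one mirror target such as datanode #10), that narrows the search to that host's disks or its network path.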