Created on 07-04-2019 12:16 PM - edited 09-16-2022 07:29 AM
Hello,
I have a problem that occurs randomly, almost every day, with many different jobs: they are killed after running for a long time.
It is very common to see this in the map task logs:
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.0.47:28703 remote=bda2node09.sii.cl/172.21.0.27:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:207)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:156)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:788)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:844)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:904)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:954)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:186)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2019-07-04 02:18:43,776 WARN [main] org.apache.hadoop.hdfs.DFSClient: Could not obtain block: BP-1064157840-172.21.0.1-1532459013851:blk_1081445547_7704929 file=/user/iecv/input/resumen_detalle_otro_impuesto/iecv_detalle_otro_impuesto_dtoi/IECV_DETALLE_OTRO_IMPUESTO_DTOI_2015-07-02_03-38.txt No live nodes contain current block Block locations: DatanodeInfoWithStorage[172.21.0.6:50010,DS-ecdc373e-e4e5-4300-bd4e-049d09bf06e1,DISK] DatanodeInfoWithStorage[172.21.0.49:50010,DS-dcc8471c-1369-4aa0-8da0-31b20e36f0ca,DISK] DatanodeInfoWithStorage[172.21.0.27:50010,DS-4a71be65-7863-412e-a76d-b7b989832b67,DISK] Dead nodes: DatanodeInfoWithStorage[172.21.0.6:50010,DS-ecdc373e-e4e5-4300-bd4e-049d09bf06e1,DISK] DatanodeInfoWithStorage[172.21.0.49:50010,DS-dcc8471c-1369-4aa0-8da0-31b20e36f0ca,DISK] DatanodeInfoWithStorage[172.21.0.27:50010,DS-4a71be65-7863-412e-a76d-b7b989832b67,DISK]. Throwing a BlockMissingException
2019-07-04 02:18:43,776 WARN [main] org.apache.hadoop.hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1064157840-172.21.0.1-1532459013851:blk_1081445547_7704929 file=/user/iecv/input/resumen_detalle_otro_impuesto/iecv_detalle_otro_impuesto_dtoi/IECV_DETALLE_OTRO_IMPUESTO_DTOI_2015-07-02_03-38.txt
at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1040)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1023)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1002)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1668)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:871)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:904)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:954)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:94)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:186)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
And in the reducer logs we see these warnings:
2019-07-04 00:10:00,848 WARN [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to shuffle for fetcher#4
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:562)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:348)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
2019-07-04 00:10:00,849 WARN [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to shuffle output of attempt_1560882595803_22947_m_013254_0 from bda1node05.sii.cl:13562
java.io.IOException: java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:566)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:348)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
Caused by: java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:562)
... 2 more
2019-07-04 00:10:00,849 WARN [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.Fetcher: copyMapOutput failed for tasks [attempt_1560882595803_22947_m_013254_0]
.....
2019-07-04 02:21:05,554 WARN [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#3 failed to read map headernull decomp: -1, -1
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3375)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3368)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3356)
at java.io.DataInputStream.readByte(DataInputStream.java:265)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
at org.apache.hadoop.io.WritableUtils.readStringSafely(WritableUtils.java:475)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleHeader.readFields(ShuffleHeader.java:66)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:509)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:348)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
Any ideas about the possible causes of these kinds of issues?
Thanks,
CF
Created 07-16-2019 07:54 PM