
Every night at twelve o'clock the Flume HDFS Sink throws the error "Error while syncing"

New Contributor

Every night at twelve o'clock the Flume HDFS Sink throws the error "Error while syncing":

 

2014-12-13 00:02:30,503 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
java.io.IOException: All datanodes 172.22.65.144:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)

And it does not recover until a Flume service restart is performed.

 

Anyone recognize this error?

 

Thanks,

6 REPLIES

Re: Every night at twelve o'clock the Flume HDFS Sink throws the error "Error while syncing"

This is likely due to the load applied by Flume on HDFS. What is your file descriptor limit (ulimit -n)? Do you see any other errors in the datanode logs before this "all datanodes are bad" message?
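For example, something along these lines (a sketch, assuming Flume and the DataNodes run as the "flume" and "hdfs" users and that the logs are under the usual /var/log/hadoop-hdfs location; adjust paths and user names to your setup):

# run as root: check the soft limit the flume and hdfs users actually get
su -s /bin/bash flume -c 'ulimit -n'
su -s /bin/bash hdfs -c 'ulimit -n'

# then look for datanode-side errors around midnight (log path varies per distribution)
egrep 'ERROR|Exception' /var/log/hadoop-hdfs/*.log* | grep ' 00:0'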
Regards,
Gautam Gopalakrishnan

Re: Every night at twelve o'clock the Flume HDFS Sink throws the error "Error while syncing"

Super Collaborator

Additionally, you might look at cron on the Flume node or datanodes to see if there is a process kicking off at midnight that is eating up all the system resources and causing timeouts (possibly backups or log file rotations).
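For example, on a typical RHEL/CentOS-style node you could check the usual places (paths may differ on your systems, and add whatever service users you have):

# per-user crontabs (run as root)
for u in root flume hdfs; do echo "== $u =="; crontab -l -u "$u" 2>/dev/null; done

# system-wide cron entries and log rotation jobs
cat /etc/crontab
ls /etc/cron.d /etc/cron.daily /etc/cron.hourly
ls /etc/logrotate.d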

Re: Every night at twelve o'clock the Flume HDFS Sink throws the error "Error while syncing"

New Contributor

Thanks for the reply.

 

Actually, there are no cron jobs scheduled at midnight. I don't think this is the cause.

Re: Every night at twelve o'clock the Flume HDFS Sink throws the error "Error while syncing"

New Contributor

We have changed the following HDFS configuration:

 

<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>ALWAYS</value>
</property>
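(For reference, the documented values for this policy are NEVER, DEFAULT and ALWAYS; a less aggressive variant of the same change, assuming your Hadoop version supports it, would be:)

<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value>
</property>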

 

And the error has changed:

 

2015-02-10 01:00:02,711 WARN org.apache.flume.sink.hdfs.HDFSEventSink: Exception while closing hdfs://hadmae1p:8020/user/flume/data/raw/common/performance/mem/dt=2015-02-10/172.22.65.142. Exception follows.
java.net.SocketTimeoutException: 70000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.22.65.142:47005 remote=/172.22.65.142:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1985)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1063)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1031)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1174)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)

 

This error repeats over and over again until a Flume service restart is performed.

 

Re: Every night at twelve o'clock the Flume HDFS Sink throws the error "Error while syncing"

New Contributor

Are you rolling files by day? Perhaps the act of closing/opening lots of file handles is causing an issue in your environment. Maybe you could change that behavior in flume if that's the issue for you...
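For example, if your hdfs.path uses time-based escapes, settings along these lines control how often files are rolled and closed (a sketch only; the agent/sink names a1/k1 are placeholders, and the path is taken from your log, so adjust everything to your config):

# roll every 10 minutes or 128 MB, whichever comes first, instead of keeping one file per day
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadmae1p:8020/user/flume/data/raw/common/performance/mem/dt=%Y-%m-%d
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.idleTimeout = 300
# give HDFS calls more time before the sink gives up (the default is 10000 ms)
a1.sinks.k1.hdfs.callTimeout = 60000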

Re: Every night at twelve o'clock the Flume HDFS Sink throws the error "Error while syncing"

New Contributor

In the Flume log, after the initial error I quoted in my first message, we get the following error again and again, every few seconds, until a Flume service restart is performed:

 

2015-02-04 00:02:46,787 WARN org.apache.flume.sink.hdfs.BucketWriter: Caught IOException writing to HDFSWriter (All datanodes 172.22.65.143:50010 are bad. Aborting...). Closing file (hdfs://hadmae2p:8020/user/flume/data/raw/common/performance/cpu/dt=2015-02-04/.172.22.65.143.1423004401036.tmp) and rethrowing exception.
2015-02-04 00:02:46,787 WARN org.apache.flume.sink.hdfs.BucketWriter: Caught IOException while closing file (hdfs://hadmae2p:8020/user/flume/data/raw/common/performance/cpu/dt=2015-02-04/.172.22.65.143.1423004401036.tmp). Exception follows.
java.io.IOException: All datanodes 172.22.65.143:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2015-02-04 00:02:46,788 WARN org.apache.flume.sink.hdfs.HDFSEventSink: HDFS IO error
java.io.IOException: All datanodes 172.22.65.143:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2015-02-04 00:02:50,788 ERROR org.apache.flume.sink.hdfs.AbstractHDFSWriter: Unexpected error while checking replication factor
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:147)
at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:68)
at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:505)
at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:440)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: All datanodes 172.22.65.143:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1127)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)

I don't think it is a ulimit problem. We considered that cause first and set a very high value for the flume user:

 

 

cat /etc/security/limits.conf
...
* soft nofile 640000
* hard nofile 640000
...
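(Note that changing limits.conf alone does not affect an already-running daemon, or one started by an init script that sets its own ulimit; the effective value of the running JVMs can be read from /proc, for example, assuming the standard Flume and DataNode main class names:)

# effective open-file limit of the running Flume agent and DataNode processes
for p in $(pgrep -f org.apache.flume.node.Application) $(pgrep -f hdfs.server.datanode.DataNode); do
  echo "== PID $p =="
  grep -i 'open files' /proc/$p/limits
done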

 

Looking at the datanode logs, I saw the following entries:

 

2015-02-04 02:00:12,834 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: updateReplica: BP-1195530968-172.22.65.141-1398680394592:blk_1080861390_7152068, recoveryId=7153646, length=3247, replica=ReplicaUnderRecovery, blk_1080861390_7152068, RUR
getNumBytes() = 3375
getBytesOnDisk() = 3375
getVisibleLength()= 3375
getVolume() = /disk1/dfs/dn/current
getBlockFile() = /disk1/dfs/dn/current/BP-1195530968-172.22.65.141-1398680394592/current/rbw/blk_1080861390
recoveryId=7153646
original=ReplicaBeingWritten, blk_1080861390_7152068, RBW
getNumBytes() = 3375
getBytesOnDisk() = 3375
getVisibleLength()= 3375
getVolume() = /disk1/dfs/dn/current
getBlockFile() = /disk1/dfs/dn/current/BP-1195530968-172.22.65.141-1398680394592/current/rbw/blk_1080861390
bytesAcked=3375
bytesOnDisk=3375

2015-02-04 00:02:35,792 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1195530968-172.22.65.141-1398680394592:blk_1080861390_7152068
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:711)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
at java.lang.Thread.run(Thread.java:744)

2015-02-04 00:02:35,797 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadwrk04p.bancsabadell.com:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.22.65.145:54678 dest: /172.22.65.145:50010
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:412)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:711)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
at java.lang.Thread.run(Thread.java:744)

2015-02-04 00:02:35,797 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadwrk04p.bancsabadell.com:50010:DataXceiver error processing WRITE_BLOCK operation src: /172.22.65.144:37883 dest: /172.22.65.145:50010
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:711)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
at java.lang.Thread.run(Thread.java:744)


2015-02-04 00:02:35,798 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1195530968-172.22.65.141-1398680394592:blk_1080861390_7152068, type=LAST_IN_PIPELINE, downstreams=0:[]
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1310)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1250)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1167)
at java.lang.Thread.run(Thread.java:744)


2015-02-04 00:02:35,799 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1195530968-172.22.65.141-1398680394592:blk_1080861390_7152068 received exception java.io.IOException: Premature EOF from inputStream

 

Many thanks.