Every night at twelve o'clock the Flume HDFS Sink throws the error "Error while syncing":
It does not recover until the Flume service is restarted.
Does anyone recognize this error?
Additionally, you might look at cron on the Flume node or the datanodes to see if a process kicks off at midnight that eats up all the system resources and causes timeouts (possibly backups or log file rotations).
Thanks for the reply.
Actually, there are no cron jobs scheduled at midnight. I don't think this is the cause.
We have changed the following HDFS configuration:
And the error has changed:
This error repeats over and over again until the Flume service is restarted.
Are you rolling files by day? Perhaps the act of closing and opening lots of file handles is causing an issue in your environment. Maybe you could change that behavior in Flume if that's the issue for you...
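If daily rolling does turn out to be the trigger, something like the following might be a starting point. It is only a sketch: the agent name `agent`, the sink name `hdfsSink`, the path, and every value are assumptions rather than anything taken from your setup. The idea is to close files gradually during the day instead of all at once at midnight, and to give HDFS calls more time to finish.

```properties
# Hypothetical agent and sink names; adjust them to match your flume.conf.
agent.sinks.hdfsSink.type = hdfs

# Daily bucketing: a new directory (and new files) starts at midnight.
agent.sinks.hdfsSink.hdfs.path = /flume/events/%Y-%m-%d
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true

# Roll on a timer only (every hour here), so files are closed through the day.
agent.sinks.hdfsSink.hdfs.rollInterval = 3600
agent.sinks.hdfsSink.hdfs.rollSize = 0
agent.sinks.hdfsSink.hdfs.rollCount = 0

# Close files that have received no events for 60 seconds.
agent.sinks.hdfsSink.hdfs.idleTimeout = 60

# Time allowed for HDFS open/write/flush/close calls, in milliseconds.
agent.sinks.hdfsSink.hdfs.callTimeout = 30000
```

Whether any of these values suit your environment depends on your event volume, so treat them as things to experiment with rather than recommendations.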
In the Flume log, after the initial error I quoted in my first message, we get the following error again and again, every few seconds, until the Flume service is restarted:
I don't think it is a ulimit problem. We considered this cause first and set a really high value for the flume user:
Looking at the datanode logs, I saw the following entries:
Many thanks.