Created 02-03-2016 10:13 PM
We are constantly running out of space on the Hadoop nodes.
Is it recommended to write the Hadoop service logs to HDFS mounted as NFS on the data nodes?
Or is it better to mount a NAS drive on the nodes for storing log files?
Are there any challenges?
Created 02-03-2016 10:16 PM
You can zip those logs once in a while, @S Roy.
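For example, even a small scheduled script is enough. Here is a minimal sketch, with the log directory, age threshold and file-name patterns as assumptions you would adjust for your own layout; it gzips rotated logs older than a week and leaves the live .log files alone:

```python
#!/usr/bin/env python
"""Minimal sketch: compress rotated Hadoop logs older than a few days
to reclaim local disk space. Paths and thresholds are assumptions."""
import gzip
import os
import shutil
import time

LOG_DIR = "/var/log/hadoop"   # adjust to where your services actually log
MAX_AGE_DAYS = 7

cutoff = time.time() - MAX_AGE_DAYS * 86400
for root, _dirs, files in os.walk(LOG_DIR):
    for name in files:
        # Skip files that are already compressed and the live *.log files;
        # only rotated logs (e.g. *.log.1, *.log.2016-02-03) are candidates.
        if name.endswith(".gz") or name.endswith(".log"):
            continue
        path = os.path.join(root, name)
        if os.path.getmtime(path) < cutoff:
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)   # keep only the compressed copy
```

If you prefer not to maintain a script, logrotate with its compress option achieves the same result.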
Created 02-03-2016 10:53 PM
I always suggest having a dedicated log disk or mount of around 200 GB on each node:
/usr/hdp for binaries: around 50 GB
/var/log: 150 to 200 GB
Created 02-03-2016 10:58 PM
What about mounted NAS storage?
Created 02-03-2016 11:07 PM
I try to stay away from NAS because, when it comes to performance, it can become a bottleneck or mislead you when troubleshooting.
Lab or demo, sure.
But not for prod.
Created 02-03-2016 11:15 PM
Hi @S Roy, using HDFS mounted as NFS would be a bad idea: an HDFS service writing its own logs to HDFS could deadlock on itself.
As @Neeraj Sabharwal suggested, a local disk is best so that the logging store does not become a performance bottleneck. You can change the log4j settings to limit the size and number of log files, thus capping the total space they use. You can also write a separate daemon that periodically copies log files to HDFS for long-term archival.
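For the log4j side, the relevant knobs are typically the RollingFileAppender's MaxFileSize and MaxBackupIndex settings in the service's log4j.properties. For the archival daemon, a minimal sketch could look like the following; this is an illustration rather than a tested implementation, and the glob pattern, HDFS target directory and interval are assumptions. It only relies on the standard `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` shell commands, and a cron job would work just as well as a long-running loop:

```python
#!/usr/bin/env python
"""Sketch of a log-archival daemon: periodically pushes rotated Hadoop
log files into HDFS for long-term storage. Paths, interval and the HDFS
target directory are assumptions; adjust them for your cluster."""
import glob
import os
import socket
import subprocess
import time

LOCAL_LOG_GLOB = "/var/log/hadoop/*/*.log.*"          # rotated logs only, not the live *.log
HDFS_ARCHIVE_DIR = "/archive/logs/" + socket.gethostname()
INTERVAL_SECONDS = 6 * 3600                           # run a pass every 6 hours

def archive_once():
    # Make sure the per-host archive directory exists in HDFS.
    subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", HDFS_ARCHIVE_DIR])
    for path in glob.glob(LOCAL_LOG_GLOB):
        # -f overwrites a partial copy left behind by a previous failed run.
        subprocess.check_call(["hdfs", "dfs", "-put", "-f", path, HDFS_ARCHIVE_DIR])
        os.remove(path)   # reclaim local disk only after the copy succeeded

if __name__ == "__main__":
    while True:
        try:
            archive_once()
        except subprocess.CalledProcessError as err:
            print("archive pass failed, will retry next cycle: %s" % err)
        time.sleep(INTERVAL_SECONDS)
```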