Created 06-28-2016 08:59 AM
Hello,
I am seeing an issue with fsimage files not being cleaned away from one of the "dfs.namenode.name.dir" directories. The setting of "dfs.namenode.name.dir" in our cluster is "/tmp/hadoop/hdfs/namenode,/var/hadoop/hdfs/namenode,/mnt/data/hadoop/hdfs/namenode". This fills up the /tmp partition on the host hosting the namenode.
Listing the contents of these folders shows that the /tmp folder contains far more fsimage files than the other two:
[me@node ~]$ ls -la /tmp/hadoop/hdfs/namenode/current | grep fsimage | wc -l
94
[me@node ~]$ ls -la /var/hadoop/hdfs/namenode/current | grep fsimage | wc -l
9
[me@node ~]$ ls -la /mnt/data/hadoop/hdfs/namenode/current | grep fsimage | wc -l
9
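The same count can be taken for every configured directory in one go. This is only a minimal sketch: count_fsimages is a hypothetical helper name, and it assumes the list is given as plain comma-separated paths, as in the setting quoted above. Like the grep in the commands above, it also counts the .md5 files that sit next to each fsimage.

```shell
# Hypothetical helper: print "<count> <dir>" for each directory in a
# comma-separated list such as the value of dfs.namenode.name.dir.
count_fsimages() {
  local dirs_csv="$1" dir n
  local IFS=','
  for dir in $dirs_csv; do
    # Count entries directly under <dir>/current whose name starts with
    # "fsimage". A missing directory is reported as 0.
    n=$(find "$dir/current" -maxdepth 1 -name 'fsimage*' 2>/dev/null | wc -l)
    # $((n)) strips the padding some wc implementations add.
    printf '%s %s\n' "$((n))" "$dir"
  done
}
```

On a NameNode host it could be called as count_fsimages "$(hdfs getconf -confKey dfs.namenode.name.dir)", assuming the configured value does not use file:// URIs.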
Looking at the namenode logs confirms that the purging seems to only happen for /var and /mnt:
[me@node ~]$ grep NNStorageRetentionManager /var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log* | grep fsimage
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.7:2016-06-27 19:50:25,462 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/var/hadoop/hdfs/namenode/current/fsimage_0000000002281385227, cpktTxId=0000000002281385227)
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.7:2016-06-27 19:50:25,640 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/mnt/data/hadoop/hdfs/namenode/current/fsimage_0000000002281385227, cpktTxId=0000000002281385227)
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.8:2016-06-27 18:38:58,921 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/var/hadoop/hdfs/namenode/current/fsimage_0000000002280372072, cpktTxId=0000000002280372072)
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.8:2016-06-27 18:38:59,102 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/mnt/data/hadoop/hdfs/namenode/current/fsimage_0000000002280372072, cpktTxId=0000000002280372072)
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.9:2016-06-27 17:34:31,800 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/var/hadoop/hdfs/namenode/current/fsimage_0000000002279353884, cpktTxId=0000000002279353884)
/var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log.9:2016-06-27 17:34:31,992 INFO namenode.NNStorageRetentionManager (NNStorageRetentionManager.java:purgeImage(225)) - Purging old image FSImageFile(file=/mnt/data/hadoop/hdfs/namenode/current/fsimage_0000000002279353884, cpktTxId=0000000002279353884)
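To make the per-directory difference easier to see, the purge messages can be tallied by storage directory. The following is a rough sketch, assuming the log lines have the "Purging old image FSImageFile(file=...)" shape shown above; purges_per_dir is a hypothetical helper name, and it reads log lines on standard input.

```shell
# Hypothetical helper: read NameNode log lines on stdin and print
# "<number of purges> <storage directory>", one line per directory.
purges_per_dir() {
  grep 'Purging old image' \
    | sed -n 's/.*FSImageFile(file=\(.*\)\/current\/fsimage_[0-9]*,.*/\1/p' \
    | sort \
    | uniq -c \
    | awk '{print $1, $2}'
}
```

For example: cat /var/log/hadoop/hdfs/hadoop-hdfs-namenode-node.log* | purges_per_dir. A configured directory that never shows up in the output has had no image purges logged for it.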
Can anyone explain why only two directories are purged?
I should mention that we are running namenode HA.
Best Regards
/Thomas
Created 07-03-2016 06:52 AM
Storing the fsimage in /tmp makes no sense; I would remove that directory from your hdfs-site. You configure multiple directories for redundancy, whereas anything in /tmp can disappear as soon as the machine reboots. The /tmp directory does not behave the same way as other directories, so the fact that it is not purged the same way as the others is irrelevant.
Created 07-04-2016 06:43 AM
Hi Artem.
I agree that /tmp is just plain wrong for this. I think Ambari chose these directories for us during cluster installation and we haven't noticed. We will remove /tmp from this configuration.
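For anyone wanting to check their own cluster for the same thing, here is a small sketch that lists any configured name directories living under /tmp. tmp_name_dirs is a hypothetical helper name; it assumes the value is a comma-separated list in which entries may optionally carry a file:// prefix.

```shell
# Hypothetical helper: print every entry of a comma-separated
# dfs.namenode.name.dir value that points at /tmp or below it.
tmp_name_dirs() {
  local dirs_csv="$1" dir
  local IFS=','
  for dir in $dirs_csv; do
    # Entries may be written as file:// URIs; strip that prefix first.
    dir="${dir#file://}"
    case "$dir" in
      /tmp|/tmp/*) printf '%s\n' "$dir" ;;
    esac
  done
}
```

It could be run as tmp_name_dirs "$(hdfs getconf -confKey dfs.namenode.name.dir)"; empty output means none of the entries are under /tmp.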