Created 04-26-2016 03:53 AM
So, I was installing a new cluster for our development QA testing and failed to adjust the default configuration that Ambari gave me for dfs.datanode.data.dir. It ended up putting in every partition it found, when I really only wanted one - /grid/1, a dedicated disk partition for HDFS block storage. I discovered this blunder after the installation completed. I did not want to just redo the whole install, opting instead to fix it manually as a good deep-dive learning experience. I got it all worked out, and it was a good learning exercise, but I have one lingering issue that I cannot solve. I am getting 4 Ambari errors (one for each DN) that state:
Detected data dir(s) that became unmounted and are now writing to the root partition: /grid/1/hadoop/hdfs/data .
I figured out that the Ambari agents monitor (and remember) which mount point each HDFS data directory was previously on, and they raise this error if a mounted disk goes away and the data dir starts landing on the root partition instead. That's all fine - I get that. However, my setup seems to be correct, yet Ambari is still complaining.
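To convince myself I understood the check, here is a rough approximation of what I believe the agent does - this is my own sketch in shell, not Ambari's actual code - comparing the mount point a data dir currently sits on against the one recorded in dfs_data_dir_mount.hist:

# My own sketch of the comparison, not the agent's real logic.
DATA_DIR=/grid/1/hadoop/hdfs/data
HIST=/var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist
current=$(df --output=target "$DATA_DIR" | tail -1)      # mount point the dir is on right now
recorded=$(grep "^${DATA_DIR}," "$HIST" | cut -d, -f2)   # mount point the agent last recorded
if [ "$current" != "$recorded" ]; then
    echo "WARN: $DATA_DIR was last seen on $recorded but is now on $current"
fi

On my nodes both of those should come back as /grid/1, which is exactly why the alert has me stumped.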
As part of correcting the configuration mess I made for myself, I edited the file shown below (on each DN) to remove all of the previously cached mount points that I did not want, leaving just the one I did want. I also ended up stopping HDFS, removing all the /opt/hadoop/hdfs, /tmp/hadoop/hdfs, etc. directories, removing the NameNode metadata directories, reformatting the NameNode, and starting HDFS back up (a rough sketch of those steps follows). The file system is up and working.
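For anyone curious, the cleanup amounted to roughly the following (a sketch from memory - the exact paths depend on what Ambari put into dfs.datanode.data.dir and dfs.namenode.name.dir on your cluster, and reformatting wipes everything in HDFS):

# Rough sketch of the manual cleanup; paths are illustrative - check your own configs first.
# 1. Stop HDFS from Ambari.
# 2. On each DataNode, remove the data dirs I never wanted, keeping only /grid/1:
rm -rf /opt/hadoop/hdfs /tmp/hadoop/hdfs
# 3. On the NameNode, remove the metadata dirs listed in dfs.namenode.name.dir, then reformat
#    (this destroys all existing HDFS data):
sudo -u hdfs hdfs namenode -format
# 4. Start HDFS back up from Ambari.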
But, can anyone tell me why I cannot get rid of this Ambari error?
Here are the contents of one of the dfs_data_dir_mount.hist files (all 4 are exactly the same), followed by the mount output showing the disk I have for HDFS data storage. It all looks good, so I must be missing something obvious. I did restart everything - nothing clears this error.
Thanks in advance...
[root@vmwqsrqadn01 ~]# cat /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist
# This file keeps track of the last known mount-point for each DFS data dir.
# It is safe to delete, since it will get regenerated the next time that the DataNode starts.
# However, it is not advised to delete this file since Ambari may
# re-create a DFS data dir that used to be mounted on a drive but is now mounted on the root.
# Comments begin with a hash (#) symbol
# data_dir,mount_point
/grid/1/hadoop/hdfs/data,/grid/1
[root@vmwqsrqadn01 ~]# mount -l | grep grid
/dev/sdb1 on /grid/1 type ext4 (rw,noatime,nodiratime,seclabel,data=ordered)
Created 04-26-2016 05:22 AM
Well, I would still love to understand how this worked, but a reboot of the 4 DNs made this error go away. Never would have thunk it! That's pretty strange... Anyway, maybe this will help some other poor soul who hits this same condition. 🙂
Created 04-26-2016 05:31 AM
Not sure, but I guess restarting the ambari-agents did it.
Created 04-27-2016 03:26 AM
Actually, I tried restarting them before the reboot - I restarted everything and still had the errors. Then I did the reboot and they cleared. Oh well, it is what it is!
Created 05-26-2017 12:42 PM
I ran into this same issue, but unlike the original poster, restarting the Ambari agents on the data nodes was sufficient to clear the alarm.
Created 05-26-2017 01:04 PM
That said, I actually restarted Ambari as well, so I can't say for certain that the agent restart alone was sufficient; it may well have been the agents plus Ambari that did the trick.