Created 04-26-2016 03:53 AM
So, I was installing a new cluster for our development QA testing and failed to adjust the default configuration that Ambari gave me for dfs.datanode.data.dir. It ended up putting in every partition it found, when I really only wanted one - /grid/1, a dedicated disk partition for HDFS block storage. I discovered this blunder after the installation completed. I did not want to just redo the whole install, opting instead to fix it manually as a good deep-dive learning experience. I got it all worked out, and it was a good learning exercise, but I have one lingering issue that I cannot solve. I am getting 4 Ambari errors (one for each DN) that state:
Detected data dir(s) that became unmounted and are now writing to the root partition: /grid/1/hadoop/hdfs/data .
I figured out that the Ambari agents monitor (and remember) which mount point each HDFS data directory was previously on, and they raise this error if a mounted disk goes away and the data dir starts landing on the root partition instead. That's all fine - I get that. However, my setup seems to be correct, yet Ambari is still complaining.
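To convince myself I understood the check, here is a rough approximation of what I believe the agent does - this is my own sketch in shell, not Ambari's actual code - comparing the mount point a data dir currently sits on against the one recorded in dfs_data_dir_mount.hist:

# My own sketch of the comparison, not the agent's real logic.
DATA_DIR=/grid/1/hadoop/hdfs/data
HIST=/var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist
current=$(df --output=target "$DATA_DIR" | tail -1)      # mount point the dir is on right now
recorded=$(grep "^${DATA_DIR}," "$HIST" | cut -d, -f2)   # mount point the agent last recorded
if [ "$current" != "$recorded" ]; then
    echo "WARN: $DATA_DIR was last seen on $recorded but is now on $current"
fi

On my nodes both of those should come back as /grid/1, which is exactly why the alert has me stumped.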
As part of correcting the configuration mess I made for myself, I edited the file shown below (on each DN) to remove all of the previously cached mount points that I did not want, leaving just the one I did want. I also ended up stopping HDFS, removing all the /opt/hadoop/hdfs, /tmp/hadoop/hdfs, etc. directories, removing the NameNode metadata directories, reformatting the NameNode, and starting HDFS back up (a rough sketch of those steps follows). The file system is up and working.
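For anyone curious, the cleanup amounted to roughly the following (a sketch from memory - the exact paths depend on what Ambari put into dfs.datanode.data.dir and dfs.namenode.name.dir on your cluster, and reformatting wipes everything in HDFS):

# Rough sketch of the manual cleanup; paths are illustrative - check your own configs first.
# 1. Stop HDFS from Ambari.
# 2. On each DataNode, remove the data dirs I never wanted, keeping only /grid/1:
rm -rf /opt/hadoop/hdfs /tmp/hadoop/hdfs
# 3. On the NameNode, remove the metadata dirs listed in dfs.namenode.name.dir, then reformat
#    (this destroys all existing HDFS data):
sudo -u hdfs hdfs namenode -format
# 4. Start HDFS back up from Ambari.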
But, can anyone tell me why I cannot get rid of this Ambari error?
Here are the contents of one of the dfs_data_dir_mount.hist files (all 4 are exactly the same), followed by the mount output showing the disk I have for HDFS data storage. It all looks good, so I must be missing something obvious. I did restart everything - nothing clears this error.
Thanks in advance...
[root@vmwqsrqadn01 ~]# cat /var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist
# This file keeps track of the last known mount-point for each DFS data dir.
# It is safe to delete, since it will get regenerated the next time that the DataNode starts.
# However, it is not advised to delete this file since Ambari may
# re-create a DFS data dir that used to be mounted on a drive but is now mounted on the root.
# Comments begin with a hash (#) symbol
# data_dir,mount_point
/grid/1/hadoop/hdfs/data,/grid/1
[root@vmwqsrqadn01 ~]# mount -l | grep grid
/dev/sdb1 on /grid/1 type ext4 (rw,noatime,nodiratime,seclabel,data=ordered)
Created 04-26-2016 05:22 AM
Well, I would still love to understand how this worked, but a reboot of the 4 DNs made this error go away. Never would have thunk it! That's pretty strange... Anyway, maybe this will help some other poor soul who hits this same condition. 🙂
Created 04-26-2016 05:31 AM
Not sure, but I guess restarting the ambari-agents did it.
Created 04-27-2016 03:26 AM
Actually, I tried restarting them before the reboot - I restarted everything and still had the errors. Then I did the reboot and they cleared. Oh well, it is what it is!
Created 05-26-2017 12:42 PM
I ran into this same issue, but unlike the original poster, restarting the Ambari agents on the data nodes was sufficient to clear the alarm.
Created 05-26-2017 01:04 PM
That said, I actually restarted Ambari as well, so I can't say for certain that the agent restart alone was sufficient; it may well have been the agents plus Ambari that did the trick.