At one point an incident caused the cluster to go down and behave erratically. When the cluster was brought back up, everything came up fine except the namenode. In the logs I was able to find this message:
INFO ipc.Server: IPC Server handler 0 on 8022, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.versionRequest from 10.0.0.184:54777 Call#955124 Retry#0: error: org.apache.hadoop.security.AccessControlException: Access denied for user hdfs. Superuser privilege is required
We also noticed that the files in the directory that stores the namenode data are owned by root rather than by user hdfs.
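A quick way to confirm the ownership problem is to list the metadata directory directly. This is only a sketch: the path below is a placeholder, so substitute whatever your dfs.namenode.name.dir property in hdfs-site.xml actually points at.

```shell
# Placeholder path -- replace with the value of dfs.namenode.name.dir
NN_DIR=/path/to/namenode/dir

# Show owner/group of every metadata file (fsimage, edits, VERSION, ...).
# On a healthy cluster these should belong to user hdfs, not root.
ls -lR "$NN_DIR"
```

If the listing shows root-owned fsimage or edits files, that matches the AccessControlException above: the namenode process, running as hdfs, cannot use metadata it doesn't own.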
If I try to restart the namenode from within Cloudera Manager, I get the same result. If I then execute "hadoop namenode" as root on the namenode host, I again get the same result. Only when I su to user hdfs and execute "hadoop namenode" does the namenode come up successfully.
Now, I have been able to restart the namenode from Cloudera Manager without any problems on other Hadoop clusters. So my thought is that some configuration must be telling Cloudera Manager to execute scripts as root rather than as hdfs.
Anyone have any insight into what may be going on here? Maybe it's not even a Cloudera Manager issue? I'm not quite sure.
As it turns out, we had performed a namenode format as root when the crash occurred. Everyone seems to say that the proper way to perform the format is to sudo su - hdfs and then run it. Could that be the root cause of the problems we are experiencing now?
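For anyone hitting the same thing, here is a hedged sketch of the two usual ways out. The directory path is a placeholder (check dfs.namenode.name.dir in hdfs-site.xml), and the group may be hdfs or hadoop depending on the install. Note that formatting destroys HDFS metadata, so only take that route on a cluster you intend to re-initialize.

```shell
# Option 1: keep the existing metadata and just fix ownership.
# Placeholder path -- replace with your dfs.namenode.name.dir value.
sudo chown -R hdfs:hdfs /path/to/namenode/dir

# Option 2: re-run the format as the hdfs user, so the metadata files
# are created with the correct owner from the start.
# WARNING: this wipes all HDFS metadata.
sudo -u hdfs hdfs namenode -format
```

Running the format as hdfs (rather than formatting as root and fixing up afterwards) is the approach people generally recommend, since it avoids leaving any root-owned stragglers behind.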