Hi Cloudera Community,
I hate to start a new topic, but I cannot seem to find a meaningful answer to this problem (I am encountering it for the second time now). Also, sorry for the rather long post.
So, in short: I (ops) am upgrading a prelive cluster from CDH 4.6 to CDH 5.7 (hosted on Debian wheezy). We are going with CDH 5.7 because this is a multi-phase project and the QA upgrade was tested with 5.7. Configs and packages are managed by Puppet, so they are relatively clean and consistent.
I was following these steps https://www.cloudera.com/documentation/enterprise/5-7-x/topics/cdh_ig_earlier_cdh5_upgrade.html and came to the HDFS upgrade step:
sudo service hadoop-hdfs-namenode upgrade
Look for a line that confirms the upgrade is complete, such as: /var/lib/hadoop-hdfs/cache/hadoop/dfs/<name> is complete. The NameNode upgrade process can take a while, depending on the number of files.
So, the command runs without error and exits (as I would expect from an init script).
Well, first of all, I never found this "is complete" message, or anything similar, in the log file when I did this upgrade for the first time (on the QA cluster).
Second, the namenode keeps running during (and after) this process (without the web interface, of course), so at some point my datanodes start to connect, and I basically have to stop the namenode before starting it normally.
So, could it be that the Cloudera steps are a bit misleading?
0) Does it hurt that the datanodes are running? Should only the namenode(s) and journalnodes be running?
1) When is it safe to stop the namenode? In other words, when is the operation "done"?
After which of these lines appears in the namenode log?
INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Starting upgrade of local storage directories. old LV = -40; old CTime = 0. new LV = -60; new CTime = XXXX
INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Starting upgrade of storage directory XXXX
INFO org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector: No version file in XXXX
INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Performing upgrade of storage directory XXXX
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? false (staleImage=false, haEnabled=true, isRollingUpgrade=false)
INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at XXXX
I also had these messages on all the journalnodes:
INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Starting upgrade of storage directory XXX
INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Starting upgrade of edits directory: . old LV = -40; old CTime = 0. new LV = -60; new CTime = XXXXXXX
INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Performing upgrade of storage directory XXX
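In case it helps anyone reproduce this, here is a rough sketch of how the upgrade-related lines can be pulled out of a namenode or journalnode log. The function name upgrade_log_lines is made up for illustration, and the log file path is whatever your installation uses; the patterns are taken only from the messages quoted above, so this does not claim to catch every upgrade message Hadoop can emit.

```shell
# Print only the upgrade-related lines from a NameNode or JournalNode log.
# The patterns come from the log messages quoted in this post, nothing more.
# Usage: upgrade_log_lines /path/to/logfile   (the path is site-specific)
upgrade_log_lines() {
    grep -E 'NNUpgradeUtil|Starting upgrade of (local storage|edits) director' "$1"
}
```

Note that grep exits non-zero when nothing matches, so an empty result can also be detected from the return code.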
2) Should I be concerned about this line:
INFO org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector: No version file in FolderX
(this was already asked by a forum member, but there was no answer)
3) I have checked the source of "NNUpgradeUtil.java" for both CDH 5.7 and the latest version, and both indicate that the upgrade should be done once both the "current" and "previous" folders exist in the data folder, since the last step is the rename of the tmp folder to previous. Please note that I haven't finalized the upgrade yet, and I don't want to, because at this stage I want to be able to roll back if needed.
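A minimal sketch of the directory check described in point 3, under my reading of the source rather than anything the Cloudera docs state. The function name storage_dir_state and the state labels are made up for illustration, and "previous.tmp" is my assumption for the name of the temporary folder that gets renamed to "previous".

```shell
# Classify one storage directory by which subfolders are present.
# Assumption: the temporary folder used during the upgrade is "previous.tmp".
# Usage: storage_dir_state /path/to/storage/dir
storage_dir_state() {
    dir=$1
    if [ -d "$dir/previous.tmp" ]; then
        # The tmp folder has not been renamed yet.
        echo "in-progress"
    elif [ -d "$dir/current" ] && [ -d "$dir/previous" ]; then
        # Both folders exist: upgraded, rollback still possible.
        echo "upgraded-not-finalized"
    elif [ -d "$dir/current" ]; then
        # Only "current": either never upgraded or already finalized.
        echo "current-only"
    else
        echo "unknown"
    fi
}
```

It would be run once per configured name directory (and per journalnode edits directory); a result of "upgraded-not-finalized" everywhere is what I would take as "done".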
4) My setup was already HA-enabled (in 4.6), and I came across these instructions on the Apache Foundation website:
Now, is this applicable in my case? And if it is, why isn't this approach mentioned in the Cloudera docs?
For example, their approach offers a way to check the status of the upgrade, and that obviously fails with what I did, since it's not a "rolling upgrade".