Explorer
Posts: 16
Registered: ‎07-18-2016

cdh4.6 to cdh5.7 - hdfs upgrade result

Hi Cloudera Community,

 

I hate to start a new topic, but I cannot seem to find a meaningful answer to this problem (I am encountering it for the second time now). Also, sorry for the rather long post.

 

So, in short, I (ops) am upgrading a prelive cluster from cdh4.6 to cdh5.7 (hosted on wheezy). The reason we are going with cdh5.7 is that this is a multi-phase project, and the QA upgrade was tested with 5.7. Configs and packages are managed by Puppet, so they are relatively clean and consistent.

I was following the steps at https://www.cloudera.com/documentation/enterprise/5-7-x/topics/cdh_ig_earlier_cdh5_upgrade.html and reached the HDFS upgrade step:

sudo service hadoop-hdfs-namenode upgrade

Instructions say:

Look for a line that confirms the upgrade is complete, such as: /var/lib/hadoop-hdfs/cache/hadoop/dfs/<name> is complete.

The NameNode upgrade process can take a while, depending on the number of files.

 

So, the command runs without error and exits (as I expect from an init script).

Well, first of all, I never found this "is complete" message, or any similar message, in the log file when I did this upgrade the first time (on the QA cluster).
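For reference, this is how I searched the whole log for that message (the log path and filename pattern below are the defaults for a package install and are an assumption; adjust for your hosts):

```shell
# Search the NameNode log for the upgrade-completion message.
# The path is the default for a CDH package install -- an assumption,
# adjust directory and filename for your setup.
grep -iE 'upgrade of .* is complete' \
  /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log
```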

Second, the namenode keeps running during (and after) this process (without the web interface, of course), so at some point my datanodes start to connect, and I basically have to stop the namenode before starting it regularly.

So, it might be that the cloudera steps are a bit misleading?

 

My questions:

0) Does it hurt that the datanodes are running? Should only the namenode(s) and journalnodes be up?

1) When is it safe to stop the namenode? In other words, when is the operation "done"?

Is it after I see some of these messages in the namenode log:

 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Starting upgrade of local storage directories.
   old LV = -40; old CTime = 0.
   new LV = -60; new CTime = XXXX
 INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Starting upgrade of storage directory  XXXX
 INFO org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector: No version file in  XXXX
 INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Performing upgrade of storage directory  XXXX
 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? false (staleImage=false, haEnabled=true, isRollingUpgrade=false)
 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at XXXX

I also had these messages on all the journalnodes:

INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Starting upgrade of storage directory XXX
INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Starting upgrade of edits directory: .
   old LV = -40; old CTime = 0.
   new LV = -60; new CTime = XXXXXXX
INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Performing upgrade of storage directory XXX

2) Should I be concerned about this line:

INFO org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector: No version file in FolderX

(this was already asked by a forum member, but there was no answer)

 

3) I have checked the source of "NNUpgradeUtil.java" for both cdh5.7 and the latest release, and both indicate that the upgrade should be done once both the "current" and "previous" folders exist in the data folder, since the last step is the rename of the tmp directory to "previous". Please note that I haven't finalized the upgrade yet, and I don't want to, because at this stage I want to be able to roll back if needed.
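As a sanity check, the storage-directory layout can be inspected directly; a sketch, where /data/dfs/nn is a placeholder for your actual dfs.namenode.name.dir value:

```shell
# Inspect the NameNode storage directory after the upgrade.
# /data/dfs/nn is a placeholder -- substitute your dfs.namenode.name.dir.
ls -l /data/dfs/nn
# An upgraded-but-not-finalized NameNode should show both:
#   current/   (the new layout)
#   previous/  (the preserved pre-upgrade state, kept for rollback)

# Only once everything is verified would the upgrade be finalized,
# which removes previous/ and makes rollback impossible:
# hdfs dfsadmin -finalizeUpgrade
```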

 

4) My setup was HA-enabled before (in 4.6), and I came across these instructions on the Apache Hadoop website:

https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html

Now, is this applicable in my case? And if it is, why isn't this approach mentioned in the Cloudera docs?

For example, their procedure offers a way to check the status of the upgrade, which obviously fails in my case, since what I did is not a "rolling upgrade".
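Concretely, the status check from that Apache page is the rolling-upgrade query, which only reports something useful if the upgrade was started as a rolling one:

```shell
# Query rolling-upgrade status. This is only meaningful if the upgrade
# was started with "hdfs dfsadmin -rollingUpgrade prepare" -- a plain
# "service hadoop-hdfs-namenode upgrade" is not a rolling upgrade.
hdfs dfsadmin -rollingUpgrade query
```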

 

Regards,

Milan


Re: cdh4.6 to cdh5.7 - hdfs upgrade result

update: after some time, I opted to proceed with the standby namenode, and that went fast, without problems. Then I checked the web interface on the standby namenode, and it was fine, and then I discovered that the primary namenode had started its web interface as well, at some point between my post and this reply. Unfortunately, nothing in the log indicates the exact time.
I'll just assume that it is safe to proceed with the standby namenode after waiting 3-5 minutes :-/
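Rather than waiting a fixed 3-5 minutes, one could poll the namenode until its web interface answers; a minimal sketch, assuming the default CDH5 HTTP port 50070 and that the namenode runs on the local host (both assumptions, adjust for your cluster):

```shell
# Poll the NameNode web UI until it responds.
# localhost and port 50070 are assumptions -- adjust for your cluster.
until curl -sf "http://localhost:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" >/dev/null; do
  sleep 5
done
echo "NameNode web interface is up"
```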

Re: cdh4.6 to cdh5.7 - hdfs upgrade result

I guess I spoke too soon - on the first attempt of "hadoop fs -ls", both namenodes ended up in the "standby" state...
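The HA state of each namenode can be checked directly; here nn1 and nn2 are placeholders for whatever logical names are configured in dfs.ha.namenodes.&lt;nameservice&gt;:

```shell
# Query the HA state of each NameNode. "nn1" and "nn2" are placeholders
# for the logical names from dfs.ha.namenodes.<nameservice>.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```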

Re: cdh4.6 to cdh5.7 - hdfs upgrade result

update to the current state: the standby-standby situation was caused by corrupt state in ZooKeeper, and was fixed by re-initializing the ZooKeeper state. The other questions still remain.
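For anyone hitting the same standby-standby problem: re-initializing the failover-controller state in ZooKeeper is done like this (run on one namenode host while the ZKFCs are stopped; this wipes the automatic-failover state, so use with care):

```shell
# Re-create the HA failover znode (/hadoop-ha/<nameservice>) in ZooKeeper.
# Run as the hdfs user on one NameNode host with both ZKFCs stopped.
sudo -u hdfs hdfs zkfc -formatZK
```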