I am trying to find instructions to upgrade from CDH3u6 to CDH4, bypassing Cloudera Manager completely. For that I need to create a local yum repo on Red Hat. The instructions at http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Instal... require installing CDH4 and using parcels, which I would like to avoid. Instead, I am looking for a Cloudera CDH4 repo file that I could host inside my firewall and then install all the RPMs from that repo. But I am unable to find the repo file. Do I need to download the TAR file instead and see if it comes with a repo file inside? Has anyone gone through this exercise before? I was hoping this would be straightforward, but it looks like it isn't. Please help.
The repos are at http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/. You can see the various CDH4 minor versions listed there, and clicking in you will find the repodata, etc. You shouldn't have a problem reposync'ing that to a local yum server.
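A local mirror of that repo can be sketched roughly as follows. This assumes the `yum-utils` and `createrepo` packages are installed; the repo id `cloudera-cdh4`, the mirror paths, and the mirror hostname are all illustrative, not anything Cloudera prescribes.

```shell
# On a machine with internet access, define the upstream repo in
# /etc/yum.repos.d/cloudera-cdh4.repo (repo id is illustrative):
#
#   [cloudera-cdh4]
#   name=Cloudera CDH4
#   baseurl=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/
#   gpgkey=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
#   gpgcheck=1

# Pull down every RPM from that repo into a local web-served directory:
reposync --repoid=cloudera-cdh4 --download_path=/var/www/html/repos

# Generate yum metadata so clients can use the mirror as a repo:
createrepo /var/www/html/repos/cloudera-cdh4

# On cluster nodes behind the firewall, drop a .repo file whose baseurl
# points at the mirror instead of archive.cloudera.com, e.g.:
#   baseurl=http://yum-mirror.example.com/repos/cloudera-cdh4/
```

The nodes behind the firewall then install CDH4 RPMs with plain `yum install` against the mirror.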
I've done this migration myself, but in a very different way. We didn't reposync; instead we spun up a brand-new cluster with the new software and then synchronized the data over using a modified distcp.
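The distcp approach can be sketched as below. Because CDH3 and CDH4 RPC are incompatible, the copy is typically run from the destination (CDH4) side, reading the old cluster over hftp, which is read-only and version-independent. The hostnames, ports, and paths here are illustrative assumptions, not from the thread.

```shell
# Sketch: run from the CDH4 cluster. Read the old CDH3 cluster via hftp
# (namenode web port, default 50070) and write into the new cluster's HDFS.
# Hostnames, ports, and the /user/data path are illustrative.
hadoop distcp \
  hftp://cdh3-namenode.example.com:50070/user/data \
  hdfs://cdh4-namenode.example.com:8020/user/data
```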
Have you seen this documentation? http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cd... Unless I'm mistaken, it should step you through an upgrade without Cloudera Manager (and without parcels).
Thanks, Bryan, for the prompt reply. That should work for me, I think. I like the idea of building a brand-new cluster and then using distcp, but unfortunately we don't have a spare cluster to leverage, so I will have to go with the other option. Did you run into any major challenges during the upgrade? Did your MapReduce code compile cleanly under CDH4?
The migration went pretty smoothly for us. However, keep in mind that CDH4 clients can't talk to CDH3 servers, and vice versa, except through hftp. So you will need to coordinate the upgrade of your library code with the upgrade of the cluster. For us this involved quite a bit of work, since we can't afford much downtime for any part of our product. I hope for your sake that's not the case for you :)
We have a ton of custom infrastructure around build, deploy, etc so it's hard to give generic advice around that. If you have any specific questions I can try to answer as they come up though.
I am at the point of upgrading the namenode with "sudo service hadoop-hdfs-namenode upgrade", but it is complaining about a conflict with the old (CDH3) fsimage. If I delete the old fsimage, I lose all my data. How can I upgrade safely? Please help. This is one step where I can't afford to be adventurous :-)
I did not have to do this since I did a direct copy of data to a new cluster. A couple notes though:
1) You should only need the latest fsimage file. If an older one is causing issues, you could delete it as long as you still have another. Don't delete the edits files though.
2) You should make a copy of your fsimage and edits files, in fact the entire dfs.name.dir directory. This is recommended in general for falling back, but would allow you to do some playing safely.
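A copy of the name directory can be sketched as follows. The `/srv/namenode` path comes from the error later in this thread; on another cluster, check the `dfs.name.dir` property in `hdfs-site.xml` first. The backup destination is illustrative.

```shell
# Sketch: back up the entire dfs.name.dir before attempting the upgrade.
# Stop the namenode first so the fsimage/edits files are quiescent.
sudo service hadoop-hdfs-namenode stop

# Archive the whole name directory (path is illustrative; confirm it
# against dfs.name.dir in hdfs-site.xml):
sudo tar czf /backup/namenode-dfs-name-dir.tar.gz /srv/namenode
```

Restoring is then just stopping the namenode again and untarring the archive back into place, which gives you a clean fallback path if the upgrade goes wrong.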
I still may not be able to help much, but what was the full stacktrace, and some of the surrounding log lines for the failure?
The error says: InconsistentFSStateException: /srv/namenode is in an inconsistent state. Previous fs state should not exist during upgrade. Finalize or rollback first.
Now, my question is whether I will later be able to reconcile my fsimage from the backup I take.
Thanks, Bryan. I was able to bring up CDH4. The trick, as you said, was to remove the "previous" subdirectory.
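For reference, the "previous fs state" in that error is the rollback checkpoint HDFS keeps from an earlier, never-finalized upgrade. Once the upgraded cluster has been verified, the standard way to discard it going forward is to finalize the upgrade rather than delete directories by hand:

```shell
# After verifying the upgraded cluster, finalize so HDFS discards the
# pre-upgrade "previous" checkpoint. Note: this makes rollback impossible,
# so only run it once you are satisfied the upgrade succeeded.
sudo -u hdfs hdfs dfsadmin -finalizeUpgrade
```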