The following process assumes that you are installing your HDP cluster with a configuration management tool such as Ansible, Puppet, or Chef, and that you are deploying the cluster using an Ambari blueprint.
If you are looking into automating your deployment, you might be interested in the great work from @Alex Bush with Ansible. This process covers the following scenarios:
- Migrating from HDP 2.2 to 2.4 using a full reinstallation
- Upgrading OS from RHEL 6 to RHEL 7
- OS boot from network and reinstallation of HDP
The following process has been tested to migrate from HDP 2.3 to HDP 2.4 on a Kerberised cluster and to reinstall an HDP 2.3 cluster. It should also work for HDP 2.5, as the HDFS layout version is consistent across those releases.
step 1:
- Make a backup of your metastore databases (Ranger, Hive, and Oozie).
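A minimal backup sketch, assuming MySQL-backed metastores; the database names and backup paths below are hypothetical, so adjust them to your environment.
# Dump each metastore database to a location that survives the reinstallation
mysqldump -u root -p hive > /backup/hive_metastore.sql
mysqldump -u root -p oozie > /backup/oozie.sql
mysqldump -u root -p ranger > /backup/ranger.sql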
step 2:
- Check that the NameNode has a folder called namenode-formatted under dfs.namenode.name.dir. If you are using NameNode HA, check both NameNodes (one will probably be missing it).
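A quick way to check, assuming dfs.namenode.name.dir is /hadoop/hdfs/namenode (a hypothetical value; use your own setting). This marker folder is what tells the automated install that HDFS is already formatted, so one approach is to create it by hand on the NameNode that is missing it.
# Run on each NameNode
ls -d /hadoop/hdfs/namenode/namenode-formatted
# If it is missing on one NameNode, create it there
mkdir /hadoop/hdfs/namenode/namenode-formatted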
step 3:
- Launch the reinstallation of your OS, making sure that the disks/folders used by HDP to store data are not reformatted. If you are deploying your OS using Kickstart, add the --noformat option to the part directives for the disks concerned.
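For example, a hypothetical Kickstart excerpt; the device names, mount points, and sizes are assumptions for illustration.
# Reformat only the OS disk
clearpart --drives=sda --all --initlabel
part / --fstype=xfs --ondisk=sda --size=51200
# Keep the HDP data partitions intact: --noformat reuses the existing filesystem
part /grid/0 --fstype=xfs --noformat --onpart=sdb1
part /grid/1 --fstype=xfs --noformat --onpart=sdc1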
step 4:
- Grab a coffee whilst the OS installation is taking place.
step 5:
- Following a successful OS installation, you can now launch your automated deployment of HDP. It doesn't matter if you also upgrade Ambari at the same time.
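If you are rolling your own automation rather than using the playbooks mentioned above, the blueprint deployment boils down to two calls against the Ambari REST API; the host, cluster name, credentials, and file names here are hypothetical.
# Register the blueprint
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST -d @blueprint.json http://ambari.example.com:8080/api/v1/blueprints/hdp-cluster
# Submit the cluster creation template that references it
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST -d @cluster-template.json http://ambari.example.com:8080/api/v1/clusters/mycluster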
step 6:
- Grab a coffee whilst the installation is taking place.
step 7:
- If your DB server has also been reinstalled as part of the process, you will need to stop the services (Hive, Ranger, and Oozie) and restore the databases.
(NB: Upon restart, the schema will automatically be upgraded if required.)
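A minimal restore sketch, assuming MySQL and the dumps taken in step 1 (the same hypothetical names and paths). Stopping a service through the Ambari API means setting its state to INSTALLED.
# Stop Hive via the Ambari API (repeat for RANGER and OOZIE)
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Stop HIVE"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://ambari.example.com:8080/api/v1/clusters/mycluster/services/HIVE
# Restore each metastore dump
mysql -u root -p hive < /backup/hive_metastore.sql
mysql -u root -p oozie < /backup/oozie.sql
mysql -u root -p ranger < /backup/ranger.sql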
On HDP 2.3
step 8:
- You should have all your services already available. If not, start them manually.
- Restart the Hive, Ranger, and Oozie services (see the sketch below).
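The restart can also be driven through the Ambari API, using the same hypothetical host and cluster names as above.
# Start a stopped service; repeat for RANGER and OOZIE
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Start HIVE"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' http://ambari.example.com:8080/api/v1/clusters/mycluster/services/HIVE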
step 9:
- Congratulate yourself on a smooth upgrade.
On HDP 2.2
step 8:
- Start HDFS manually
# Log in as the hdfs user
su - hdfs
# Start the JournalNode (run this on each JournalNode host)
hdfs journalnode
# Start the first NameNode in upgrade mode from the command line
hdfs namenode -upgrade
# On the other NameNode, resynchronise it as the standby
hdfs namenode -bootstrapStandby
step 9:
- Start all services from Ambari except for the NameNodes, which are already running from the command line. (Everything should start.)
step 10:
- Check that all your data is there and that you can access it (run a couple of known Hive, HBase, ... queries), as sketched below.
If everything is correct, move to step 11. You won't be able to roll back, so make sure everything is working as you expect.
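A few hedged smoke tests; the database and table names are hypothetical, so substitute objects you know exist.
# On a kerberised cluster, authenticate first with a valid keytab
kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa
# HDFS: list a directory you know
hdfs dfs -ls /apps/hive/warehouse
# Hive: count rows in a known table
hive -e 'SELECT COUNT(*) FROM mydb.mytable;'
# HBase: count rows in a known table
echo "count 'mytable'" | hbase shell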
step 11:
- Finalize upgrade
# Log in as the hdfs user
su - hdfs
# Run the finalize command
hdfs dfsadmin -finalizeUpgrade
# Expected output:
Finalize upgrade successful
Is this approach considered an in-place upgrade of the OS? We need to upgrade from RHEL 6 to RHEL 7, and our systems team doesn't use any configuration management tools to do an in-place upgrade. It sounds like the systems/hosts in the cluster will be wiped to do the OS upgrade.
Do you have any information on how to do this while preserving the Hadoop data disks? We also need to upgrade from HDP 2.5.3 to 2.6.0, and our cluster is Kerberized. What's the best approach for us to take in upgrading the OS and HDP simultaneously?