
Hadoop nodes from SUSE to RHEL

Expert Contributor

What is the best approach to restore an HDP cluster if a customer would like to migrate an existing HDP cluster from SUSE to RHEL?

Is it the same as re-installing the OS/HDP cluster and then restoring it from backed-up data/config? Please advise.

1 ACCEPTED SOLUTION

Master Guru

I am actually pretty sure that most backups will still work. Sure, all the RPMs etc. will be different, but let's go through it one by one:

a) HDFS data should really not depend on the OS, unless the data were somehow switched from big- to little-endian or something.
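(If you want to convince yourself of that after the move, one rough way is to compare file checksums between the two clusters. This is only a sketch; the hostnames and paths are placeholders, and it assumes the same block size and checksum settings on both sides:

hdfs dfs -checksum hdfs://old-namenode:8020/apps/data/part-00000
hdfs dfs -checksum hdfs://new-namenode:8020/apps/data/part-00000

The two outputs should be identical for an unchanged file.)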

b) Databases (Ambari, Hive, Oozie, ...)

These should also not depend on the OS. It obviously depends on the database, but if you do an export/import you should be fine. Simply copying the database files over might be a different matter.

You would, however, need to change the hostnames inside the backups; for Hive that is a single location, for the others it could be more complicated, unless you migrate the hostnames 1:1.
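For example, here is a rough sketch of the export/import, assuming a MySQL-backed Hive Metastore and Ambari database (the database names, users, and hostnames are placeholders; adjust for your actual setup):

# on the old cluster: dump the databases
mysqldump -u ambari -p ambari > ambari.sql
mysqldump -u hive -p hive > hive_metastore.sql
# on the new cluster: load them back
mysql -u ambari -p ambari < ambari.sql
mysql -u hive -p hive < hive_metastore.sql
# then repoint the Hive Metastore at the new NameNode URI (the single location mentioned above)
hive --service metatool -listFSRoot
hive --service metatool -updateLocation hdfs://new-namenode:8020 hdfs://old-namenode:8020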

c) Configs? I think the easiest way here would be Blueprints (i.e. export one and set up the new cluster with it), OR install clean and apply the settings carefully, which might be safer if any modifications are needed.
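A hedged example of the Blueprint route, assuming default Ambari credentials and port (the cluster name, hostnames, and the host-mapping file are placeholders you would fill in yourself):

# export the existing cluster as a blueprint
curl -u admin:admin -H "X-Requested-By: ambari" \
  "http://old-ambari-host:8080/api/v1/clusters/MYCLUSTER?format=blueprint" > cluster_blueprint.json
# register the blueprint on the new Ambari server
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
  -d @cluster_blueprint.json http://new-ambari-host:8080/api/v1/blueprints/migrated_bp
# create the new cluster from the blueprint plus a host-mapping (cluster creation template) file
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
  -d @hostmapping.json http://new-ambari-host:8080/api/v1/clusters/MYCLUSTER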

d) The timeline store, Spark history, etc. most likely do not need to be kept.

But yes, it might be safer to set up the new cluster and distcp the data over instead of copying the NameNode/DataNode folders. I really don't think the OS should affect them, though. (Fair warning: I have never actually done this.)
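If you go the distcp route, a minimal sketch (hostnames and paths are placeholders; distcp is typically run from the destination cluster):

hadoop distcp -pugp -update hdfs://old-namenode:8020/apps/data hdfs://new-namenode:8020/apps/data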

My tip would be to try it on a sandbox first: install a single-node SUSE cluster, create a table, an Oozie job, and a couple of files, and then migrate everything.
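A throwaway set of test artifacts for that dry run could look something like this (a hedged sketch; the table and path names are made up):

hdfs dfs -mkdir -p /tmp/migration_test
hdfs dfs -put /etc/hosts /tmp/migration_test/
hive -e "CREATE TABLE migration_test (line STRING) STORED AS TEXTFILE;"
hive -e "LOAD DATA INPATH '/tmp/migration_test/hosts' INTO TABLE migration_test;"

Migrate, then check that the files and the table come back intact on the RHEL side.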


7 REPLIES


@rbalam

The SUSE and RHEL repos/RPMs are different, so I don't think a simple migration works here the way an in-place Linux upgrade from RHEL 6 to RHEL 7 would.

http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.0/bk_Installing_HDP_AMB/content/_ambari_reposi...

Expert Contributor

@Divakar Annapureddy

Does this mean that we can't restore the cluster from the metadata/data backups we take before re-installing the cluster?


Good question, but I'm not a product engineer, so I can't comment authoritatively. Normally, though, we see significant compatibility issues even between releases of the same Linux flavor, e.g. CentOS 6 and CentOS 7.

I don't think your SUSE-based Hadoop metadata backups will work on RHEL.



Thanks, Benjamin, for the detailed explanation.


As Benjamin said, I strongly encourage you to establish your process with a small test cluster first. However, I do not expect problems with the data: Hadoop is written in Java, so the on-disk form of the data should be the same between operating systems, especially across Linux variants.

Warning: do not upgrade both the operating system and the HDP version at once! Change one major variable at a time, and make sure the system is stable in between. So go ahead and change the OS, but keep the HDP version the same until you are done and satisfied with the state of the new OS.

The biggest potential gotcha is a ClusterID mismatch resulting from your backup-and-restore process. If you are backing up the data by distcp-ing it between clusters, this won't be an issue; the namespaceID/clusterID/blockpoolID will probably change, but that doesn't matter, since distcp actually creates new files.

But if you are trying to use a traditional file-based backup and restore, from tape or a SAN, then you may hit it: after you think you've fully restored and try to start HDFS, it will tell you that you need to format the file system, or the HDFS file system may simply appear empty despite all the files being back in place. If this happens, a ClusterID mismatch is the first thing to check; see http://hortonworks.com/blog/hdfs-metadata-directories-explained/ for background. I won't say more, because you probably won't hit the problem and it would be confusing to discuss in the abstract.
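If you do need to check for a ClusterID mismatch, a quick sketch, with the metadata directories assumed to be typical HDP defaults (the real paths come from dfs.namenode.name.dir and dfs.datanode.data.dir):

# on the NameNode
grep clusterID /hadoop/hdfs/namenode/current/VERSION
# on a DataNode
grep clusterID /hadoop/hdfs/data/current/VERSION

The two clusterID values must match; if they don't, the DataNodes will refuse to register and HDFS will look empty or unformatted.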

Expert Contributor

@Matt Foley Thanks for the additional information. This is very helpful.