Created on 02-24-2017 03:51 AM - edited 09-16-2022 04:09 AM
Could you please let me know the options for migrating data and scripts from Cloudera to Hortonworks?
1) What are the options/tools available for Data migration from Cloudera to Hortonworks Platform?
2) What are the tools/options available for copying the scripts (Hive, Pig, etc.) from Cloudera to Hortonworks Platform?
3) What are the tools/options available for copying important properties (YARN, HDFS, Hive, etc.) from Cloudera to Hortonworks Platform?
4) What are the tools/options available for copying Rules/Policies (like Ranger) from Cloudera to Hortonworks?
Thanks in advance
Created 02-24-2017 02:29 PM
These topics can be quite involved, so I am going to err on the pithy side.
1. A CDH -> HDP migration can be either a takeover of the existing cluster or a migration from the old cluster to a new one. A takeover can be similar to an HDP X to HDP Y upgrade.
DistCp is often used to copy the data in a cluster-to-new-cluster migration.
2. Hive, Pig, et al. should work the same way on HDP as on CDH, assuming you are migrating to the
same version or a backward-compatible version. Hive and Pig are not tied to one distro.
3. HDP is installed with a complete set of site files, managed by Ambari by default. If there are customized parameters
or CDH specifics, I would use a compare tool (diff, etc.) to examine the differences between the HDP and CDH site files.
HDP also provides guided configuration (e.g. tuning recommendations). It is better to start with the HDP guided
configs, check the differences against CDH, and apply them in a controlled manner.
4. Ranger is unlikely to be part of a CDH distro, whereas it is part of HDP by default. Policies are stored in the Ranger DB.
They should work if they are defined on open-source components, but I would advise testing. Keep in mind there are many options
for quick tests: HDP Sandbox, HDC in AWS, HDP in Azure, etc.
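As a rough illustration of point 1, here is a minimal Python sketch that only assembles a `hadoop distcp` command line for a cluster-to-new-cluster copy. The NameNode addresses and the `/data/warehouse` path are hypothetical placeholders, and `build_distcp_command` is a helper made up for this example. The `-update` and `-p` options are standard DistCp options; if the two clusters run incompatible HDFS versions, the DistCp documentation suggests reading the source over `webhdfs://` instead, which is why the scheme is a parameter here.

```python
import shlex


def build_distcp_command(src_nn, dst_nn, path, src_scheme="hdfs", preserve="ugp"):
    """Return the argv list for copying `path` from one cluster to another.

    src_nn / dst_nn are "host:port" NameNode addresses (hypothetical values below).
    """
    cmd = ["hadoop", "distcp"]
    # -update copies only files that are missing or different on the target,
    # so the job can be re-run after an interruption.
    cmd.append("-update")
    if preserve:
        # -pugp preserves user, group and permission on the copied files.
        cmd.append("-p" + preserve)
    cmd.append(f"{src_scheme}://{src_nn}{path}")
    cmd.append(f"hdfs://{dst_nn}{path}")
    return cmd


# Hypothetical hosts and path; review the printed command before running it.
cmd = build_distcp_command(
    "cdh-nn.example.com:8020", "hdp-nn.example.com:8020", "/data/warehouse"
)
print(shlex.join(cmd))
```

The sketch prints the command rather than executing it, so it can be checked (and tried on a small directory first) before any large copy is started.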
I would also advise that you understand the use cases, workloads, and data on your existing cluster, and make sure you map them to the same functionality in HDP. Usually many of the components are the same (e.g. HBase, Hive, Spark).
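For the config comparison in point 3, a plain `diff` works, but site files are easier to compare property by property. The following is a minimal sketch, assuming both sides are standard Hadoop `*-site.xml` files (a `<configuration>` root with `<property>` children). The function names and the sample values are made up for illustration and are not taken from any real cluster.

```python
import xml.etree.ElementTree as ET


def load_site_properties(xml_text):
    """Parse the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_site_properties(cdh, hdp):
    """Return (only_in_cdh, only_in_hdp, changed) for two property dicts."""
    only_cdh = {k: cdh[k] for k in cdh.keys() - hdp.keys()}
    only_hdp = {k: hdp[k] for k in hdp.keys() - cdh.keys()}
    changed = {
        k: (cdh[k], hdp[k]) for k in cdh.keys() & hdp.keys() if cdh[k] != hdp[k]
    }
    return only_cdh, only_hdp, changed


# Tiny made-up samples standing in for hdfs-site.xml from each cluster.
CDH_SAMPLE = """
<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.namenode.handler.count</name><value>64</value></property>
</configuration>
"""
HDP_SAMPLE = """
<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.blocksize</name><value>134217728</value></property>
</configuration>
"""

only_cdh, only_hdp, changed = diff_site_properties(
    load_site_properties(CDH_SAMPLE), load_site_properties(HDP_SAMPLE)
)
for name, (old, new) in sorted(changed.items()):
    print(f"{name}: CDH={old} HDP={new}")
```

In practice you would read the real files from disk (for example with `open(path).read()`) and review the three result sets before changing anything in Ambari, rather than copying CDH values across wholesale.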
Created 02-28-2017 04:26 AM
Thank you Mr. Grahan for your valuable inputs