Migrating Data and Scripts From Cloudera To Hortonworks

Explorer

Could you please let me know the options for migrating data and scripts from Cloudera to Hortonworks?

1) What options/tools are available for data migration from Cloudera to the Hortonworks platform?

2) What tools/options are available for copying scripts (Hive, Pig, etc.) from Cloudera to the Hortonworks platform?

3) What tools/options are available for copying important properties (YARN, HDFS, Hive, etc.) from Cloudera to the Hortonworks platform?

4) What tools/options are available for copying rules/policies (like Ranger) from Cloudera to Hortonworks?

Thanks in advance

1 ACCEPTED SOLUTION

Expert Contributor

These topics can be quite involved, so I am going to err on the pithy side.

1. A CDH -> HDP migration can be either a cluster takeover or a cluster-to-new-cluster migration. A takeover can be similar to an HDP X to HDP Y upgrade.

DistCp is often used for a cluster-to-new-cluster migration.

2. Hive, Pig, et al. should work the same way on HDP as on CDH, assuming you are migrating to the same version or a backward-compatible version. Hive and Pig are not tied to one distro.

3. HDP is installed with a complete set of site files, managed by default by Ambari. If there are customized parameters or CDH specifics, I would use a compare tool (diff, etc.) to examine the differences between HDP and CDH.

HDP will also provide a guided configuration (i.e. tuning recommendations). Better to start with the HDP guided configs, check the differences, and apply them in a controlled manner.

4. Ranger is less likely to be part of a CDH distro. Ranger is part of HDP by default. Policies are stored in the Ranger DB.

The policies should work if they are defined on open components. I would advise testing. Keep in mind there are many options for quick tests: HDP Sandbox, HDC in AWS, HDP in Azure, etc.
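For the DistCp route in point 1, a minimal sketch of how the copy command could be assembled and launched from a script. The NameNode hostnames, the port, the path, and the helper name `build_distcp_command` are all hypothetical placeholders; `-update`, `-p` and `-m` are standard DistCp options. When the two clusters run different Hadoop versions, reading the source over `webhdfs://` instead of `hdfs://` is a commonly used alternative.

```python
import shlex


def build_distcp_command(src_nn, dst_nn, path, update=True, preserve="ugp", max_maps=20):
    """Return the argv list for a `hadoop distcp` run between two clusters.

    src_nn / dst_nn are filesystem URIs such as "hdfs://host:8020"
    (hypothetical values; use your own NameNode addresses).
    """
    cmd = ["hadoop", "distcp"]
    if update:
        # Copy only files that are missing or differ on the target.
        cmd.append("-update")
    if preserve:
        # Preserve user (u), group (g) and permissions (p) on copied files.
        cmd.append(f"-p{preserve}")
    # Cap the number of simultaneous map tasks used for the copy.
    cmd += ["-m", str(max_maps)]
    cmd += [f"{src_nn}{path}", f"{dst_nn}{path}"]
    return cmd


if __name__ == "__main__":
    cmd = build_distcp_command(
        "hdfs://cdh-nn.example.com:8020",  # assumed source (CDH) NameNode
        "hdfs://hdp-nn.example.com:8020",  # assumed target (HDP) NameNode
        "/data/warehouse",                 # assumed directory to migrate
    )
    # Print the command for review; run it with subprocess.run(cmd, check=True)
    # from a host that has the Hadoop client installed and can reach both clusters.
    print(shlex.join(cmd))
```

Printing the command first makes it easy to try the copy on a small directory before moving the full data set.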

I would also advise that you understand the use cases, workloads, and data on your existing cluster, and ensure they map to the same functionality in HDP. Usually many of the components are the same (e.g. HBase, Hive, Spark).
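For the configuration comparison in point 3, a plain `diff` on two site files can be noisy because property order and whitespace differ between distros. A small sketch of a property-level comparison, assuming both files use the standard Hadoop `<configuration><property><name/><value/></property></configuration>` layout; the function names and sample values here are illustrative only.

```python
import xml.etree.ElementTree as ET


def load_site_properties(xml_text):
    """Parse the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(cdh, hdp):
    """Return (only in CDH, only in HDP, present in both with different values)."""
    only_cdh = {k: cdh[k] for k in cdh.keys() - hdp.keys()}
    only_hdp = {k: hdp[k] for k in hdp.keys() - cdh.keys()}
    changed = {k: (cdh[k], hdp[k]) for k in cdh.keys() & hdp.keys() if cdh[k] != hdp[k]}
    return only_cdh, only_hdp, changed


if __name__ == "__main__":
    # Tiny inline samples; in practice read the text of e.g. hdfs-site.xml from each cluster.
    cdh_xml = """<configuration>
      <property><name>dfs.replication</name><value>3</value></property>
      <property><name>dfs.blocksize</name><value>134217728</value></property>
    </configuration>"""
    hdp_xml = """<configuration>
      <property><name>dfs.replication</name><value>2</value></property>
    </configuration>"""
    only_cdh, only_hdp, changed = diff_properties(
        load_site_properties(cdh_xml), load_site_properties(hdp_xml)
    )
    print("only in CDH:", only_cdh)
    print("only in HDP:", only_hdp)
    print("different values:", changed)
```

The three result sets map onto the controlled approach described above: review the CDH-only and changed properties one at a time and apply only the ones you still need through Ambari.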


2 REPLIES


Explorer

Thank you, Mr. Grahan, for your valuable input.