Created 06-30-2022 07:14 AM
Hello,
Is there straightforward documentation available that would help with a side-car migration of Oozie jobs (50+) from HDP to CDP PB? I know the properties file might need some modification to point to the CDP RM and relevant servers; however, I am finding it hard to understand and map the properties files, as we have 50+ workflows to migrate.
Thanks
snm1523
Created 05-31-2024 10:34 AM
Hello,
The Oozie workflows are stored in HDFS on CDH and HDP clusters. Migrating the Oozie workflows from these clusters to CDP SaaS is a separate process.
On HDP clusters, you can use the DistCp tool to migrate the Oozie workflows present in HDFS. On CDH clusters, you can use the Replication Manager App to migrate the Oozie workflows present in HDFS.
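With 50+ workflows, it may help to generate the DistCp commands rather than type them by hand. The following is only a rough sketch: the NameNode URIs and workflow directories are hypothetical placeholders, and it assumes the target is reachable as an HDFS-style URI, which may not match your CDP endpoint. It only builds and prints `hadoop distcp -update <source> <target>` commands; it does not run them.

```python
import shlex


def build_distcp_command(source_nn, target_nn, workflow_dir):
    """Return a `hadoop distcp` argument list for one workflow directory."""
    path = workflow_dir.strip("/")
    src = f"{source_nn.rstrip('/')}/{path}"
    dst = f"{target_nn.rstrip('/')}/{path}"
    # -update copies only files that are missing or differ on the target
    return ["hadoop", "distcp", "-update", src, dst]


# Hypothetical values: replace with your own cluster URIs and directories
SOURCE_NN = "hdfs://hdp-nn.example.com:8020"
TARGET_NN = "hdfs://cdp-nn.example.com:8020"
WORKFLOW_DIRS = [
    "/user/etl/workflows/daily_load",
    "/user/etl/workflows/weekly_report",
]

commands = [build_distcp_command(SOURCE_NN, TARGET_NN, d) for d in WORKFLOW_DIRS]
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be reviewed first and then run from a host that has the Hadoop client configured, for example with `subprocess.run(cmd, check=True)`.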
Some manual updates are required to migrate Oozie workflows that were created outside of the Hue Workflow. Before you proceed, you must understand the following:
During the migration, you must copy across all Oozie job files (workflow.xml, job.properties, and any supporting JARs).
Which Oozie workflow files must be copied or migrated to CDP One: identify the workflow.xml file and job.properties file for each Oozie workload that must be migrated. These files are stored in HDFS and must be copied to the CDP One endpoint.
The job.properties file must be updated with the appropriate CDP One endpoints.
The workflow.xml file may also need to be updated. For example, a workflow that currently uses the legacy “hive” action must be updated to use the newer “hive2” action.
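To find out which of the workflow.xml files still use the legacy hive action, a small scan like the one below might be a starting point. It is a sketch under assumptions: it treats any action child named `hive` in a `uri:oozie:hive-action:*` namespace as legacy, and the sample workflow is a trimmed, hypothetical example rather than a schema-complete one. It only reports action names; the conversion to hive2 (which needs a JDBC URL, among other things) still has to be done by hand.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by the legacy hive action (hive2 uses "hive2-action")
LEGACY_HIVE_NS_PREFIX = "uri:oozie:hive-action:"


def local_name(tag):
    """Strip the "{namespace}" part that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def find_legacy_hive_actions(workflow_xml):
    """Return the names of <action> nodes that wrap a legacy hive action."""
    root = ET.fromstring(workflow_xml)
    names = []
    for element in root.iter():
        if local_name(element.tag) != "action":
            continue
        for child in element:
            is_legacy_ns = child.tag.startswith("{" + LEGACY_HIVE_NS_PREFIX)
            if is_legacy_ns and local_name(child.tag) == "hive":
                names.append(element.get("name"))
    return names


# Hypothetical, trimmed workflow used only to exercise the function
SAMPLE_WORKFLOW = """<workflow-app xmlns="uri:oozie:workflow:0.5" name="demo">
  <start to="load"/>
  <action name="load">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <script>load.hql</script>
    </hive>
    <ok to="report"/>
    <error to="end"/>
  </action>
  <action name="report">
    <hive2 xmlns="uri:oozie:hive2-action:0.1">
      <jdbc-url>jdbc:hive2://hs2.example.com:10000/default</jdbc-url>
      <script>report.hql</script>
    </hive2>
    <ok to="end"/>
    <error to="end"/>
  </action>
  <end name="end"/>
</workflow-app>"""

legacy_actions = find_legacy_hive_actions(SAMPLE_WORKFLOW)
print(legacy_actions)
```

In the sample, only the `load` action is flagged, because `report` already uses the hive2 action.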
Depending on where you have stored your Oozie workflow data in HDFS, note the following:
The workflow.xml and any job JAR files reside in HDFS on the source cluster. These must be copied across to a CDP One endpoint.
The job.properties file for a job contains a reference to the location where the workflow files are stored. During a migration, the job.properties file must be updated with the settings and locations of the new target environment.
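Since the same endpoint changes usually repeat across many job.properties files, the edit could be scripted. The sketch below assumes the files use the key names commonly seen in Oozie examples (`nameNode`, `jobTracker`, `oozie.wf.application.path`); your files may use different keys, and all host names here are hypothetical. It replaces only the keys you list and leaves comments, ordering, and every other line untouched.

```python
def rewrite_job_properties(text, overrides):
    """Return job.properties text with the given keys set to new values."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if stripped and not is_comment and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in overrides:
                # Replace the whole line; untouched keys fall through as-is
                line = f"{key}={overrides[key]}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical source file content from the HDP cluster
SOURCE_PROPERTIES = """\
# HDP settings
nameNode=hdfs://hdp-nn.example.com:8020
jobTracker=hdp-rm.example.com:8050
queueName=default
oozie.wf.application.path=${nameNode}/user/etl/workflows/daily_load
"""

# Hypothetical target endpoints: substitute the values for your environment
OVERRIDES = {
    "nameNode": "hdfs://cdp-nn.example.com:8020",
    "jobTracker": "cdp-rm.example.com:8032",
}

migrated = rewrite_job_properties(SOURCE_PROPERTIES, OVERRIDES)
print(migrated)
```

Because `oozie.wf.application.path` in the sample is written in terms of `${nameNode}`, it follows the new NameNode without being edited; a path with a hard-coded host would need its own override.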
Created 06-05-2024 02:30 AM
Thank you for the detailed explanation, @ShankerSharma. In the end, our engineering team and developers carried out the migration, but I will keep this in my notes for reference.