We are currently doing a feasibility study on migrating from CDH to CDP with respect to Spark (currently on version 1.6).
When we checked the documentation, we understood that 1.6 is not supported and that we need to refactor to 2.4; the steps to do this manually are given there.
However, we are planning to migrate to Spark 3.x in CDP. One of the Cloudera blogs on this topic (link below
mentions, as part of the pre-upgrade step, that we need to convert Spark 1.x jobs to 2.4.5:
Phase 2: Pre-upgrade
- Backup existing cluster using the backup steps list here
- Confirm if all the prerequisites are addressed. Ensure all outstanding dependencies are met.
- Convert Spark 1.x jobs to Spark 2.4.5. Test and validate the jobs to ensure all the required code changes are performed and tested.
My question is:
If the migration is from Spark 1.x to 3.x when moving from CDH to CDP, is it mandatory to have an intermediate step, converting Spark 1.x to 2.x and then 2.x to 3.x? If yes, is the 1.x to 2.x refactoring automated, or must it be done manually following the steps given in the Cloudera guide:
Spark 1.6 to Spark 2.4 Refactoring
If not, can we refactor directly from Spark 1.x to 3.x when moving from CDH to CDP? Kindly confirm.
Thanks in advance.