We are starting to plan the move of our CDH cluster from 5.16.2 to 6.3.2 and would be happy to hear about the experience of anyone who has made such a move.
We are aware of the changes, and I would divide them into application-dependent and non-application-dependent ones.
Here are some of my questions:
1) Is it recommended to do a side-by-side upgrade, or to upgrade the existing cluster in place?
Our cluster can tolerate a few hours of downtime, but we are weighing the safe rollback that a side-by-side upgrade offers. We know the cost of running a second cluster, but we are in the phase of renewing our servers anyway, so we would like to take advantage of that: build a new CDH 6 cluster, sync the data using distcp, and then perform the cutover. What are the recommendations here?
2) We are aware that on CDH 6 we should use Spark 2.4 and cannot have more than one Spark version in the cluster, whereas today we run both 1.6 and 2.3. We are building a plan to align our applications on Spark 2.4. Are there best practices or things we should take into consideration, other than changing the applications to use Spark 2.4?
3) We are aware of the changes in Hue, Impala and others, but these are infrastructure changes that we can apply ourselves and that are not related to the applications. Is my claim right here?
4) Should we worry about the changes in Oozie? We are already in the process of moving the rest of our Oozie applications to Airflow, which we are using in the cluster today.
5) Will MapReduce and HDFS be impacted by the upgrade?
I will be happy to hear from people who have gone through this upgrade, as I know it's a challenging one.