I'm looking to upgrade a 700-plus-node production cluster from CDH 5.5.1 to at least CDH 5.8.x (preferably the latest CDH 5 release, for both CM and CDH). I'm trying to get an idea of which minor version is the most stable, ideally something roughly equivalent to Apache Hadoop 2.7 or 2.9 in terms of patches and features, since CDH 5 is based on Hadoop 2.6. Usage is MR-heavy, with Spark and Hive gaining traction. Any input would be great. Thanks in advance 🙂
@Hobster The Spark version changed between CDH 5.5 and 5.7, so some effort is needed there.
Once you have solved that, you can go up to the latest CDH release, though this is only my opinion. In my case I upgraded from CDH 5.5.4 to CDH 5.13.0, and the only issue I faced was the Spark one. This week I am planning to upgrade to CDH 5.16.1, and I expect that to take no effort at all.
Thank you so much for sharing that. Spark would definitely have been an area of concern for us. We actually used a custom install (Spark 2.2): we used CM to add the 1.6 packages, then overwrote the configs to point to the newer version, and that worked out fine. So we are already using Spark 2, which comes with the higher CDH versions (CSD).
As of now I am leaning towards 5.16.1 as well, since most of the known issues and bugs have been addressed in that release.