Cloudera Data Engineering (CDE) now supports multi-version Spark pipelines. Users can easily test Spark 2 workloads and promote them to Spark 3 to take advantage of the performance and stability improvements in the latest version of Spark (a performance improvement of over 30%, based on internal TPC-DS benchmarks).
Data engineers can run both Spark 2 and Spark 3 workloads within the same CDP PC environment, maintaining backwards compatibility with legacy workloads while developing new applications on the latest version of Spark. Administrators have a new option in the Virtual Cluster creation wizard to choose a Spark version. Once the virtual cluster is up and running, users can deploy their Spark 3 jobs through the same UI and CLI/API as before, with comprehensive monitoring of their pipelines, including real-time logs and the Spark UI.
A future release will add the visual performance profiler to Spark 3 job run details.
To learn more, visit the documentation.