What's New @ Cloudera


[Preview] Accelerate data pipelines by more than 30% with Apache Spark 3 in Cloudera Data Engineering (CDE)


Cloudera Data Engineering (CDE) now supports multi-version Spark pipelines. Users can easily test Spark 2 workloads on Spark 3 and promote them, taking advantage of the performance and stability improvements in the latest version of Spark (a performance improvement of more than 30%, based on internal TPC-DS benchmarks).

Data engineers can run both Spark 2 and Spark 3 workloads within the same CDP Public Cloud environment. This keeps legacy workloads backward compatible while new applications are developed on the latest version of Spark. Administrators have a new option in the Virtual Cluster creation wizard to choose the Spark version. Once the virtual cluster is up and running, users can seamlessly deploy their Spark 3 jobs through the same UI and CLI/API as before, with comprehensive pipeline monitoring that includes real-time logs and the Spark UI.

A future release will add the visual performance profiler to Spark 3 job run details.

To learn more, visit the documentation.