Summary
Cloudera Data Engineering (CDE) 1.19 introduces interactive Spark sessions, which let development workflows take advantage of autoscaling compute and orchestration capabilities that are hybrid and multi-cloud ready.
Since there is no one-size-fits-all approach to development, CDE interactive sessions give data engineers flexible endpoints for developing Spark applications from anywhere: a web-based terminal, the local CLI, a favorite IDE, or even third-party tools via JDBC.
CDE exposes sessions as first-class entities through the APIs, UI, and CLI, so users can move seamlessly across interfaces. For example, you can start a session in the UI, interact with it in the web-based shell, and then drop into your local terminal for a spark-shell experience.
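Because sessions are exposed through the APIs as well as the UI and CLI, they can also be created programmatically. The sketch below only builds (and does not send) an HTTP request for that purpose. It is a minimal illustration under loud assumptions: the `POST /sessions` path relative to the virtual cluster's Jobs API URL, the `name`/`type` body fields, the `pyspark`/`spark-scala` type values, and the helper name `build_create_session_request` are all hypothetical here and should be checked against the CDE API reference before use.

```python
import json
import urllib.request

# ASSUMPTION: session types accepted by the API; verify against the CDE docs.
SUPPORTED_TYPES = {"pyspark", "spark-scala"}


def build_create_session_request(jobs_api_url: str, token: str, name: str,
                                 session_type: str = "pyspark") -> urllib.request.Request:
    """Build (but do not send) a request that would create an interactive session.

    ASSUMPTION: the virtual cluster's Jobs API accepts POST <jobs_api_url>/sessions
    with a JSON body containing "name" and "type". This helper is hypothetical.
    """
    if session_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported session type: {session_type!r}")
    body = json.dumps({"name": name, "type": session_type}).encode("utf-8")
    return urllib.request.Request(
        url=jobs_api_url.rstrip("/") + "/sessions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # access token for the cluster
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_create_session_request(
        "https://example.invalid/dex/api/v1", "TOKEN", "dev-session")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a real virtual cluster.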
Complete Feature List:
Interactive Sessions (Tech Preview)
Both CLI and web-based interactive shell sessions are now supported. Users can run Python, Scala, and Java in interactive mode for exploration, development, and testing.
Airflow performance
New workload regions
Hong Kong and Jakarta are now supported.
Addition of Spark 3.3
Going forward, CDE will support multiple versions of Spark 3. Certain versions will be designated long-term support (LTS) to mirror CDP Private Cloud (PVC) Base clusters and simplify migration, starting with Spark 3.2 LTS.
Note that Spark 3.3 is supported only on Data Lake version 7.2.16.
Note that Spark 2.4 is now deprecated, and customers are encouraged to move to Spark 3 for better performance and longer support. Spark 2.4 will continue to receive security fixes, but no new features.
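The version rules above can be encoded as a simple pre-flight check before choosing a Spark version for a virtual cluster. This is a hypothetical helper, not a CDE feature: the names `check_spark_version` and `VersionCheck` are illustrative, and it reads the Data Lake note literally (Spark 3.3 requires exactly 7.2.16), which may be stricter than the actual support matrix.

```python
from dataclasses import dataclass, field

# Rules taken from the release notes; helper names below are hypothetical.
LTS_VERSIONS = {"3.2"}
DEPRECATED_VERSIONS = {"2.4"}
# Spark 3.3 is supported only on Data Lake 7.2.16 (read literally as an exact match).
REQUIRED_DATALAKE = {"3.3": "7.2.16"}


@dataclass
class VersionCheck:
    ok: bool                                      # False if the combination is unsupported
    messages: list = field(default_factory=list)  # human-readable findings


def check_spark_version(spark: str, datalake: str) -> VersionCheck:
    """Check a Spark version against the Data Lake version and lifecycle notes."""
    messages = []
    ok = True
    required = REQUIRED_DATALAKE.get(spark)
    if required is not None and datalake != required:
        ok = False
        messages.append(f"Spark {spark} requires Data Lake {required}, found {datalake}")
    if spark in DEPRECATED_VERSIONS:
        # Deprecated versions still work but only receive security fixes.
        messages.append(f"Spark {spark} is deprecated: security fixes only, no new features")
    if spark in LTS_VERSIONS:
        messages.append(f"Spark {spark} is an LTS release")
    return VersionCheck(ok, messages)


if __name__ == "__main__":
    for spark, dl in [("3.3", "7.2.16"), ("3.3", "7.2.15"), ("2.4", "7.2.15")]:
        print(spark, dl, check_spark_version(spark, dl))
```

Deprecation is reported as a message rather than a failure, matching the notes: Spark 2.4 remains usable but customers are encouraged to move to Spark 3.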
Airflow support for file-based resources (Technical Preview)
Airflow now supports mounting resources. In CDE 1.19, users can mount file-based resources; future releases will extend this to Python libraries and virtual environments.
This is in Technical Preview and available through the CLI.
Spark-submit migration tool
Additional Links