What's New @ Cloudera


Cloudera Data Engineering (CDE) 1.19 in Public Cloud introduces interactive Spark development sessions


Summary

Cloudera Data Engineering (CDE) 1.19 introduces interactive Spark sessions for development workflows, letting developers take advantage of autoscaling compute and orchestration capabilities that are hybrid and multi-cloud ready.

 

Because there is no one-size-fits-all approach to development, CDE interactive sessions give data engineers flexible endpoints for developing Spark applications from anywhere: in a web-based terminal, the local CLI, a favorite IDE, or even via JDBC from third-party tools.

 

CDE exposes sessions as first-class entities via the APIs, as well as the UI and CLI, allowing users to navigate seamlessly across interfaces. For example, you can initiate a session through the UI, start interacting with it in the web-based shell, then drop into your local terminal for a spark-shell experience.

Interactive Sessions Video 

 

Complete Feature List:

 

  • Interactive Sessions (Tech Preview)
    Both CLI and web-based interactive shell sessions are now supported. Users can run Python, Scala, and Java in interactive mode for exploration, development, and testing.

     
     
  • Airflow performance 

    • In our latest benchmarks, Airflow workloads run 2x faster on AWS as a result of Airflow upgrades and continued optimizations.

  • New Workload Regions: Hong Kong and Jakarta are now supported.

  • Addition of Spark 3.3 

    • Moving forward, CDE will support multiple versions of Spark 3. To simplify migration, certain versions will be designated LTS to mirror PVC Base clusters, starting with Spark 3.2 LTS.

    • Note that Spark 3.3 is supported only on Data Lake version 7.2.16.

    • Note that Spark 2.4 is now deprecated, and customers are encouraged to move to Spark 3 for better performance and longer support. Spark 2.4 will continue to receive security fixes but no new features.

       
  • Airflow support for file-based resources (Technical Preview)

    • Airflow now supports mounting resources. In CDE 1.19, users can mount file-based resources; future releases will extend this to include Python libraries and virtual environments.

    • This is in Technical Preview and available through the CLI.

  • Spark-submit migration tool

    • The CLI translation tool is now available in the public cloud. Customers can download and install it on Data Hub edge nodes to start migrating jobs from Spark on Data Hub (DH) to Spark on CDE.

       
  • Profiles for CDE CLI

    • Configure the CLI to easily toggle between different virtual clusters and CDE services.  

 

Additional Links

 

  • 1.19 Release notes can be found here

  • Pricing updates (note: while in Tech Preview, we will not charge the higher price).