Product Announcements

[ANNOUNCE] CDS 2.3 Release 2 Powered by Apache Spark Released

We are happy to announce CDS 2.3 release 2 Powered by Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads.

 

This component is generally available and is supported on CDH 5.9 and higher.

 

A Hive compatibility issue in CDS 2.0 release 1 Powered By Apache Spark affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the CDS 2.0 release 2 or higher parcel to avoid Spark 2 job failures when using Hive functionality.


There are no new incompatible changes in this release.



What's New in CDS 2.3 release 2 Powered By Apache Spark

  • Spark lineage support, which can be used with Navigator in CM 5.14 for metadata and transformation analysis and better regulatory compliance.
  • Vectorized PySpark UDF support, which improves PySpark performance.
  • History Server scalability, with a more responsive UI that can show applications at start/restart much faster than before, even when there are many applications.
  • Parquet timestamp read-side adjustment, so that Spark can read timestamps written by Impala.
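
As an illustration of the vectorized PySpark UDF item above, here is a minimal sketch using `pandas_udf` from `pyspark.sql.functions`. The function name `fahrenheit_to_celsius`, the helper `run_on_spark`, the column names, and the sample data are hypothetical and chosen only for this example; it assumes pandas and PyArrow are installed on the cluster nodes.

```python
import pandas as pd


def fahrenheit_to_celsius(temps: pd.Series) -> pd.Series:
    """Convert a whole batch of values at once instead of row by row."""
    return (temps - 32.0) * 5.0 / 9.0


def run_on_spark():
    """Hypothetical driver: wrap the function as a vectorized UDF and apply it."""
    # Imported here so the pure-pandas function above can be tried without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.appName("vectorized-udf-sketch").getOrCreate()

    # Wrapping with pandas_udf makes Spark pass column batches as pandas Series.
    to_celsius = pandas_udf(fahrenheit_to_celsius, returnType=DoubleType())

    df = spark.createDataFrame([(32.0,), (212.0,)], ["temp_f"])
    df.withColumn("temp_c", to_celsius(df["temp_f"])).show()

    spark.stop()
```

Calling `run_on_spark()` from a PySpark job or shell applies the UDF to the sample DataFrame. Because the function receives a pandas Series per batch rather than one value per call, the per-row Python overhead of a regular UDF is avoided.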

 

Issues Fixed in CDS 2.3 release 2 Powered by Apache Spark

For a full list of fixed issues, see the list here.

Download CDS 2.3 release 2 Powered By Apache Spark.

Read the documentation.

Want to become a pro Spark user? Sign up for Apache Spark Training.

 

Note: While releasing CDS 2.3 release 1, we uncovered a bug, so we replaced it with CDS 2.3 release 2, which includes the fix.
