We are happy to announce CDS 2.3 release 2 Powered by Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads.
This component is generally available and is supported on CDH 5.9 and higher.
A Hive compatibility issue in CDS 2.0 release 1 Powered By Apache Spark affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the CDS 2.0 release 2 or higher parcel to avoid Spark 2 job failures when using Hive functionality.
There are no new incompatible changes in this release.
What's New in CDS 2.3 release 2 Powered By Apache Spark
Spark lineage support, which can be used with Navigator in CM 5.14 for metadata and transformation analysis and better regulatory compliance.
Vectorized PySpark UDF support, which improves PySpark performance
History Server scalability improvements, with a UI that can show applications at start/restart much faster than before, even when there are many applications
Parquet timestamp read side adjustment so that Spark can read timestamps written by Impala
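To illustrate the vectorized PySpark UDF support listed above, here is a minimal sketch using the pandas_udf API from Apache Spark 2.3. The function names, column names, and sample temperatures are made up for illustration, and the sketch assumes that pandas and PyArrow are installed on the nodes that run Python workers.

```python
def celsius_to_fahrenheit(values):
    """Convert Celsius to Fahrenheit.

    When wrapped as a vectorized UDF, `values` is a pandas.Series holding a
    whole batch of rows, so the arithmetic runs once per batch instead of
    once per row. The same expression also works on a plain float.
    """
    return values * 9.0 / 5.0 + 32.0


def run_example():
    # Imported lazily so the conversion function above can be used and
    # tested without a Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    spark = SparkSession.builder.appName("vectorized-udf-sketch").getOrCreate()

    # Wrap the function as a scalar vectorized UDF that returns doubles.
    to_fahrenheit = pandas_udf(celsius_to_fahrenheit, "double", PandasUDFType.SCALAR)

    # Hypothetical sample data: one column of Celsius readings.
    df = spark.createDataFrame([(0.0,), (37.0,), (100.0,)], ["celsius"])
    df.withColumn("fahrenheit", to_fahrenheit(df["celsius"])).show()

    spark.stop()
```

Calling run_example() from a script submitted to the cluster runs the UDF on Spark. How much faster it is than a row-at-a-time UDF depends on the workload.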
Issues Fixed in CDS 2.3 release 2 Powered by Apache Spark
For a full list of fixed issues, see the list here.
Download Cloudera Distribution of CDS 2.3 release 2 Powered By Apache Spark.