Cloudera is delighted to announce the availability of the next iteration of our modern platform for machine learning and analytics, optimized for the cloud. This release continues to demonstrate innovation and a commitment to enterprise-grade quality.
Included in the Cloudera Enterprise 5.15 release is support for new versions of many of our platform components:
- CDH 5.15.0
- Cloudera Manager 5.15.0
- Director 2.8
- Navigator 2.14
- Cloudera Data Science Workbench (CDSW) 1.4
- Apache Kafka CDK 3.1.0, based on the upstream version 1.0.1
- Navigator Encrypt 3.15.0
- Key Trustee Server 5.15.0
Of special note, 5.15 adds new capabilities aligned to our machine learning, analytics, and cloud focus.
Machine Learning:
- Track models and move them from research to production more easily: launch and compare versioned experiments, then deploy and manage versioned models as microservices (REST APIs).
Analytics:
- Apache Kudu now supports the decimal column type with fixed scale and precision suitable for financial and other arithmetic.
- Kudu also has a new replica management scheme that allows much faster recovery of tablets when a tablet server goes down and comes back shortly afterward. The new scheme also provides substantially better overall stability on clusters with frequent server failures.
- Apache Impala has new RPC functionality. It makes clusters more stable and lays the groundwork for running on larger clusters.
- New Impala stats sampling and extrapolation lets users collect table stats with fewer resources and in less time by using only a sample of the data.
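To illustrate why a decimal type with fixed scale and precision matters for financial arithmetic, here is a minimal sketch in plain Python. It uses the standard `decimal` module rather than Kudu itself, and the helper `to_fixed` is a hypothetical name introduced only for this example.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def to_fixed(value: str, scale: int) -> Decimal:
    """Parse a string into a Decimal with exactly `scale` fractional digits.

    Hypothetical helper for illustration; it is not part of any Kudu API.
    """
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives a quantum of 0.01
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum is not exactly 0.3.
float_total = 0.1 + 0.2

# Fixed-scale decimals keep the arithmetic exact at two fractional digits.
decimal_total = to_fixed("0.10", 2) + to_fixed("0.20", 2)

print(float_total == 0.3)
print(decimal_total == Decimal("0.30"))
```

The float comparison fails while the decimal comparison holds, which is the property that makes fixed-scale decimals a safer fit for currency amounts.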
Cloud:
- Altus encryption at rest and in motion, covering AWS S3 data and logs, AWS EBS data and root volumes, TLS for web traffic and Impala, and Kerberos for RPC (data movement).
- Simplified cluster provisioning in Cloudera Director.
- BDR replication to Microsoft ADLS for HDFS and Hive, plus more secure cloud credential handling for both ADLS and AWS S3.
- And more!
Apache Spark 2.3 is also now available separately from CDH 5.15.0 and includes:
- Spark lineage support, which can be used with Navigator in CM 5.15 for metadata and transformation analysis and better regulatory compliance.
- Vectorized PySpark UDF support, which improves PySpark performance.
- History Server scalability improvements: the UI can show applications much faster than before at start or restart, even when there are many applications.
- Apache Parquet timestamp read-side adjustment, so that Spark can read timestamps written by Impala.
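The idea behind vectorized UDFs can be sketched in plain Python: instead of calling the user function once per row, the engine calls it once per batch of rows, which amortizes the per-call overhead. This is a conceptual illustration only, not PySpark code; the function names and the batch size are assumptions made for the example.

```python
from typing import Callable, List


def apply_row_at_a_time(values: List[float], fn: Callable[[float], float]) -> List[float]:
    """Call the user function once per row."""
    return [fn(v) for v in values]


def apply_vectorized(
    values: List[float],
    batch_fn: Callable[[List[float]], List[float]],
    batch_size: int,
) -> List[float]:
    """Call the user function once per batch of up to `batch_size` rows."""
    out: List[float] = []
    for start in range(0, len(values), batch_size):
        out.extend(batch_fn(values[start:start + batch_size]))
    return out


calls = {"row": 0, "batch": 0}


def plus_one(v: float) -> float:
    calls["row"] += 1  # count per-row invocations
    return v + 1.0


def plus_one_batch(batch: List[float]) -> List[float]:
    calls["batch"] += 1  # count per-batch invocations
    return [v + 1.0 for v in batch]


data = [float(i) for i in range(10)]
row_result = apply_row_at_a_time(data, plus_one)
batch_result = apply_vectorized(data, plus_one_batch, batch_size=4)

print(row_result == batch_result, calls)
```

Both paths produce the same values, but the batched path invokes the user function far fewer times. In a real engine each call may also cross a process boundary, so fewer calls can translate into a noticeable speedup.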
Additional information is available in the documentation and the Release Notes.
As always, we'd love your feedback and remain committed to your success! Please provide any comments and suggestions through our community forums.