Cloudera is pleased to announce that Cloudera Enterprise 5.11 is now generally available (GA). The highlights of this release include lineage support for Apache Spark, Apache Kudu security integration, embedded data discovery for self-service BI, and new cloud capabilities for Microsoft ADLS and Amazon S3.
As usual, there are also a number of quality enhancements, bug fixes, and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list):
- Core Platform and Cloud
  - Amazon S3 Consistency: S3Guard ensures that operations on Amazon S3 are immediately visible to other clients, making it easier to migrate workloads from consistent file systems like HDFS to Amazon S3.
  - Support for Azure Data Lake Store (ADLS): Microsoft introduced ADLS to provide a cost-effective, persistent storage layer for big data applications. With C5.11, Hive, Spark, and MapReduce can all directly operate on data stored in ADLS, enabling separation of compute and storage and transient clusters on the Azure cloud.
  - S3 At-Rest Encryption with AWS KMS: This feature enables at-rest, server-side encryption for data stored in S3 with encryption keys managed by Amazon’s Key Management Service (KMS). With this integration, Cloudera engines can leverage the management and control capabilities of AWS KMS to improve S3 data encryption.
  - Director support for long-lived clusters: Director 2.4 adds advanced synchronization features for long-lived clusters managed by Cloudera Manager. Users can now upgrade clusters, add services, and assign roles in Cloudera Manager while keeping a healthy connection to Cloudera Director, making it easy to add or remove nodes at any time. This combination is especially powerful for cloud-based Analytic Database use cases.
- Data Science & Engineering
  - Spark Lineage: Cloudera Navigator lineage support now extends to Apache Spark. With automatic collection and visualization of lineage, users can quickly identify the provenance, usage, and impact of any dataset for regulatory compliance and end-user discovery.
  - Continued Performance Optimizations for Hive-on-S3: Cloud-native batch workloads are up to 5x faster (compared to 5.10) for greater cost savings in the cloud.
- Operational DB
  - Kudu Authentication/Security (also applies to Analytic DB): The integration of Apache Kudu with Kerberos enables the authentication of users and servers. Kudu also now features wire encryption with Transport Layer Security (TLS) and coarse-grained, service-level authorization.
  - Improved HBase scalability and efficiency: HBase now leverages better medium object store (MOB) compaction policies to improve scalability. Additionally, the default splitting policy has been updated to minimize the number of required region servers.
- Analytic DB
  - Embedded Data Discovery for Self-Service BI: Hue now leverages metadata from Cloudera Navigator to make it faster and easier for SQL developers to discover and interact with relevant sets of tables, including the ability to search across tags and add tags directly from the UI.
  - Navigator Optimizer Anonymization for Secure Workload Analysis: Cloudera Navigator Optimizer now supports workload anonymization to protect sensitive information in SQL workloads during analysis and comply with data protection regulations.
The full contents of this release include:
- Cloudera Enterprise 5.11 (comprising CDH 5.11, Cloudera Manager 5.11, and Cloudera Navigator 2.10)
- Cloudera Director 2.4
- Apache Kudu 1.3
- Cloudera Distribution of Kafka (CDK) 2.1
- Cloudera Navigator Optimizer Updates
Over the next few weeks, we’ll publish blog posts that cover some of these features in detail. In the meantime, see the Release Notes for additional information.
As always, we value your feedback; please provide any comments and suggestions through our community forums. You can also file bugs via issues.cloudera.org.