Created 10-14-2014 01:20 PM
We're pleased to announce the release of Cloudera Enterprise 5.2 (comprising CDH 5.2, Cloudera Manager 5.2, Cloudera Director 1.0, and Cloudera Navigator 2.1).
This release reflects our continuing investments in Cloudera Enterprise's main focus areas, including security, integration with the partner ecosystem, and support for the latest innovations in the open source platform (including Impala 2.0, its most significant release yet, and Apache Hive 0.13.1). It also includes a new product, Cloudera Director, that streamlines deployment and management of enterprise-grade Hadoop clusters in cloud environments; new component releases for building real-time applications; and new support for significant partner technologies like EMC Isilon. Furthermore, this release ships the first results of joint engineering with Intel, including WITH GRANT OPTION for Hive and Impala and performance optimizations for MapReduce.
Here are some of the highlights (incomplete; see the respective Release Notes for CDH, Cloudera Manager, and Cloudera Navigator for full lists of features and fixes):
Security
Via Apache Sentry (incubating) 1.4, GRANT and REVOKE statements in Impala and Hive can now include WITH GRANT OPTION, for delegation of granting and revoking privileges (joint work with Intel under Project Rhino).
Hue has a new Sentry UI for policy management, so you can visually create and edit roles in Sentry and permissions on files in HDFS.
Kerberos authentication is now supported in Apache Accumulo.
In Impala, authentication can now use a combination of Kerberos and LDAP.
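To make the delegation semantics of WITH GRANT OPTION concrete, here is a minimal Python sketch of a toy privilege model. It is not Sentry's implementation or API: the class ToyPolicyStore, its methods, and the role and privilege names are hypothetical. It assumes a simplified rule in which a role may grant (or revoke) a privilege only if it holds that privilege WITH GRANT OPTION, and an admin role may grant anything. The first grant below is roughly what a GRANT ... TO ROLE ... WITH GRANT OPTION statement expresses.

```python
class ToyPolicyStore:
    """Hypothetical, simplified privilege-delegation model (not Sentry's API)."""

    def __init__(self, admin_role):
        self._admin = admin_role
        # role -> {privilege: holds_grant_option}
        self._grants = {}

    def _can_grant(self, role, privilege):
        # The admin role may grant anything; any other role needs the
        # privilege itself to have been granted WITH GRANT OPTION.
        if role == self._admin:
            return True
        return self._grants.get(role, {}).get(privilege, False)

    def grant(self, grantor, privilege, grantee, with_grant_option=False):
        if not self._can_grant(grantor, privilege):
            raise PermissionError(f"{grantor} cannot grant {privilege}")
        held = self._grants.setdefault(grantee, {})
        # Never downgrade a grant option the grantee already holds.
        held[privilege] = held.get(privilege, False) or with_grant_option

    def revoke(self, revoker, privilege, grantee):
        # Simplification: revoking requires the same right as granting.
        if not self._can_grant(revoker, privilege):
            raise PermissionError(f"{revoker} cannot revoke {privilege}")
        self._grants.get(grantee, {}).pop(privilege, None)

    def has(self, role, privilege):
        return privilege in self._grants.get(role, {})


store = ToyPolicyStore(admin_role="admin")
# Admin lets team_lead read the table AND pass that right on.
store.grant("admin", "SELECT ON sales", "team_lead", with_grant_option=True)
# team_lead delegates plain SELECT (no grant option) to analyst.
store.grant("team_lead", "SELECT ON sales", "analyst")
print(store.has("analyst", "SELECT ON sales"))  # True
try:
    # analyst holds SELECT without the grant option, so it cannot delegate.
    store.grant("analyst", "SELECT ON sales", "intern")
except PermissionError as err:
    print(err)  # analyst cannot grant SELECT ON sales
```

The point of the model is that delegation is bounded: the admin hands out a grantable privilege once, and the chain stops wherever a grant is made without the option.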
Data Management and Governance
Cloudera Navigator 2.1 features a brand new auditing UI that is unified with lineage and discovery, so you now have access to all Navigator functionality from a single interface.
Navigator 2.1 includes role-based access control, so you can restrict access to auditing, metadata, and policy management capabilities.
We’re also shipping a beta policy engine in Navigator 2.1. Targeted to GA by year-end, the policy engine allows you to set up rules and notifications so you can classify data as it arrives and integrate with data preparation and profiling tools. Try it out and let us know what you think!
And we’ve added lots of top-requested enhancements, such as Sentry auditing for Impala and integration with Hue.
Cloud Deployment
Cloudera Director is a simple and reliable way to deploy, scale, and manage Hadoop in the cloud (initially for AWS) in an enterprise-grade fashion. It’s free to download and use, and supported by default for Cloudera Enterprise customers. Features include:
Simple UI for self-service cluster spin up/teardown
Dynamic scaling for spiky workloads
Simple cloning of clusters
Cloud blueprints for repeatable deployments
Third-party software deployment within the same workflow
Support for custom, workload-specific deployments
Support for complex cluster topologies
Minimum-size clusters when capacity is constrained
Multi-cluster dashboard
Instance tracking for account billing
Real-Time Architecture
Rebase on Apache HBase 0.98.6
Cell-level ACLs for fine-grained access control of data in HBase now supported
Backported improvements to get and put request scheduling and throttling that provide basic QoS for multi-tenant HBase tables and clusters, letting production and real-time workloads take priority over ad hoc and analytic jobs.
Backported patches that make the off-heap block cache (aka bucket cache) production-ready. Now you can use large amounts of memory for read caching without the GC penalties of the past. Bucket cache is now the default.
Backported authentication of clients accessing HBase via the HBase Thrift Proxy.
Rebase on Apache Spark/Streaming 1.1
Rebase on Impala 2.0
Cloudera Search now provides:
Spark-based indexing for fast, iterative index design
Distributed pivot facets
The ability to expire documents
Node failure recovery
Support for deep paging and multithreaded faceting
Apache Sqoop now supports imports into the Apache Parquet (incubating) file format
Apache Kafka integration with CDH is now incubating in Cloudera Labs; a Kafka-Cloudera Labs parcel (unsupported) is available for installation. Integration with Flume is provided via a dedicated Kafka Source and Sink.
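The QoS item above rests on a simple idea: requests are queued by workload class, and higher-priority classes are served first. The Python sketch below illustrates only that prioritization idea with a heap; it is not HBase's RPC scheduler, it does not model throttling, and the class name ToyRequestScheduler and the PRIORITY mapping are hypothetical.

```python
import heapq
import itertools

# Hypothetical priority classes: a lower number is served first.
PRIORITY = {"production": 0, "realtime": 0, "adhoc": 1, "analytic": 1}


class ToyRequestScheduler:
    """Toy priority scheduler for get/put requests (not HBase's RpcScheduler)."""

    def __init__(self):
        self._queue = []
        # Monotonic counter gives FIFO order within a priority class.
        self._seq = itertools.count()

    def submit(self, workload, request):
        heapq.heappush(self._queue, (PRIORITY[workload], next(self._seq), request))

    def next_request(self):
        # Returns None when there is nothing left to serve.
        if not self._queue:
            return None
        _, _, request = heapq.heappop(self._queue)
        return request


sched = ToyRequestScheduler()
sched.submit("analytic", "analytic-get")
sched.submit("adhoc", "adhoc-put")
sched.submit("production", "prod-put")
sched.submit("realtime", "rt-get")
order = [sched.next_request() for _ in range(4)]
print(order)  # ['prod-put', 'rt-get', 'analytic-get', 'adhoc-put']
```

Even though the analytic and ad hoc requests arrived first, the production and real-time requests are served ahead of them, while arrival order is preserved within each class.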
Impala 2.0
Disk-based query processing: enables large queries to "spill to disk" if their in-memory structures are larger than the currently available memory. (Note that this feature only uses disk for the portion that doesn't fit in the available memory.)
Greater SQL compatibility: SQL 2003 analytic (window) functions, support for legacy data types (such as CHAR and VARCHAR), better compliance with SQL standards (such as subqueries in WHERE clauses, including EXISTS and IN), and additional vendor-specific SQL extensions.
Impala 2.0 is now also available for CDH 4.
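Spilling to disk is easiest to picture with the classic external merge sort. The Python sketch below is that generic technique, not Impala's implementation (which spills hash-join, aggregation, and sort state in its own way); the function name external_sort and the row-count budget max_rows_in_memory are hypothetical stand-ins for a real memory limit. Sorted runs are written to temporary files only when the in-memory buffer fills up, the final run stays in memory, and an input that fits within the budget never touches disk.

```python
import heapq
import os
import pickle
import tempfile


def _read_run(path):
    # Stream pickled rows back from one spilled run.
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def external_sort(rows, max_rows_in_memory):
    """Return (sorted_rows, runs_spilled_to_disk) for a generic external merge sort."""
    run_paths = []
    buffer = []
    try:
        for row in rows:
            if len(buffer) >= max_rows_in_memory:
                # Buffer is full: sort it and spill it to disk as one run.
                buffer.sort()
                fd, path = tempfile.mkstemp(suffix=".run")
                with os.fdopen(fd, "wb") as f:
                    for item in buffer:
                        pickle.dump(item, f)
                run_paths.append(path)
                buffer = []
            buffer.append(row)
        buffer.sort()
        if not run_paths:
            # Everything fit in memory: no disk I/O at all.
            return buffer, 0
        # Merge the spilled runs with the final in-memory run.
        runs = [_read_run(p) for p in run_paths] + [iter(buffer)]
        return list(heapq.merge(*runs)), len(run_paths)
    finally:
        # Temporary run files are always cleaned up.
        for p in run_paths:
            os.remove(p)


data = [5, 3, 9, 1, 7, 2, 8]
print(external_sort(data, max_rows_in_memory=3))   # ([1, 2, 3, 5, 7, 8, 9], 2)
print(external_sort(data, max_rows_in_memory=10))  # ([1, 2, 3, 5, 7, 8, 9], 0)
```

With a generous budget the result is produced entirely in memory; with a tight one the same answer comes back, at the cost of writing and re-reading sorted runs.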
New Open Source Releases and Certifications
Cloudera Enterprise 5.2 includes multiple new component releases:
Apache Avro 1.7.6
Apache Crunch 0.11
Apache Hadoop 2.5
Apache HBase 0.98.6
Apache Hive 0.13.1
Apache Parquet (incubating) 1.5 / Parquet-format 2.1.0
Apache Sentry (incubating) 1.4
Apache Spark 1.1
Apache Sqoop 1.4.5
Impala 2.0
Kite SDK 0.15.0
...with new certifications on:
Filesystems: EMC Isilon
OSs: Ubuntu 14.04 (Trusty)
Java: Oracle JDK1.7.0_67
Over the next few weeks, we’ll publish blog posts that cover some of these and other new features in detail. In the meantime:
As always, we value your feedback; please provide any comments and suggestions through our community forums. You can also file bugs via issues.cloudera.org.