What's New @ Cloudera

Ana · 03-29-2019

We are pleased to announce the general availability of Cloudera Enterprise 6.2.0, the modern platform for machine learning and analytics optimized for the cloud. This release delivers a number of new capabilities, improved usability, and better performance.

New capabilities include:

Management highlights:

Support for a Shared Data Experience (SDX) in Cloudera Manager. Cloudera Manager now supports creating ‘compute clusters’ that serve disparate workloads for independent tenants with stronger isolation & reliability, while operating on shared data, catalog, security, and governance through a ‘data context’ abstraction. This permits separation of administrative responsibilities between tenants and between the compute and storage tiers of a deployment, and works seamlessly with private cloud infrastructure & operating models.
BDR replication to clusters using cloud object storage. Cloudera BDR now supports replicating Hive & Impala tables stored in HDFS directly into clusters that use S3 and ADLS for table storage, enabling regular synchronization for hybrid cloud use cases.
Support for GPU scheduling in YARN. Together, Cloudera Manager and YARN enable automatic detection, isolation, and usage accounting of GPU resources shared by multiple workloads, for users who explicitly request access to these specialized resources on select nodes within a shared cluster.
Automated wire encryption (TLS) setup & key rotation is now available for existing CDH clusters that were not initially created with TLS security.
AWS/Azure credential handling for Hive in secure clusters, enabling transparent access to S3/ADLS data for multiple Hive users in a shared cluster while keeping cloud credentials secure and out of end users’ hands.
Support for configuring a TLS-secured Hive Metastore database in Cloudera Manager.
Cross-cluster network bandwidth test tool. Cloudera Manager now has an API to test network bandwidth between clusters, helping determine whether the infrastructure is suitable for separating storage and compute services.
Automatic duplicate host detection & hostname migration. Cloudera Manager now detects duplicate hosts and rejects them from joining a cluster, and gracefully tolerates changes in hostnames for managed hosts, better supporting automated deployments.
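The ‘data context’ idea from the SDX item can be pictured as a small data model. The sketch below is hypothetical (the class and field names are illustrative, not Cloudera Manager's API): several compute clusters, each owned by a different tenant and running its own services, all point at one shared data context that holds storage, catalog, security, and governance.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataContext:
    """Hypothetical model of a shared 'data context': the data, catalog,
    security, and governance services that compute clusters attach to."""
    name: str
    storage: str     # e.g. the base cluster's HDFS
    catalog: str     # e.g. the Hive Metastore
    security: str    # e.g. Sentry
    governance: str  # e.g. Navigator


@dataclass
class ComputeCluster:
    """Hypothetical compute cluster: runs its own services (workload
    isolation) but resolves data and policies through a shared context."""
    name: str
    tenant: str
    data_context: DataContext
    services: list = field(default_factory=list)


shared = DataContext("base-ctx", storage="hdfs", catalog="hive-metastore",
                     security="sentry", governance="navigator")

etl = ComputeCluster("etl-compute", tenant="data-eng", data_context=shared,
                     services=["hive", "spark"])
bi = ComputeCluster("bi-compute", tenant="analysts", data_context=shared,
                    services=["impala"])

# Both tenants resolve the same catalog and policies...
print(etl.data_context is bi.data_context)  # True
# ...while running separate compute services.
print(set(etl.services) & set(bi.services))  # set()
```

The frozen `DataContext` reflects the assumption that tenants consume, rather than modify, the shared layer.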

Search, query, access highlights:

In HUE, we have significantly improved the troubleshooting experience for Impala queries so that SQL developers can more quickly understand what is going on, where time is spent, and where to optimize.

Impala highlights:

A new section (/admission) was added to the Impala Web UI that provides visibility into Admission Control resource pools, running and queued queries, and other related metrics.
A new guardrail was added to automatically cancel queries when they produce more rows than the guardrail limit.
Users can now set a default file format query option, which is applied to CREATE TABLE statements that do not specify a STORED AS clause.
(Preview) Zero-Touch Metadata: Without this feature, if a non-Impala engine such as Hive or Spark adds a new partition to an existing table or creates a new table, an Impala user needs to run REFRESH or INVALIDATE METADATA before accessing the change through Impala. In 6.2, we have introduced an automatic mechanism that removes the need for Impala users to run these operations. New partitions added to existing tables and new tables created outside of Impala both become accessible to Impala users automatically within a configurable time period (30 seconds by default).
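The post does not say how Impala detects these outside changes. A minimal sketch, assuming the metastore keeps an ordered log of change events that Impala polls on the configurable interval; `MetastoreEvent`, `EventLog`, `CatalogCache`, and the event kinds are hypothetical names for illustration only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetastoreEvent:
    event_id: int
    kind: str                        # hypothetical: "CREATE_TABLE" or "ADD_PARTITION"
    table: str
    partition: Optional[str] = None


class EventLog:
    """Stand-in for an ordered log of metadata changes made by any engine."""

    def __init__(self):
        self.events = []

    def append(self, kind, table, partition=None):
        self.events.append(MetastoreEvent(len(self.events) + 1, kind, table, partition))

    def since(self, last_id):
        return [e for e in self.events if e.event_id > last_id]


class CatalogCache:
    """Hypothetical Impala-side cache that syncs itself from the event log,
    so users no longer run REFRESH or INVALIDATE METADATA by hand."""

    def __init__(self, log, poll_interval_s=30.0):
        self.log = log
        self.poll_interval_s = poll_interval_s  # mirrors the 30 s default
        self.last_event_id = 0
        self.tables = {}  # table name -> set of known partitions

    def poll_once(self):
        # Apply only events newer than the last one processed.
        for e in self.log.since(self.last_event_id):
            if e.kind == "CREATE_TABLE":
                self.tables.setdefault(e.table, set())
            elif e.kind == "ADD_PARTITION":
                self.tables.setdefault(e.table, set()).add(e.partition)
            self.last_event_id = e.event_id

    def run(self, stop):
        """Poll every interval until the threading.Event `stop` is set."""
        while not stop.wait(self.poll_interval_s):
            self.poll_once()


log = EventLog()
cache = CatalogCache(log)
log.append("CREATE_TABLE", "sales")                    # e.g. created from Hive
log.append("ADD_PARTITION", "sales", "dt=2019-03-29")  # e.g. added by Spark
cache.poll_once()
print(cache.tables)  # {'sales': {'dt=2019-03-29'}}
```

Tracking `last_event_id` keeps each poll incremental, so repeated polls cost little when nothing has changed.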

Hive highlights:

Compile Lock Removal: Compilation of a single large query in Hive could block the compilation of all other, smaller queries because HiveServer2 (HS2) used a single global compilation lock. In 6.2, this lock has been removed to enable parallel compilation of queries. The level of parallelism is configurable and set to 3 by default.
Improved Configurability of Connection Pool Agents (DBCP and BoneCP): Previously, changing the configuration of the connection pool agents, such as DBCP and BoneCP, used to connect from HiveServer2 to the Hive Metastore required recompiling JARs. In 6.2, this can be done by editing the hive-site.xml file.
Hive now supports Google Cloud Storage as table storage backend.

Security highlights:

HMS Metadata Read Authorization: Prior to 6.2, the HMS API had a Sentry plugin that authorized all metadata changes (writes). In 6.2, Sentry’s permissions are extended to reading metadata as well. This is turned off by default for backward compatibility. When it is enabled, users accessing the HMS API directly (such as Spark SQL users) must have at least SELECT access to an object before they can view metadata related to that object. Note that Hive and Impala DESCRIBE commands similarly filter the metadata that users see.
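The filtering rule described above can be sketched as follows. This is a simplified, hypothetical model (table-level grants only, no database-level inheritance), not Sentry's implementation; `GRANTS`, `can_read_metadata`, and `list_tables` are illustrative names. It assumes SELECT or ALL satisfies "at least SELECT", while INSERT alone does not.

```python
# Hypothetical grants: user -> set of (table, privilege) pairs.
GRANTS = {
    "alice": {("sales.orders", "SELECT"), ("sales.customers", "ALL")},
    "bob": {("sales.orders", "INSERT")},
}

# Privileges assumed to satisfy "at least SELECT"; INSERT alone does not.
READ_METADATA_PRIVILEGES = {"SELECT", "ALL"}


def can_read_metadata(user, table, read_auth_enabled):
    if not read_auth_enabled:  # off by default for backward compatibility
        return True
    return any(obj == table and priv in READ_METADATA_PRIVILEGES
               for obj, priv in GRANTS.get(user, set()))


def list_tables(user, tables, read_auth_enabled=False):
    """Return only the tables whose metadata `user` may see."""
    return [t for t in tables if can_read_metadata(user, t, read_auth_enabled)]


tables = ["sales.orders", "sales.customers", "hr.salaries"]
print(list_tables("bob", tables))
# ['sales.orders', 'sales.customers', 'hr.salaries']  (feature off)
print(list_tables("alice", tables, read_auth_enabled=True))
# ['sales.orders', 'sales.customers']
print(list_tables("bob", tables, read_auth_enabled=True))
# []
```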

Governance highlights:

Navigator enhancements:

Column ordinal: Navigator now tracks the order in which columns were added to a table
Metadata purge usability improvement: Purge can now be set to a higher priority so that it runs at the exact scheduled time. Note: the Navigator UI is unavailable while the purge runs, but no metadata or audits are lost
Bulk Update API: Up to 100x faster metadata updates from partner products and customer integrations

Operational databases highlights:

Serial replication. Previously, HBase replication was eventually consistent, which meant that updates could be delivered out of order to replication endpoints. Serial replication is a flag on replication that ensures updates are delivered to replication endpoints in order.
Support for Intel Optane DC persistent memory. Customers can use DC persistent memory for the BucketCache, enabling larger bucket caches than are possible with DRAM.
Minor replication improvements (new configuration options, improvements to the verify replication tool, bug fixes)
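The difference between eventually consistent and serial delivery can be shown with two writes to the same row that arrive out of order. A minimal sketch, assuming each update carries a sequence number; it illustrates the ordering guarantee only and does not reproduce how HBase enforces it internally. `apply_eventual` and `apply_serial` are hypothetical names.

```python
import heapq


def apply_eventual(updates):
    """Apply updates in arrival order (the pre-serial behavior)."""
    state = {}
    for _seq, key, value in updates:
        state[key] = value
    return state


def apply_serial(updates):
    """Apply updates strictly in sequence order, buffering early arrivals."""
    state, pending, next_seq = {}, [], 1
    for update in updates:
        heapq.heappush(pending, update)
        # Drain every buffered update that is next in sequence.
        while pending and pending[0][0] == next_seq:
            _, key, value = heapq.heappop(pending)
            state[key] = value
            next_seq += 1
    return state


# Two writes to the same row, delivered out of order (seq 2 before seq 1).
arrivals = [(2, "row1", "v2"), (1, "row1", "v1")]
print(apply_eventual(arrivals))  # {'row1': 'v1'}  stale value wins
print(apply_serial(arrivals))    # {'row1': 'v2'}  matches the source
```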

Kudu highlights:

Kudu can now be deployed in stretch cluster configurations spanning racks, data centers, or availability zones. Kudu masters ensure that tablets are deployed across multiple racks, data centers (DCs), or AZs to provide continuous availability in case of failure. No manual failover is required when a disaster causes a rack, DC, or AZ outage.
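Why spreading replicas across locations removes the need for manual failover: a Raft-replicated tablet keeps serving as long as a strict majority of its replicas is alive. A minimal sketch, assuming 3 replicas and at least 3 locations; the naive "first server per location" choice is a simplification, not Kudu's actual placement algorithm, and the server and location names are hypothetical.

```python
from collections import defaultdict


def place_replicas(servers, num_replicas=3):
    """Pick one tablet server per location so no location holds a majority.

    servers: dict of server name -> location (rack, data center, or AZ).
    """
    by_location = defaultdict(list)
    for server, location in sorted(servers.items()):
        by_location[location].append(server)
    if len(by_location) < num_replicas:
        raise ValueError("need at least as many locations as replicas")
    # One replica per location: losing any single location loses one replica.
    return [members[0] for _, members in sorted(by_location.items())[:num_replicas]]


def survives_outage(replicas, servers, failed_location):
    """Raft needs a strict majority of replicas alive to keep serving."""
    alive = [r for r in replicas if servers[r] != failed_location]
    return len(alive) > len(replicas) // 2


servers = {
    "ts1": "az-a", "ts2": "az-a",
    "ts3": "az-b", "ts4": "az-b",
    "ts5": "az-c",
}
replicas = place_replicas(servers)
print(replicas)  # ['ts1', 'ts3', 'ts5']
print(all(survives_outage(replicas, servers, az) for az in ("az-a", "az-b", "az-c")))  # True
```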

Platform Support highlights:

Support for deploying with Ubuntu 18

Please refer to the release notes for a complete list of features. We also encourage you to review the new Upgrade Guide, which now lets you create a customized document based on your unique upgrade path.

Cloudera Enterprise 6.2.0 includes updated versions of many of our platform components, including rebases to the following Apache project versions: Kafka 2.1.0, HBase 2.1.2, Oozie 5.1.0, and Kudu 1.9.0.

Additional information is available in the documentation.

As always, we'd love your feedback and remain committed to your success! Please provide any comments and suggestions through our community forums.

[ANNOUNCE] Cloudera Enterprise 6.2.0 Released