Member since
10-14-2015
03-29-2019
04:48 PM
We are pleased to announce the general availability of Cloudera Enterprise 6.2.0, the modern platform for machine learning and analytics optimized for the cloud. This release delivers a number of new capabilities, improved usability, and better performance. New capabilities include:

Management highlights:
- Support for a Shared Data Experience (SDX) in Cloudera Manager. Cloudera Manager now supports creating ‘compute clusters’ that serve disparate workloads for independent tenants with stronger isolation & reliability, while operating on shared data, catalog, security, and governance through a ‘data context’ abstraction. This permits separation of administrative responsibilities for each tenant, and between the compute and storage tiers of a deployment, and works seamlessly with private cloud infrastructure & operating models.
- BDR replication to clusters using cloud object storage. Cloudera BDR now supports replicating Hive & Impala tables stored in HDFS directly into clusters that use S3 and ADLS for table storage, enabling regular synchronization for hybrid cloud use cases.
- Support for GPU scheduling in YARN. Together, Cloudera Manager and YARN enable automatic detection, isolation, and usage accounting of GPU resources shared by multiple workloads, for users who explicitly request access to these specialized resources on select nodes within a shared cluster.
- Automated wire encryption (TLS) setup & key rotation is now available for existing CDH clusters that were not initially created with TLS security.
- AWS/Azure credential handling for Hive in secure clusters, enabling transparent access to S3/ADLS data for multiple Hive users in a shared cluster while keeping cloud credentials secure and out of end users’ hands.
- Support for configuring a TLS-secured Hive Metastore database in Cloudera Manager.
- Cross-cluster network bandwidth test tool.
  Cloudera Manager now has an API to test network bandwidth between clusters, helping determine whether the infrastructure is suitable for separating storage and compute services.
- Automatic duplicate host detection & hostname migration. Cloudera Manager now detects duplicate hosts and rejects them from joining a cluster, and gracefully tolerates hostname changes on managed hosts, better supporting automated deployments.

Search, query, access highlights:
- In HUE, we have significantly improved the troubleshooting experience for Impala queries, so that a SQL developer can understand more quickly what is going on, where time is spent, and where to optimize.

Impala highlights:
- A new section (/admission) was added to the Impala Web UI that provides visibility into Admission Control resource pools, running and queued queries, and other related metrics. More details here.
- A new guardrail was added to automatically cancel queries when they produce more rows than the guardrail limit.
- Users can now set a default file format query option, which is applied to CREATE TABLE commands that do not specify a STORED AS clause.
- (Preview) Zero-Touch Metadata: Without this feature, if a non-Impala engine such as Hive or Spark adds a new partition to an existing table, or adds a new table altogether, an Impala user needs to run REFRESH or INVALIDATE METADATA before accessing it through Impala. In 6.2, we have introduced an automatic mechanism that removes the need for these operations. Both partitions newly added to existing tables and tables newly created outside of Impala become accessible to Impala users automatically within a configurable time period (default 30 seconds).

Hive highlights:
- Compile lock removal: Compilation of a single large query in Hive could block compilation of all other, smaller queries because of a universal compilation lock in HiveServer2 (HS2). In 6.2, this lock has been removed to enable parallel compilation of queries.
  The level of parallelism is configurable and is set to 3 by default.
- Improved configurability of connection pool agents (DBCP and BoneCP): Changing the configuration of connection pool agents such as DBCP and BoneCP, used for connecting from HiveServer2 to the Hive Metastore, previously required recompiling jars. In 6.2, this can be done by editing the hive-site.xml file.
- Hive now supports Google Cloud Storage as a table storage backend.

Security highlights:
- HMS metadata read authorization: Prior to 6.2, the HMS API had a Sentry plugin that authorized all metadata changes (writes). In 6.2, Sentry’s permissions are extended to reading metadata as well. This is turned off by default for backward compatibility. When it is enabled, users accessing the HMS API directly (such as SparkSQL users) must have at least SELECT access to an object before they can view metadata related to that object. Note that the Hive and Impala DESCRIBE commands similarly filter the metadata that users see.

Governance highlights:
Navigator enhancements:
- Column ordinal: Navigator now tracks the order in which columns were added to a table.
- Metadata purge usability improvement: Purge can be given a higher priority and run at an exact time. Note: the Navigator UI will be unavailable during the purge, but no metadata or audits are lost.
- Bulk Update API: Up to 100x faster metadata updates from partner products and customer integrations.

Operational databases highlights:
- Serial replication. Previously, HBase replication was eventually consistent, which meant that updates could be delivered out of order to replication endpoints. Serial replication is a replication flag that ensures updates are delivered to replication endpoints in order.
- Support for Intel Optane DC persistent memory. Customers can use DC persistent memory for the BucketCache, enabling larger bucket caches than are possible with DRAM.
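The ordering guarantee behind serial replication can be illustrated with a small model. This is a sketch under stated assumptions, not HBase's implementation: `Edit`, `SerialApplier`, and `apply_unordered` are hypothetical names, and a single sequence counter stands in for the per-region ordering HBase maintains. Edits that arrive early are buffered until every earlier edit has been applied, so the replica converges on the value the source wrote last.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(order=True)
class Edit:
    seq_id: int                            # position of the edit in the source log
    row: str = field(compare=False)
    value: str = field(compare=False)


class SerialApplier:
    """Hypothetical replica that applies edits strictly in seq_id order."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: List[Edit] = []      # min-heap of edits that arrived early
        self.table: Dict[str, str] = {}
        self.applied: List[int] = []

    def receive(self, edit: Edit) -> None:
        heapq.heappush(self.pending, edit)
        # Drain every buffered edit whose predecessors have all been applied.
        while self.pending and self.pending[0].seq_id == self.next_seq:
            e = heapq.heappop(self.pending)
            self.table[e.row] = e.value
            self.applied.append(e.seq_id)
            self.next_seq += 1


def apply_unordered(edits: List[Edit]) -> Dict[str, str]:
    """Eventually consistent model: apply edits in whatever order they arrive."""
    table: Dict[str, str] = {}
    for e in edits:
        table[e.row] = e.value
    return table


if __name__ == "__main__":
    edits = [Edit(0, "r1", "a"), Edit(1, "r1", "b"), Edit(2, "r1", "c")]
    arrival = [edits[2], edits[0], edits[1]]   # network reorders delivery
    print(apply_unordered(arrival)["r1"])       # b (stale value wins)
    replica = SerialApplier()
    for e in arrival:
        replica.receive(e)
    print(replica.table["r1"])                  # c (matches the source)
```

The trade-off the model makes visible: in-order delivery may hold back edits that have already arrived, in exchange for never exposing an older value after a newer one.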
HBase also gains minor replication improvements (new configuration options, improvements to the verify replication tool, and bug fixes).

Kudu highlights:
- Kudu can now be deployed in stretch cluster configurations spanning racks, data centers, or availability zones (AZs). Kudu masters ensure that tablet replicas are placed across multiple racks, data centers, or AZs to provide continuous availability in case of failure. No manual failover is required when an entire rack, data center, or AZ suffers an outage.

Platform Support highlights:
- Support for deploying with Ubuntu 18.

Please refer to the release notes for a complete list of features. We also encourage you to review the new Upgrade Guide, which now lets you create a customized document based on your unique upgrade path.

Cloudera Enterprise 6.2.0 includes updated versions of many of our platform components, including rebases to the following Apache project versions: Kafka 2.1.0, HBase 2.1.2, Oozie 5.1.0, and Kudu 1.9.0.

Additional information is available in the documentation. As always, we'd love your feedback and remain committed to your success! Please provide any comments and suggestions through our community forums.
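Why spreading replicas across locations removes the need for manual failover follows from Raft's majority rule: a Kudu tablet stays available as long as a majority of its replicas are reachable. The sketch below is a simplified illustration, not Kudu's actual placement algorithm; `place_replicas` and `survives_outage` are hypothetical helpers, and it assumes 3-way replication where a rack, data center, or AZ is each treated as one "location".

```python
from typing import Dict, List


def place_replicas(tablets: List[str], locations: List[str],
                   replication: int = 3) -> Dict[str, List[str]]:
    """Put each tablet's replicas on distinct locations, rotating the start point."""
    if len(locations) < replication:
        raise ValueError("need at least as many locations as replicas")
    placement: Dict[str, List[str]] = {}
    for i, tablet in enumerate(tablets):
        # Consecutive locations starting at offset i are all distinct because
        # replication <= len(locations); rotating the offset spreads replicas evenly.
        placement[tablet] = [locations[(i + k) % len(locations)] for k in range(replication)]
    return placement


def survives_outage(placement: Dict[str, List[str]], failed: str) -> bool:
    """True if every tablet keeps a Raft majority after losing one location."""
    for replicas in placement.values():
        majority = len(replicas) // 2 + 1
        alive = sum(1 for loc in replicas if loc != failed)
        if alive < majority:
            return False
    return True


if __name__ == "__main__":
    racks = ["rack-a", "rack-b", "rack-c"]
    placement = place_replicas(["t1", "t2", "t3", "t4"], racks)
    print(all(survives_outage(placement, r) for r in racks))  # True
    skewed = {"t1": ["rack-a", "rack-a", "rack-b"]}           # two replicas share a rack
    print(survives_outage(skewed, "rack-a"))                  # False
```

The skewed case shows the failure mode location-aware placement avoids: with two of three replicas in one rack, losing that rack leaves the tablet without a majority.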
12-19-2018
09:46 AM
We are pleased to announce a new minor release of our supported packaging of Apache Accumulo for use on CDH 6.0.0. This release is based on the latest stable Apache Accumulo and is supported with Cloudera Manager and CDH versions 6.0.0 or later. Please refer to the installation guide for known issues and a list of unsupported upstream features. Existing users should see a new parcel available for Accumulo 1.9.2. Cloudera recommends using the latest available version of Cloudera Manager.
Labels: Accumulo
12-18-2018
02:58 PM
We are pleased to announce the general availability of Cloudera Enterprise 6.1.0, the modern platform for machine learning and analytics optimized for the cloud. This release delivers a number of new capabilities, improved usability, and better performance.

New capabilities include:

Management
- Cloudera Manager can test the network latency of all network links within a cluster or between clusters to identify performance bottlenecks or misconfigured networks.
- BDR now supports backup from non-secure (non-Kerberized) clusters to secure (Kerberized) clusters.
- Impala health monitoring diagnostics have been significantly improved.
- New guardrails for Impala query memory usage are available for Impala resource pools, improving multi-tenancy for Impala.

Storage
- Support for HDFS erasure coding for Hive, Navigator, BDR, MapReduce, and Spark workloads can reduce storage requirements by up to 50% with a negligible performance impact.
- Support for Azure Data Lake Storage Gen 2 enables better performance and lower cost for customers deploying CDH to Azure.

Search, query, access
- Impala now supports multiple exact COUNT(DISTINCT <expr>) expressions within a single query, allowing more complex data warehouse queries to be run.

Ingest
- Support for Spark Structured Streaming, which enables micro-batch processing in increments as small as 100 ms with SQL-like APIs, including DataFrames and Datasets, while simplifying implementations through abstractions.
- Flume now supports continuous ingest of data into Kudu (from messaging sources such as Kafka, JMS, or Avro) using the Flume Kudu sink.
- Kafka now supports JBOD, enabling customers to use cheaper disks and reduce the cost of storage.
- Sqoop now supports loading data into S3 and permits creating tables & loading data directly into a Sentry-secured Hive database in a single step.

Security - Finer-grained permissions (also in CDH 5.16.1)
- Sentry adds CREATE permission and user-level ownership of tables.
  This enables secure sharing of a single sandbox database among many users and eliminates the administrative overhead of creating separate databases, roles, and groups to preserve privacy for one person or a small group.
- The Impala REFRESH METADATA permission allows admins to regulate who can execute this impactful Impala command.

Security - Key Management
- Support for AWS CloudHSM enables customers deploying HDFS clusters on AWS to protect encryption keys in isolated, purpose-built hardware security modules.

Platform Support
- Support for deploying with OpenJDK 8.
- MapReduce now supports the Zstandard compression codec.

Usability enhancements include:

Search, query, access
- Data discovery simplifications in Hue help users get to the right data faster.
- Query queuing visualizations in Hue identify when clusters are busy, preventing user frustration and repeated resubmission of queries to busy clusters.

Ingest
- Flume agents now have push-button wire encryption using TLS with Cloudera Manager 6’s Auto-TLS feature.
- Simplified access control, improved security defaults, and better metrics in Kafka.

Governance - Navigator
- Autocomplete for databases, tables/views, and fields simplifies searches.
- The Hive table details page shows more details on each column, so Navigator users can see the details of all columns of a table on a single screen.

Performance enhancements include:
- Improved scan performance and compaction rate limits in Accumulo.
- Governance: Navigator handles larger volumes of data through more selective capture of HDFS events and metadata.

Please refer to the release notes for a complete list of features. We also encourage you to review the new Upgrade Guide, which now lets you create a customized document based on your unique upgrade path. In particular, Cloudera Manager 6.1.0 supports upgrading CDH 5.15.0 and CDH 5.16.1 clusters to CDH 6.1.0.
Cloudera Enterprise 6.1.0 includes updated versions of many of our platform components, including rebases to the following Apache project versions: Kafka 2.0, Spark 2.4, HBase 2.1.1, Accumulo 1.9.2, and Solr 7.4.

Additional information is available in the documentation. As always, we'd love your feedback and remain committed to your success! Please provide any comments and suggestions through our community forums.
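The "up to 50%" storage figure for HDFS erasure coding in the 6.1.0 announcement follows from simple arithmetic, assuming the usual comparison between 3x replication and a Reed-Solomon layout with 6 data and 3 parity units (as in HDFS's RS-6-3-1024k policy). The helper names below are hypothetical, and the calculation covers full stripes only, so small files save less and can even cost more.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Raw bytes stored per logical byte under n-way replication."""
    return Fraction(replicas)


def erasure_coding_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per logical byte for a full Reed-Solomon stripe."""
    return Fraction(data_units + parity_units, data_units)


def savings(replicas: int, data_units: int, parity_units: int) -> Fraction:
    """Fraction of raw storage saved by moving from replication to erasure coding."""
    return 1 - erasure_coding_overhead(data_units, parity_units) / replication_overhead(replicas)


if __name__ == "__main__":
    print(replication_overhead(3))        # 3
    print(erasure_coding_overhead(6, 3))  # 3/2
    print(savings(3, 6, 3))               # 1/2
```

The same RS layout also tolerates the loss of any three of the nine units in a stripe, compared with two of three copies under 3x replication.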
11-21-2018
11:06 AM
We are pleased to announce the release of CDK 3.1.1 Powered by Apache Kafka for CDH 5. Apache Kafka is a highly scalable, distributed, publish-subscribe messaging system. CDK 3.1.1 Powered by Apache Kafka is a maintenance release based on Apache Kafka 1.0.1.

Notable issues fixed in CDK 3.1.1 Powered by Apache Kafka:
- KAFKA-3978 - Ensure high watermark is always positive
- KAFKA-6593 - Fix livelock with consumer heartbeat thread in commitSync
- KAFKA-6857 - Leader should reply with undefined offset if undefined leader epoch requested
- KAFKA-6917 - Process txn completion asynchronously to avoid deadlock
- KAFKA-6975 - Fix replica fetching from non-batch-aligned log start offset
- KAFKA-7012 - Don't process SSL channels without data to process
- KAFKA-7104 - More consistent leader's state in fetch response
- KAFKA-7278 - replaceSegments() should not call asyncDeleteSegment() for segments which have been removed from segments list

All backported fixes can be viewed in the git release notes here or on our website under the Issues Fixed section.

We look forward to your trying CDK 3.1.1 Powered by Apache Kafka. For more information, please use the links below:
- Install or upgrade Kafka
- Review the documentation
- Review the Release Notes

As always, we welcome your feedback. Please send your comments and suggestions through our community forums.
Labels: Kafka
10-05-2018
03:27 PM
We are happy to announce CDS 2.3 Release 4 Powered By Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.9 and higher and Cloudera Manager 5.11 and higher.

What's New in CDS 2.3 Release 4
This is purely a maintenance release. See CDS Powered By Apache Spark Fixed Issues for the list of fixed issues.

Download CDS 2.3 Release 4 Powered By Apache Spark.
Read the documentation.
Want to become a pro Spark user? Sign up for Apache Spark Training.
10-02-2018
12:46 PM
We are happy to announce CDS 2.2 Release 4 Powered By Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.8 and higher, and Cloudera Manager 5.8.3, 5.9, and higher.

What's New in CDS 2.2 Release 4
This is purely a maintenance release. See CDS Powered By Apache Spark Fixed Issues for the list of fixed issues.

Download CDS 2.2 Release 4 Powered By Apache Spark.
Read the documentation.
Want to become a pro Spark user? Sign up for Apache Spark Training.
Labels: Spark
09-17-2018
07:53 PM
We are happy to announce CDS 2.1 Release 3 Powered By Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.7 and higher.

What's New in CDS 2.1 Release 3
This is purely a maintenance release. See CDS Powered By Apache Spark Fixed Issues for the list of fixed issues.

Download CDS 2.1 Release 3 Powered By Apache Spark.
Read the documentation.
Want to become a pro Spark user? Sign up for Apache Spark Training.
09-04-2018
04:37 PM
We are happy to announce CDS 2.2 Release 3 Powered By Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.8 and higher.

What's New in CDS 2.2 Release 3
This is purely a maintenance release. See CDS Powered By Apache Spark Fixed Issues for the list of fixed issues.

Download CDS 2.2 Release 3 Powered By Apache Spark.
Read the documentation.
Want to become a pro Spark user? Sign up for Apache Spark Training.
08-30-2018
09:59 AM
Cloudera is proud to announce the general availability of Cloudera Enterprise 6.0, featuring a number of enhancements that improve workload performance and build on our enterprise-grade tooling and SDX (shared data experience) capabilities.
In addition to upgrading differentiated administration and productivity tools like Cloudera Manager and Navigator, we have also updated several components of our open source core.
Cloudera Enterprise 6.0 delivers the data management foundation for your mission-critical machine learning and analytics workloads today and into the future. Please read on for more detail on component updates and benefits:
Cloudera Manager 6.0
Cloudera Manager 6 delivers a number of major new capabilities, all of which can be used with both CDH 6 and CDH 5 environments:
Fine-grained administrative access controls on individual clusters allow organizations to manage more clusters, including those supporting sensitive & confidential projects, with fewer resources, and to support a wider range of users while preventing mistakes that could cause outages.
Automated wire encryption (TLS) setup for a wide variety of CDH components and Cloudera Manager itself drastically reduces the effort to provision & configure new clusters with secure client-server and inter-node communication channels protecting applications from man-in-the-middle attacks that could lead to a data breach.
Support for managing up to 2,500 nodes with a single Cloudera Manager instance, enabling customers to manage more clusters with fewer administrators and less overhead, and supporting the deployment of very large scale data management systems.
Cloudera Navigator 6.0
We are delivering a number of enhancements that collectively improve performance. Queries can now be distinguished by cluster in a multi-cluster environment. Data stewards can now include complete descriptions of objects in Navigator instead of being held to a word limit.
Apache Hadoop 3.0
Hadoop 3.0 brings a number of new features to Cloudera Enterprise 6.0. Please refer to the online documentation for details on supported features.
Apache HBase 2.0
Delivers performance and stability enhancements and makes real-time operational analytics more powerful and reliable by isolating multi-tenant applications.
Apache Hive 2.1
Vectorization brings up to 80% performance improvements to analytics workloads.
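Vectorized execution processes a batch of column values per operator step instead of pushing one row at a time through the operator pipeline. The sketch below contrasts the two styles on a filter-and-sum query. It is a conceptual illustration in Python, not Hive's vectorized operators, and it does not demonstrate the quoted speedup; the batch size of 1024 and the function names are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List

BATCH_SIZE = 1024  # assumed batch size for this sketch


def row_at_a_time(rows: Iterable[Dict[str, int]], threshold: int) -> int:
    """Row model: each row passes through the filter and aggregate individually."""
    total = 0
    for row in rows:
        if row["qty"] > threshold:   # filter step, one row
            total += row["price"]    # aggregate step, one row
    return total


def vectorized(qty: List[int], price: List[int], threshold: int) -> int:
    """Columnar model: each step works on a whole batch of column values."""
    total = 0
    for start in range(0, len(qty), BATCH_SIZE):
        q = qty[start:start + BATCH_SIZE]
        p = price[start:start + BATCH_SIZE]
        # Filter step produces a selection vector of qualifying positions.
        selected = [i for i, v in enumerate(q) if v > threshold]
        # Aggregate step reads only the selected positions of the price column.
        total += sum(p[i] for i in selected)
    return total


if __name__ == "__main__":
    print(vectorized([1, 5, 10], [10, 20, 30], 4))  # 50
```

Both functions return the same answer; the difference is that the columnar version runs each step as a tight loop over contiguous values, which is what lets a real engine amortize per-row overhead.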
Apache Kafka 1.0
Now bundled with CDH, Kafka enables customers to deploy stream processing applications at scale via new features focused on management, stability, high-availability, and security.
Apache Solr 7.0
Enhanced integrated search capabilities with nested data types and JSON facet support provide another way for enterprises to discover and understand all of their untapped unstructured data.
HUE 4.2
Our SQL workbench is enabled by default to simplify and expedite common tasks for Cloudera Data Warehouse users.
Upgrade Prerequisites
To upgrade an existing cluster to Cloudera Enterprise 6, you must have the following versions:
CDH
CDH 5.7 and above
Databases
MySQL 5.7 and above
MariaDB 5.5 and above
PostgreSQL 8.4 and above
Oracle 12c and above
JDK
Oracle JDK 1.8
Operating System
RHEL 6.8, 6.9
RHEL 7.2 and above
SLES 12 SP2 and above
Ubuntu 16 and above
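The numeric "and above" rules in the prerequisites list can be checked mechanically by comparing version tuples. This is a minimal sketch: the component keys and helper names are hypothetical, and the rules that are fixed lists or carry suffixes (RHEL 6.8/6.9, SLES 12 SP2, Oracle 12c, Oracle JDK 1.8) are deliberately left out.

```python
from typing import Dict, Tuple

Version = Tuple[int, ...]

# Minimums from the "and above" rules in the list above; keys are hypothetical.
MINIMUMS: Dict[str, Version] = {
    "cdh": (5, 7),
    "mysql": (5, 7),
    "mariadb": (5, 5),
    "postgresql": (8, 4),
    "ubuntu": (16,),
}


def parse_version(text: str) -> Version:
    """Turn '5.16.1' into (5, 16, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(component: str, version: str) -> bool:
    """Compare against the minimum; tuples compare element by element."""
    return parse_version(version) >= MINIMUMS[component]


if __name__ == "__main__":
    print(meets_minimum("cdh", "5.16.1"))   # True
    print(meets_minimum("mysql", "5.6.40"))  # False
```

Tuples are used rather than strings because a plain string comparison would rank "5.16" below "5.7".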
The new release captures several years of hard work, innovation, and collaboration between Clouderans, our customers, and the open source community at large. We’re pleased to bring it to you today.
Download Cloudera Enterprise 6.0
View the Cloudera Enterprise 6.0 documentation
07-10-2018
05:56 PM
We are happy to announce CDS 2.3 Release 3 Powered by Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.9 and higher and Cloudera Manager 5.11 and higher.

This is purely a maintenance release, and it includes all fixes that are in the Apache Spark 2.3.1 upstream release. Test-only changes are omitted. For more information, see the Apache Spark 2.3.1 upstream release notes. The fixes include:
- SPARK-16451 - [REPL] Spark-shell / pyspark should finish gracefully when "SaslException: GSS initiate failed" is hit
- SPARK-17756 - [PYTHON][STREAMING] java.lang.ClassCastException returned when using 'cartesian' with DStream.transform
- SPARK-24029 - Set the "reuse address" flag on listen sockets
- SPARK-24216 - [SQL] Spark TypedAggregateExpression uses getSimpleName this is not safe in Scala
- SPARK-24369 - [SQL] Correct handling for multiple distinct aggregations that have the same argument set
- SPARK-24468 - [SQL] DecimalType 'adjustPrecisionScale' might fail when scale is negative
- SPARK-24495 - [SQL] SortMergeJoin with duplicate keys produces wrong results
- SPARK-24506 - [UI] Add UI filters to tabs added after binding
- SPARK-24542 - [SQL] Hive UDF series UDFXPathXXXX allows users to pass carefully crafted XML to access arbitrary files
- SPARK-24548 - [SQL] JavaPairRDD to Dataset<Row> in Spark generates ambiguous results
- SPARK-24552 - Task attempt numbers are reused when stages are retried
- SPARK-24578 - [CORE] Reading remote cache block behavior changes and causes timeout issue
- SPARK-24583 - [SQL] Wrong schema type in InsertIntoDataSourceCommand
- SPARK-24589 - [CORE] OutputCommitCoordinator might allow duplicate commits

Download CDS 2.3 Release 3 Powered By Apache Spark.
Read the documentation.
Want to become a pro Spark user? Sign up for Apache Spark Training.
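The SPARK-24029 fix concerns the "reuse address" flag on listening sockets. At the socket level this is the SO_REUSEADDR option, which lets a server rebind its port while earlier connections on that port are still in TIME_WAIT. The Python sketch below only shows the flag on a generic listener; it is not Spark's code (Spark runs on the JVM, where `ServerSocket.setReuseAddress` is the comparable call), and `open_listener` is a hypothetical helper.

```python
import socket


def open_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create a listening TCP socket with SO_REUSEADDR set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind() to affect this bind.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))  # port 0 asks the OS for any free port
    sock.listen()
    return sock


if __name__ == "__main__":
    listener = open_listener()
    print(listener.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0)  # True
    listener.close()
```

Setting the flag on the listener is the conventional fix for "address already in use" errors when a service restarts quickly on a fixed port.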
Labels: Spark