Member since: 10-14-2015
Posts: 93
Kudos Received: 52
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1997 | 04-20-2016 02:37 PM |
03-29-2019
04:48 PM
4 Kudos
We are pleased to announce the general availability of Cloudera Enterprise 6.2.0, the modern platform for machine learning and analytics optimized for the cloud. This release delivers a number of new capabilities, improved usability, and better performance. New capabilities include:
Management highlights:
Support for a Shared Data Experience (SDX) in Cloudera Manager. Cloudera Manager now supports creating ‘compute clusters’ serving disparate workloads for independent tenants with stronger isolation & reliability, while operating on shared data, catalog, security and governance using a ‘data context’ abstraction. This permits separation of responsibilities in the administration of each tenant, and between the compute and storage tiers of deployment, and works seamlessly with private cloud infrastructure & operating models.
BDR replication to clusters using cloud object storage. Cloudera BDR now supports replicating Hive & Impala tables stored in HDFS directly into clusters that use S3 and ADLS for table storage, enabling regular synchronization for hybrid cloud use cases.
Support for GPU scheduling in YARN. Together, Cloudera Manager and YARN enable automatic detection, isolation and usage accounting of GPU resources shared by multiple workloads, for users who explicitly request access to these specialized resources on select nodes within a shared cluster.
Automated wire encryption (TLS) setup & key rotation is now available for existing CDH clusters that were not initially created with TLS security.
AWS/Azure credential handling for Hive in secure clusters, enabling transparent access to S3/ADLS data for multiple Hive users in a shared cluster while keeping cloud credentials secure and out of end users’ hands.
Support for configuring a TLS secured Hive Metastore database in Cloudera Manager.
Cross-cluster network bandwidth test tool. Cloudera Manager now has an API to test network bandwidth between clusters, helping determine if the infrastructure is suitable for separating storage and compute services.
Automatic duplicate host detection & hostname migration. Cloudera Manager now detects and rejects duplicate hosts from joining a cluster and gracefully tolerates changes in hostnames for managed hosts, better supporting automated deployments.
Search, query, access highlights:
In HUE we have significantly improved the troubleshooting experience for Impala queries so that an SQL developer can understand faster what is going on, where time is spent, and where to optimize.
Impala highlights:
A new section (/admission) was added to the Impala Web UI that provides visibility into Admission Control resource pools, running and queued queries, and other related metrics. More details here.
A new guardrail was added to automatically cancel queries when they produce more rows than the guardrail limit.
Users can now set a default file format query option which will be applied to CREATE TABLE commands that do not specify a STORED AS clause.
(Preview) Zero-Touch Metadata: Currently, if a non-Impala engine e.g. Hive or Spark adds a new partition to an existing table or a new table altogether, an Impala user needs to run a REFRESH table or an INVALIDATE metadata operation to access them via Impala. In 6.2, we have introduced an automatic mechanism that obviates the need for these operations by Impala users. Both newly added partitions to existing tables as well as newly added tables outside of Impala are automatically accessible to Impala users within a configurable time period (default 30 sec).
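To make the Impala items above more concrete, here is a minimal sketch of the default file format option and the pre-6.2 metadata refresh workflow, assuming the impyla client; the host, database, and table names are hypothetical, and DEFAULT_FILE_FORMAT is our assumption for the name of the query option described above.

```python
# Minimal sketch (assumes the impyla client; host and table names are hypothetical).
from impala.dbapi import connect

conn = connect(host="impalad.example.com", port=21050)
cur = conn.cursor()

# New in 6.2: set a default file format that applies when CREATE TABLE
# omits the STORED AS clause (option name assumed here).
cur.execute("SET DEFAULT_FILE_FORMAT=parquet")
cur.execute("CREATE TABLE sales_2019 (id BIGINT, amount DECIMAL(10,2))")

# Pre-6.2 workflow: partitions or tables added by Hive/Spark only become visible
# to Impala after an explicit refresh. With the zero-touch metadata preview
# enabled, they show up automatically within the configured interval
# (default 30 sec), making these statements unnecessary.
cur.execute("REFRESH sales_2019")
cur.execute("INVALIDATE METADATA sales_2019")

cur.close()
conn.close()
```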
Hive highlights:
Compile Lock Removal: Compilation of a single large query in Hive could block compilation of all other smaller queries because of a universal compilation lock in HiveServer2 (HS2). In 6.2, this lock has been removed to enable parallel compilation of queries. The level of parallelism is configurable and by default set to 3.
Improved Configurability of Connection Pool Agents (DBCP and BoneCP): Configuration changes to the connection pool agents used to connect from HiveServer2 to the Hive Metastore (e.g. DBCP and BoneCP) previously required recompilation of jars. Now, in 6.2, this can be done via changes to the hive-site.xml file.
Hive now supports Google Cloud Storage as a table storage backend.
Security highlights:
HMS Metadata Read Authorization: Prior to 6.2, the HMS API had a Sentry plugin authorizing all metadata changes (writes). Now in 6.2, Sentry's permissions are extended to reading metadata as well. By default, this is turned off for backward compatibility. With this enabled, users accessing the HMS API directly (such as SparkSQL users) must have at least SELECT access to an object before they can view metadata related to that object. Note that Hive and Impala DESCRIBE commands also similarly filter the metadata that users see.
Governance highlights (Navigator enhancements):
Column ordinal - Navigator now tracks the order in which columns were added to a table.
Metadata purge usability improvement: Purge can be set at a higher priority and run at an exact time. Note: the Navigator UI will be unavailable, but there is no loss of metadata or audits.
Bulk Update API: Up to 100x faster metadata updates from partner products and customer integrations.
Operational databases highlights:
Serial replication. HBase replication prior to this was eventually consistent, which meant that updates could be delivered out of order to replication endpoints. Serial replication is a flag on replication that ensures that updates are delivered in order to replication endpoints.
Support for Intel Optane DC persistent memory. Customers can use DC persistent memory for the BucketCache, enabling creation of larger bucket caches than is possible with DRAM.
Minor replication improvements (new configuration options, improvements to the verify replication tool, bug fixes).
Kudu highlights:
Kudu can now be deployed in stretch cluster configurations spanning racks, data centers, or availability zones. Kudu masters will ensure that tablets are deployed across multiple racks/DCs/AZs to provide continuous availability in the case of failure. No manual failover is required in the case of a disaster involving a rack, DC, or AZ outage.
Platform Support highlights:
Support for deploying with Ubuntu 18.
Please refer to the release notes for a complete list of features. We also encourage you to review the new Upgrade Guide that now includes the ability to create a customized document based on your unique upgrade path. Cloudera Enterprise 6.2.0 includes updated versions of many of our platform components, including rebases to the following Apache project versions: Kafka 2.1.0, HBase 2.1.2, Oozie 5.1.0, and Kudu 1.9.0. Additional information is available in the documentation. As always, we'd love your feedback and remain committed to your success! Please provide any comments and suggestions through our community forums.
12-19-2018
09:46 AM
1 Kudo
We are pleased to announce a new minor release of our supported packaging of Apache Accumulo for use on CDH6.0.0. This release is the latest stable Apache Accumulo and it supports C6.0.0 and later. Please refer to the installation guide for known issues and a list of unsupported upstream features. Existing users should see a new parcel available for Accumulo 1.9.2. This release is supported with Cloudera Manager and CDH versions 6.0.0 or later. Cloudera recommends using the latest version of Cloudera Manager available.
Labels: Accumulo
12-18-2018
02:58 PM
2 Kudos
We are pleased to announce the general availability of Cloudera Enterprise 6.1.0, the modern platform for machine learning and analytics optimized for the cloud. This release delivers a number of new capabilities, improved usability, and better performance. New capabilities include:
Management
Cloudera Manager can test the network latency of all network links within a cluster or between clusters to identify performance bottlenecks or misconfigured networks.
BDR now supports backup from non-secure (non-Kerberized) clusters to secure (Kerberized) clusters.
Impala health monitoring diagnostics have been significantly improved.
New guardrails for Impala query memory usage are available for Impala resource pools, improving multi-tenancy for Impala.
Storage
Support for HDFS erasure coding for Hive, Navigator, BDR, MapReduce and Spark workloads can reduce storage requirements by up to 50% with a negligible performance impact.
Support for Azure Data Lake Storage Gen 2 enables better performance and lower cost for customers deploying CDH to Azure.
Search, query, access
Impala now supports exact multiple COUNT(DISTINCT <expr>) within a single query, allowing more complex data warehouse queries to be run.
Ingest
Support for Spark Structured Streaming, which enables micro-batch processing at increments as small as 100 ms with SQL-like APIs, including DataFrames and Datasets, while simplifying implementations via abstractions (see the sketch at the end of this post).
Flume now supports continuous ingest of data into Kudu (from messaging sources such as Kafka, JMS or Avro) using the Flume Kudu sink.
Kafka now supports JBOD, enabling customers to use cheaper disk and reduce the cost of storage.
Sqoop now supports loading data into S3 and permits creating tables & loading data directly in a Sentry-secured Hive database with a single step.
Security - Finer-grained permissions (also in C5.16.1)
Sentry adds CREATE permission and user-level ownership of tables. This enables secure sharing of a single sandbox database among many users and eliminates the administrative overhead of creating separate databases, roles, and groups to preserve privacy for one person or a small group.
Impala REFRESH METADATA permission allows admins to regulate who can execute this impactful Impala command.
Security - Key Management
Support for AWS CloudHSM enables customers deploying HDFS clusters on AWS to protect encryption keys in isolated purpose-built hardware security modules.
Platform Support
Support for deploying with OpenJDK 8.
MapReduce now supports the Zstandard compression codec.
Usability enhancements include:
Search, query, access
Data discovery simplifications in Hue help users get to the right data faster.
Query queuing visualizations in Hue identify when clusters are busy, preventing user frustration and multiple resends of queries to busy clusters.
Ingest
Flume agents now have push-button wire encryption using TLS with Cloudera Manager 6’s Auto-TLS feature.
Simplified access control, improved security defaults and better metrics in Kafka.
Governance - Navigator
Autocomplete for Databases, Tables/Views, and Fields simplifies searches.
The Hive table details page shows more details on each column, allowing Navigator users to see the details about all columns of a table on a single screen.
Performance enhancements include:
Improved scanning performance and compaction rate limits in Accumulo.
Governance
Navigator handles larger volumes of data with more selective HDFS event and metadata capture.
Please refer to the release notes for a complete list of features.
We also encourage you to review the new Upgrade Guide that now includes the ability to create a customized document based on your unique upgrade path. In particular, Cloudera Manager 6.1.0 supports upgrading CDH 5.15.0 and CDH 5.16.1 clusters to CDH 6.1.0. Cloudera Enterprise 6.1.0 includes updated versions of many of our platform components, including rebases to the following Apache project versions: Kafka 2.0, Spark 2.4, HBase 2.1.1, Accumulo 1.9.2, and Solr 7.4. Additional information is available in the documentation. As always, we'd love your feedback and remain committed to your success! Please provide any comments and suggestions through our community forums.
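To make the Structured Streaming item above concrete, here is a minimal PySpark sketch of a micro-batch pipeline triggered every 100 ms; the Kafka broker, topic, and HDFS paths are hypothetical, and the Kafka integration package is assumed to be on the classpath.

```python
# Minimal sketch (broker, topic, and paths are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("clickstream-ingest").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1.example.com:9092")
          .option("subscribe", "clicks")
          .load()
          .select(col("key").cast("string"), col("value").cast("string")))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/clicks")
         .option("checkpointLocation", "hdfs:///checkpoints/clicks")
         .trigger(processingTime="100 milliseconds")  # micro-batches as small as ~100 ms
         .start())

query.awaitTermination()
```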
11-21-2018
11:06 AM
We are pleased to announce the release of the CDK 3.1.1 Powered by Apache Kafka for CDH 5. Apache Kafka is a highly scalable, distributed, publish-subscribe messaging system. CDK 3.1.1 Powered by Apache Kafka is a maintenance release based on Apache Kafka 1.0.1. Notable Issues Fixed in CDK 3.1.1 Powered by Apache Kafka: KAFKA-3978 - Ensure high watermark is always positive KAFKA-6593 - Fix livelock with consumer heartbeat thread in commitSync KAFKA-6857 - Leader should reply with undefined offset if undefined leader epoch requested KAFKA-6917 - Process txn completion asynchronously to avoid deadlock KAFKA-6975 - Fix replica fetching from non-batch-aligned log start offset KAFKA-7012 - Don't process SSL channels without data to process KAFKA-7104 - More consistent leader's state in fetch response KAFKA-7278 - replaceSegments() should not call asyncDeleteSegment() for segments which have been removed from segments list All backported fixes can be viewed in the git release notes here or on our website under the Issues fixed section. We look forward to you trying CDK 3.1.1 Powered by Apache Kafka . For more information, please use the links below: Install or upgrade Kafka Review the documentation Review the Release Notes As always, we welcome your feedback. Please send your comments and suggestions through our community forums.
Labels: Kafka
10-05-2018
03:27 PM
1 Kudo
We are happy to announce CDS 2.3 Release 4 Powered By Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.9 and higher, and Cloudera Manager 5.11 and higher. What's New in CDS 2.3 release 4 This is purely a maintenance release. See CDS Powered By Apache Spark Fixed Issues for the list of fixed issues. Download CDS 2.3 Release 4 Powered By Apache Spark. Read the documentation . Want to become a pro Spark user? Sign up for Apache Spark Training .
10-02-2018
12:46 PM
We are happy to announce CDS 2.2 Release 4 Powered By Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.8 and higher, and Cloudera Manager 5.8.3, 5.9 and higher. What's New in CDS 2.2 release 4 This is purely a maintenance release. See CDS Powered By Apache Spark Fixed Issues for the list of fixed issues. Download CDS 2.2 Release 4 Powered By Apache Spark. Read the documentation . Want to become a pro Spark user? Sign up for Apache Spark Training .
Labels: Spark
09-17-2018
07:53 PM
We are happy to announce CDS 2.1 Release 3 Powered By Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.7 and higher. What's New in CDS 2.1 release 3 This is purely a maintenance release. See CDS Powered By Apache Spark Fixed Issues for the list of fixed issues. Download CDS 2.1 Release 3 Powered By Apache Spark. Read the documentation . Want to become a pro Spark user? Sign up for Apache Spark Training .
09-04-2018
04:37 PM
We are happy to announce CDS 2.2 Release 3 Powered By Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.8 and higher. What's New in CDS 2.2 release 3 This is purely a maintenance release. See CDS Powered By Apache Spark Fixed Issues for the list of fixed issues. Download CDS 2.2 Release 3 Powered By Apache Spark. Read the documentation . Want to become a pro Spark user? Sign up for Apache Spark Training .
08-30-2018
09:59 AM
6 Kudos
Cloudera is proud to announce the general availability of Cloudera Enterprise 6.0, featuring a number of enhancements that improve workload performance and build on our enterprise-grade tooling and SDX (shared data experience) capabilities.
In addition to upgrading differentiated administration and productivity tools like Cloudera Manager and Navigator, we have also updated several components of our open source core.
Cloudera Enterprise 6.0 delivers the data management foundation for your mission-critical machine learning and analytics workloads today and into the future. Please read on for more detail on component updates and benefits:
Cloudera Manager 6.0
Cloudera Manager 6 delivers a number of major new capabilities, all of which can be leveraged with both CDH6 and CDH5 environments
Fine-grained administrative access controls on individual clusters allows organizations to manage more clusters, including those supporting sensitive & confidential projects, with fewer resources, and support a wider range of users, while preventing mistakes that could cause outages.
Automated wire encryption (TLS) setup for a wide variety of CDH components and Cloudera Manager itself drastically reduces the effort to provision & configure new clusters with secure client-server and inter-node communication channels protecting applications from man-in-the-middle attacks that could lead to a data breach.
Support for managing up to 2,500 nodes with a single Cloudera Manager instance, enabling customers to manage more clusters with fewer administrators and less overhead, and supporting the deployment of very large scale data management systems.
Cloudera Navigator 6.0
We are delivering a number of enhancements that collectively improve performance. Queries can now be distinguished by cluster in a multi-cluster environment. Data stewards can now include complete descriptions of objects in Navigator rather than having a word limit.
Apache Hadoop 3.0
Hadoop 3.0 brings a number of new features to Cloudera Enterprise 6.0. Please refer to the online documentation for details on supported features.
Apache HBase 2.0
Delivers performance and stability enhancements and makes real-time operational analytics more powerful and reliable by isolating multi-tenant applications.
Apache Hive 2.1
Vectorization brings up to 80% performance improvements to analytics workloads.
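As an illustration of how a SQL developer would exercise vectorized execution, here is a minimal sketch using the PyHive client; the hostname, table, and query are hypothetical, and hive.vectorized.execution.enabled is the standard switch for this feature (typically already on by default in Hive 2.1).

```python
# Minimal sketch (assumes the PyHive client; host, table, and query are hypothetical).
from pyhive import hive

conn = hive.connect(host="hiveserver2.example.com", port=10000)
cur = conn.cursor()

# Vectorized execution processes batches of rows at a time instead of row-by-row.
cur.execute("SET hive.vectorized.execution.enabled=true")
cur.execute("SELECT store_id, SUM(amount) FROM sales GROUP BY store_id")
print(cur.fetchall())

cur.close()
conn.close()
```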
Apache Kafka 1.0
Now bundled with CDH, Kafka enables customers to deploy stream processing applications at scale via new features focused on management, stability, high-availability, and security.
Apache Solr 7.0
Enhanced integrated search capabilities with nested data types and JSON facet support provides another way for enterprises to discover and understand all of their untapped unstructured data.
HUE 4.2
Our SQL workbench is enabled by default to simplify and expedite common tasks for Cloudera Data Warehouse users.
Upgrade Prerequisites To upgrade an existing cluster to Cloudera Enterprise 6, you must have the following versions:
CDH
CDH 5.7 and above
Databases
MySQL 5.7 and above
MariaDB 5.5 and above
PostgreSQL 8.4 and above
Oracle 12c and above
JDK
Oracle JDK 1.8
Operating System
RHEL 6.8, 6.9
RHEL 7.2 and above
SLES 12 SP2 and above
Ubuntu 16 and above
The new release captures several years of hard work, innovation, and collaboration between Clouderans, our customers, and the open source community at large. We’re pleased to bring it to you today.
Download Cloudera Enterprise 6.0
View the Cloudera Enterprise 6.0 documentation
07-10-2018
05:56 PM
We are happy to announce CDS 2.3 release 3 Powered by Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.9 and higher, and Cloudera Manager 5.11 and higher. This is purely a maintenance release and it includes all fixes that are in the Apache Spark 2.3.1 upstream release. Test-only changes are omitted. For more information, see the Apache Spark 2.3.1 upstream release notes . SPARK-16451 - [REPL] Spark-shell / pyspark should finish gracefully when "SaslException: GSS initiate failed" is hit SPARK-17756 - [PYTHON][STREAMING] java.lang.ClassCastException returned when using 'cartesian' with DStream.transform SPARK-24029 - Set the "reuse address" flag on listen sockets SPARK-24216 - [SQL] Spark TypedAggregateExpression uses getSimpleName this is not safe in Scala SPARK-24369 - [SQL] Correct handling for multiple distinct aggregations that have the same argument set SPARK-24468 - [SQL] DecimalType 'adjustPrecisionScale' might fail when scale is negative SPARK-24495 - [SQL] SortMergeJoin with duplicate keys produces wrong results SPARK-24506 - [UI] Add UI filters to tabs added after binding SPARK-24542 - [SQL] Hive UDF series UDFXPathXXXX allows users to pass carefully crafted XML to access arbitrary files SPARK-24548 - [SQL] JavaPairRDD to Dataset<Row> in Spark generates ambiguous results SPARK-24552 - Task attempt numbers are resused when stages are retried SPARK-24578 - [CORE] Reading remote cache block behavior changes and causes timeout issue SPARK-24583 - [SQL] Wrong schema type in InsertIntoDataSourceCommand SPARK-24589 - [CORE] OutputCommitCoordinator might allow duplicate commits Download Cloudera Distribution of CDS 2.3 release 3 Powered By Apache Spark. Read the documentation . Want to become a pro Spark user? Sign up for Apache Spark Training .
Labels: Spark
05-15-2018
02:56 PM
5 Kudos
Cloudera is proud to announce the beta availability of Cloudera Enterprise 6. The new release includes a large number of important upgrades to our open source core components as well as improvements to our unique innovations. We believe Cloudera Enterprise 6 will make your experience more productive and efficient. Please read on to learn about the new features that make Cloudera Enterprise 6 a must-have release.
Customer Benefits
Gain better insights from structured and unstructured data with Solr 7 integrated search. The vast majority of the data being created is unstructured, and tapping that data has been cumbersome. Furthermore, fitting that data into the existing structured data paradigm has required a normalization process that is time-consuming. Solr 7 provides both a deeper level of analysis and opens up the unstructured data universe to traditional BI tools through a SQL interface. For Cloudera Enterprise 6.0 we support the new JSON Facet Module and Nested Documents, while other new query interfaces are aiming for the 6.x roadmap.
Realize machine learning and analytics performance gains thanks to Hive vectorization and the addition of custom hardware profiles for intensive workloads. YARN custom hardware profiles allow for the scheduling of jobs on specialized hardware (e.g. GPUs), where performance gains can be between 5x and 10x for use cases like deep learning. Hive vectorization brings a 20%-80% performance boost.
Increase the efficiency of cluster administration and protect access to sensitive data and infrastructure with fully automated wire encryption (TLS) and fine-grained, per-cluster access controls for users of Cloudera Manager, enabling administrators to provision secure, multi-cluster deployments of up to 2,500 nodes in minutes with minimal management overhead.
Major Components with Significant Changes in Cloudera Enterprise 6.0
SDX: Cloudera Manager 6.0, Cloudera Director 6.0, Cloudera Navigator 6.0, Cloudera Navigator Key Trustee 6.0, Apache Sentry 2.0, Apache Kafka 1.0
Analytics and Machine Learning Workloads: Apache Solr 7.0, Apache Spark 2.2
Core Platform: Apache Hadoop 3.0, Apache Hive 2.1, Apache HBase 2.0, Apache Oozie 5.0, Apache Avro 1.8, Apache Parquet 1.9
Upgrade Prerequisites
If you are going to upgrade an existing cluster to Cloudera Enterprise 6, there are some prerequisites detailed below.
CDH: CDH 5.7 and above
Databases: MySQL 5.7 and above; MariaDB 5.5 and above; PostgreSQL 8.4 and above; Oracle 12c and above
JDK: Oracle JDK 1.8
Operating Systems: RHEL 6.8 and above; RHEL 7.2 and above; SLES 12 SP2 and above; Ubuntu 16 and above
IMPORTANT NOTES: Upgrades from the Cloudera Enterprise 6 beta to the future Cloudera Enterprise 6 GA will not be possible. Cloudera Enterprise 6 betas are not covered by Cloudera Support subscriptions. Assistance for beta users is obtained via our Cloudera community portal. You can download Cloudera Enterprise 6 - Beta here.
We believe Cloudera Enterprise 6 is a major leap forward in functionality and enterprise quality and we hope you enjoy all the benefits it has to offer. Please don’t hesitate to contact us with any feedback.
04-16-2018
06:17 PM
2 Kudos
We are happy to announce CDS 2.3 release 2 Powered by Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads.
This component is generally available and is supported on CDH 5.9 and higher.
A Hive compatibility issue in CDS 2.0 release 1 Powered By Apache Spark affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the Spark 2.0 release 2 or higher parcel to avoid Spark 2 job failures when using Hive functionality.
There are no new incompatible changes in this release.
What's New in CDS 2.3 release 2 Powered By Apache Spark
Spark lineage support, which can be used with Navigator in CM 5.14 for metadata and transformation analysis and better regulatory compliance.
Vectorized PySpark UDF support, which improves PySpark performance (see the sketch after this list)
History Server scalability, with an improved UI that can show applications at start/restart much faster than before, even if there are a lot of applications
Parquet timestamp read side adjustment so that Spark can read timestamps written by Impala
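The vectorized UDF item above refers to the Pandas UDFs added in the Spark 2.3 line; here is a minimal sketch, with hypothetical column names and data, assuming PyArrow is installed on the executors.

```python
# Minimal sketch of a vectorized (Pandas) UDF (data and names are hypothetical;
# requires PyArrow on the executors).
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("pandas-udf-demo").getOrCreate()
df = spark.createDataFrame([(1, 2.0), (2, 3.5), (3, 7.25)], ["id", "price"])

@pandas_udf(DoubleType(), PandasUDFType.SCALAR)
def add_tax(price):
    # Operates on a whole pandas.Series (an Arrow batch) instead of one row at a time.
    return price * 1.08

df.withColumn("price_with_tax", add_tax(df["price"])).show()
```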
Issues Fixed in CDS 2.3 release 2 Powered by Apache Spark
For a full list of fixed issues, see the list here .
Download Cloudera Distribution of CDS 2.3 release 2 Powered By Apache Spark.
Read the documentation .
Want to become a pro Spark user? Sign up for Apache Spark Training .
Note: We uncovered a bug while releasing CDS 2.3 release 1 which caused us to replace it with CDS 2.3 release 2 with a fix.
Labels: Spark
01-17-2018
05:45 AM
We are happy to announce Apache Spark 2.2 release 2. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.8 and higher. A Hive compatibility issue in Cloudera Distribution of Apache Spark 2.0 release 1 affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the Spark 2.0 release 2 or higher parcel, to avoid Spark 2 job failures when using Hive functionality. What's New in Cloudera Distribution of Apache Spark 2.2 Release 2 This is purely a maintenance release. See Spark 2 Fixed Issues for the list of fixed issues. Issues Fixed in Cloudera Distribution of Apache Spark 2.2 release 2 For a full list of fixed issues, see the list here . Download Cloudera Distribution of Apache Spark 2.2 release 2. Read the documentation . Want to become a pro Spark user? Sign up for Apache Spark Training .
Tags: cdh5.11.1, spark2.2
Labels: Spark
07-18-2017
03:14 PM
We are happy to announce Apache Spark 2.2 release 1. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. This component is generally available and is supported on CDH 5.8 through CDH 5.12. What's New in Cloudera Distribution of Apache Spark 2.2 Release 1 Support for CDH 5.12 and associated features. Support for using Spark 2 jobs to read and write data on the Azure Data Lake Store (ADLS) cloud service. Cloudera Distribution of Apache Spark 2.2 requires JDK 8. Issues Fixed in Cloudera Distribution of Apache Spark 2.2 release 1 [SPARK-10364][SQL] Support Parquet logical type TIMESTAMP_MILLIS [SPARK-10849][SQL] Adds option to the JDBC data source write for user to specify database column type for the create table [SPARK-12868][SQL] Allow adding jars from HDFS [SPARK-14503][ML] spark.ml API for FPGrowth [SPARK-16101][HOTFIX] Fix the build with Scala 2.10 by explicit typed argument [SPARK-16122][CORE] Add rest api for job environment For a full list of fixed issues, see the list here . Download Cloudera Distribution of Apache Spark 2.2 release 1. Read the documentation . Want to become a pro Spark user? Sign up for Apache Spark Training .
Labels: Spark
07-13-2017
11:24 AM
Cloudera is pleased to announce that Cloudera Enterprise 5.12 is now generally available (GA). The release includes enhancements for running in cloud environments (with broader ADLS support and improved AWS Spot Instance support), usability and productivity improvements for both data science and analytic workloads, as well as performance gains and self-service performance management across a range of workloads. As usual, there are also a number of quality enhancements, bug fixes, and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list): Core Platform Improved AWS Spot Instance Support: Cloudera Director 2.5 makes using AWS Spot Instance much easier and more reliable. Director can now recover from Spot instances disappearing during initial cluster spin up, grow operations, and in steady state. In addition, Cloudera Manager can be made aware of Spot instances for improved job reliability. OpenStack Reference Architecture: Cloudera Enterprise now enables customers to spin up/down clusters faster and more easily by supporting clusters running on OpenStack. The new OpenStack Reference Architecture complements the existing VMWare Reference Architecture for running Cloudera Enterprise on virtual infrastructure. Backup and Disaster Recovery Enhancements: BDR now makes it easier to diagnose and fix connectivity issues in Kerberized environments, provides more robust replication of Hive metadata, and supports longer running replication jobs by automatically renewing Kerberos tickets and Hadoop delegation tokens. Data Science & Engineering Cloudera Data Science Workbench enhancements include: GPU Support: Cloudera Data Science Workbench now enables popular deep learning frameworks to run on GPUs, both on-premises and in the cloud. Embedded Web UIs: Users can work with the Apache Spark Web UI for Spark sessions. Other interactive web applications like TensorBoard, Shiny, and Plotly now appear directly in the workbench. Enhanced Job Scheduling: Cloudera Data Science Workbench users can now schedule jobs directly from external schedulers or orchestration systems via the new Jobs API. Cloudera Altus Workload Analytics: Cloudera Altus users can now access the industry’s first suite of self-service troubleshooting and performance management tools for transient data engineering workloads, enabling end users to diagnose common execution and performance issues without needing to contact an administrator. Operational DB Kudu Function/Performance Improvements (also applies to Analytic DB) : Support has been added for timestamps, both directly through Apache Kudu and indirectly through Apache Impala. New supportability tools have been developed for correcting under-replicated tablets and for system checks. Performance enhancements to Kudu include improved bulk loading and improved behavior on denser nodes. HBase Cloud Storage support and Spark Integration : Apache HBase now has ADLS support and recommendations for Azure deployment. Outside of cloud, HBase now has support for long-lived Spark applications via token renewal. Analytic DB Usage-Enriched Query Assistance: Hue now integrates with Navigator Optimizer to provide intelligent recommendations for more efficient SQL query design. SQL developers immediately receive recommendations based on popular usage and access patterns, as well as Impala and Hive best practices for optimized query performance. 
Enhanced Analytic Workbench Interface: The updated Hue 4.0 provides a modernized, intuitive experience for SQL users that enables greater productivity and a seamless workflow. Added Cloud-Native Integrations and Faster SQL Analytics across Environments: Impala now supports Microsoft ADLS for cloud-native analytics and continues to see performance and efficiency gains across all storage options (Amazon S3, Microsoft ADLS, HDFS, Kudu). The full contents of this release include: Cloudera Enterprise 5.12 (comprising CDH 5.12, Cloudera Manager 5.12, and Cloudera Navigator 2.11) Cloudera Director 2.5 Apache Kudu 1.4 Cloudera Data Science Workbench 1.1 Cloudera Distribution of Kafka (CDK) 2.2 Cloudera Navigator Optimizer Updates Over the next few weeks, we’ll publish blog posts that cover some of these features in detail. In the meantime you can access the following links for additional information: Download Cloudera Enterprise 5.12 Explore documentation As always, we value your feedback; please provide any comments and suggestions through our community forums . You can also file bugs via issues.cloudera.org
04-07-2017
02:35 PM
We are happy to announce Apache Spark 2.1 release 1. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. Cloudera Distribution of Apache Spark 2.1 release 1 is compatible with the following CDH versions: CDH 5.7, CDH 5.8, CDH 5.9, CDH 5.10 . What's New in Cloudera Distribution of Apache Spark 2.1 Release 1 New direct connector to Kafka that uses the new Kafka consumer API. See Spark 2 Kafka Integration for details. Issues Fixed in Cloudera Distribution of Apache Spark 2.1 - Release 1 [SPARK-19554][UI,YARN] Allow SHS URL to be used for tracking in YARN RM. [SPARK-16554][CORE] Automatically Kill Executors and Nodes when they are Blacklisted [SPARK-16654][CORE] Add UI coverage for Application Level Blacklisting [SPARK-8425][CORE] Application Level Blacklisting [SPARK-18117][CORE] Add test for TaskSetBlacklist [SPARK-18949][SQL][BACKPORT-2.1] Add recoverPartitions API to Catalog [SPARK-19459][SQL][BRANCH-2.1] Support for nested char/varchar fields in ORC [SPARK-19611][SQL] Introduce configurable table schema inference Download Cloudera Distribution of Apache Spark 2.1 release 1. Read the documentation . Want to become a pro Spark user? Sign up for Apache Spark Training .
03-27-2017
12:31 PM
We are pleased to announce the release of the Impala JDBC v2.5.37 and Impala ODBC v2.5.37 drivers. This release has the following fixes and enhancements:
Cloudera JDBC Driver for Impala 2.5.37
Enhancements & New Features
Specify asynchronous exec poll interval. You can now specify the time in milliseconds between each poll that the driver makes for the query execution status. Specify the number of milliseconds in the Advanced Options dialog box in the Async Exec Poll Interval field, or in the AsyncExecPollInterval configuration option.
Support for Impala 2.8. The driver now supports Impala versions 1.0.1 through 2.8.
Resolved Issues
Update/Delete statements require a semicolon at the end.
Conflicting information in documentation regarding CDH.
Cloudera ODBC Driver for Impala 2.5.37
Enhancements & New Features
Specify asynchronous exec poll interval. You can now specify the time in milliseconds between each poll that the driver makes for the query execution status. Specify the number of milliseconds in the Advanced Options dialog box in the Async Exec Poll Interval field, or in the AsyncExecPollInterval configuration option.
Support for Impala 2.8. The driver now supports Impala versions 1.0.1 through 2.8.
Specify Kerberos hostname canonicalization. By default, if you specify a Kerberos realm, the Kerberos layer canonicalizes the host FQDN in the server’s service principal name. You can disable this behavior by disabling the Canonicalize Principal FQDN option, or by setting the ServicePrincipalCanonicalization connection property to 0.
Configure SSL certificate revocation check. You can now configure the driver to check whether a TLS/SSL certificate stored in the Windows Trust Store has been revoked. By default, the driver checks for revocation. To disable the revocation check, clear the Check Certificate Revocation check box, or set the CheckCertRevocation key to 0.
Simplified MIT Kerberos configuration. When using MIT Kerberos to access the Impala service from a Kerberos realm that is different than the Kerberos realm that the user belongs to, the user is no longer required to add the Impala service's network domain to the Kerberos realm mapping in the Kerberos configuration file on the client machine.
Upgraded OpenSSL library. The driver now uses OpenSSL 1.0.2. Previously, the driver used OpenSSL 1.0.1l.
Resolved Issues
Conflicting information in documentation regarding CDH.
Incorrect driver version verification instructions for macOS in documentation.
Segmentation fault in Driver Manager detection on Linux and Solaris Sparc platforms.
Queries using DISTINCT run correctly in HUE but not via ODBC.
Driver converts COALESCE function to less efficient Impala CASE statement.
When attempting to use the Windows trust store on Windows Server 2016, an access violation exception occurs.
Getting Started with the Cloudera Drivers
Read the Cloudera JDBC 2.5.37 Driver for Impala Release Notes and Installation Guide.
Read the Cloudera ODBC 2.5.37 Driver for Impala Release Notes and Installation Guide.
Download the connector from the Cloudera Connectors page.
As always, we welcome your feedback. Please send your comments and suggestions to the user group or through our community forums. You can also file bugs through our external JIRA projects on issues.cloudera.org.
02-24-2017
07:20 PM
We are happy to announce Spark 2.0 release 2. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads. Release 2 addresses a Hive compatibility issue that affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the Spark 2.0 release 2 parcel to avoid Spark 2 job failures when using Hive. Release 2 is based on Apache Spark 2.0.2. Issues Fixed in Cloudera Distribution of Apache Spark 2.0 Release 2 [ SPARK-4563 ] [CORE] Allow driver to advertise a different network address [ SPARK-18993 ] Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags [ SPARK-19314 ] Do not allow sort before aggregation in Structured Streaming plan [ SPARK-18762 ] Web UI should be http:4040 instead of https:4040 [ SPARK-18745 ] java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB) [ SPARK-18703 ] Insertion/CTAS against Hive Tables: Staging Directories and Data Files Not Dropped Until Normal Termination of JVM [ SPARK-18091 ] Deep if expressions cause Generated SpecificUnsafeProjection code to exceed JVM code size limit Download Cloudera Distribution of Apache Spark 2.0 release 2. Read the documentation . Want to become a pro Spark user? Sign up for Apache Spark Training .
Labels: Spark
01-31-2017
09:24 AM
1 Kudo
Cloudera is proud to announce that Cloudera Enterprise 5.10 is now generally available (GA). The highlights of this release include the GA of the new columnar storage engine Apache Kudu, improved cloud performance and cost-optimizations, and cloud-native data governance for Amazon S3. As usual, there are also a number of quality enhancements and bug fixes (learn more about our multi-dimensional hardening/QA process) and other improvements across the stack. Here is a partial list of what’s included (see the Release Notes for a full list):
GA of Apache Kudu - Unleash Cloudera’s new storage engine to enable fast analytics on fast changing data with the first generally available release of Kudu. Kudu is purpose-built to enable use cases for time series data, machine data analytics, and online reporting—as part of a complete analytic or operational database.
Improved Cloud Performance - Run Cloudera workloads on public cloud infrastructure more efficiently and cost-effectively than ever before. Specific enhancements in this release include: deploying new clusters in cloud environments faster using Cloudera Director; running transient batch processing jobs up to 2x faster compared to previous releases; and reducing AWS instance costs by leveraging Amazon Spot Block instances.
Big Data Governance for the Hybrid Cloud - Cloudera Navigator now provides cataloging, metadata management, and comprehensive lineage for data in Amazon S3, making it the only big data management and governance solution for data stored on-premise and in the cloud. This release also includes policy-based business metadata assignment and validation, major performance optimizations, and a refreshed look-and-feel for increased data stewardship productivity.
Expanded Recommendations for Active Data Optimization - Cloudera Navigator Optimizer now provides expanded recommendations and risk alerts, making it even easier for architects and DBAs to understand, migrate, and manage workloads on Hadoop.
Added Efficiencies and Design Assistance for SQL Developers - Increase SQL developer productivity with the latest version of Hue, which provides improved exploration and table sampling (including over Amazon S3), better support for viewing and interacting with Parquet files, and faster loading of documents.
Continued Security & Compliance Improvements - Increase overall platform and application security and compliance by taking advantage of new cloud access key management controls, Kafka authorization via Sentry, and new data encryption techniques.
The full contents of this release include: Cloudera Enterprise 5.10 (comprising CDH 5.10, Cloudera Manager 5.10, and Cloudera Navigator 2.9), Cloudera Director 2.3, Cloudera Navigator Optimizer Updates, Kafka 2.1, and Kudu 1.2.
Over the next few weeks, we’ll publish blog posts that cover some of these features in detail. In the meantime:
Download Cloudera Enterprise 5.10
Explore documentation
As always, we value your feedback; please provide any comments and suggestions through our community forums. You can also file bugs via issues.cloudera.org.
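For readers who want to try the newly GA Kudu from Python, here is a minimal sketch using the kudu-python client; the master address, table name, and schema are hypothetical.

```python
# Minimal sketch (assumes the kudu-python client; master host, table, and schema are hypothetical).
from datetime import datetime

import kudu
from kudu.client import Partitioning

client = kudu.connect(host="kudu-master.example.com", port=7051)

# Define a simple schema with a hash-partitioned primary key.
builder = kudu.schema_builder()
builder.add_column("device_id").type(kudu.int64).nullable(False).primary_key()
builder.add_column("ts").type(kudu.unixtime_micros).nullable(False)
schema = builder.build()

partitioning = Partitioning().add_hash_partitions(column_names=["device_id"], num_buckets=3)
client.create_table("sensor_readings", schema, partitioning)

# Insert a row and flush the session.
table = client.table("sensor_readings")
session = client.new_session()
session.apply(table.new_insert({"device_id": 1, "ts": datetime.utcnow()}))
session.flush()
```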
12-12-2016
04:27 PM
1 Kudo
Apache Spark is a core component of the Cloudera Enterprise platform. It is the de facto processing engine for Hadoop and the modern analytics engine for an increasing number of workloads. Organizations leverage Apache Spark to reduce churn, implement predictive maintenance, and perform complex risk modeling and analysis. IT professionals leverage Spark to accelerate data processing, train large-scale machine learning models, and perform exploratory data science. Taneja reports that for the most critical Spark workloads, 57% of users choose to partner with Cloudera because of the quality of support and breadth of training and services. The Apache Spark ecosystem continues to grow at a fast pace, and Cloudera delivers the newest, most desired features with reliability and performance at scale. We are happy to announce support for Apache Spark version 2.0. CDH users can download the parcel and apply it directly to provisioned clusters. You can leverage Spark 2.0 without disrupting your currently running Spark workloads. Spark 2.0 capabilities include the following: Combined API - A unified API for batch and streaming jobs. Machine learning persistence - The ability to save and load ML models via MLlib persistence. Structured streaming - The first streaming API running on top of SparkSQL. Improved Performance. Download Cloudera Distribution of Apache Spark 2.0 Release 1 Read the documentation and our blog Want to become a pro Spark user? Sign up for Apache Spark Training .
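To illustrate the machine learning persistence capability listed above, here is a minimal PySpark sketch; the model path and training data are hypothetical.

```python
# Minimal sketch of MLlib model persistence (paths and data are hypothetical).
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression, LogisticRegressionModel
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("ml-persistence-demo").getOrCreate()
train = spark.createDataFrame(
    [(0.0, Vectors.dense(0.0, 1.1)), (1.0, Vectors.dense(2.0, 1.0))],
    ["label", "features"])

model = LogisticRegression(maxIter=10).fit(train)
model.save("hdfs:///models/lr_demo")                               # persist the fitted model
restored = LogisticRegressionModel.load("hdfs:///models/lr_demo")  # reload it later
print(restored.coefficients)
```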
12-08-2016
09:52 AM
We are pleased to announce the release of the Cloudera JDBC Driver for Apache Hive v2.5.18. This release includes the following fixes and enhancements: When using Kerberos authentication, you can now configure the KrbAuthType connection property to specify how the driver obtains the Kerberos Subject. For more information, see the Installation and Configuration Guide. In some cases, when the heap size is restricted, the driver retrieves fewer rows than expected but does not return any exceptions or errors. This issue has been resolved. The driver now retrieves the correct number of rows when the heap size is restricted. See Release Notes for details on all of the fixes and enhancements. Getting started with the Cloudera Drivers: Read the Cloudera JDBC v2.5.18 Driver for Hive Release Notes and Installation Guide . Download the connector from the Cloudera Connectors page. As always, we welcome your feedback. Please send your comments and suggestions to the user group or through our community forums. You can also file bugs through our external JIRA projects on issues.cloudera.org .
12-01-2016
08:52 AM
We are pleased to announce the release of Hive ODBC v2.5.21, Impala ODBC v2.5.36, and Impala JDBC v2.5.35.
Resolved issues in Impala JDBC 2.5.35
The driver does not properly detect a TOP statement and inserts a TOP 0 before it to generate the limit 0 query. This happens if there is an additional space in the query before the original TOP. This issue has been resolved.
Queries with YEAR and ISO_WEEK functions failing due to a data type mismatch. This issue has been resolved.
Please see the links below for details on all of the fixes and enhancements: Release Notes, Installation Guide.
Enhancements and new features in Impala ODBC 2.5.36
Support added for Kerberos delegation. You now have the option for the driver to forward the user credentials to the server when using Kerberos authentication.
Support for Windows Trust Store. You now have the option to use the CA certificates in the Windows Trust Store for server verification when using SSL.
Auto Reconnect. You can configure the driver to automatically attempt reconnection to the Impala server if communications are lost.
Support for query, login, and connection timeout. The driver now supports the following ODBC API calls: SQL_ATTR_CONNECTION_TIMEOUT, SQL_ATTR_LOGIN_TIMEOUT, SQL_ATTR_QUERY_TIMEOUT.
[14244] Optimization of metadata retrieval.
Resolved issues in Impala ODBC v2.5.36
If a user attempted to connect without proper permissions to access the schema set in the connection string, the driver would fail to connect. The driver will now display a warning under these circumstances, and use the default schema.
Specifying an alias for a column name in the SELECT list when SELECT DISTINCT is present and using that column name in the ORDER BY clause causes a system message to be displayed.
Driver returning incorrect order for character data types in the SQLGetTypeInfo result set under some circumstances.
Please see the links below for details on all of the fixes and enhancements: Release Notes, Installation Guide.
Enhancements and new features in Hive ODBC 2.5.21
Support for the Windows trust store. You now have the option to use the CA certificates in the Windows trust store for server verification when using SSL.
Auto Reconnect. You can now configure the driver to automatically attempt reconnection to the Hive server if communications are lost.
Resolved issues in Hive ODBC 2.5.21
Driver fails to connect to the server when using TLS 1.2.
When executing a query that contains a CASE statement, the driver replaces the greater than sign (>) with an equal sign (=).
Driver does not correctly translate queries that contain the IS NOT NULL operator.
Please see the links below for details on all of the fixes and enhancements: Release Notes, Installation Guide.
Getting started with the Cloudera Drivers: Download the connector from the Cloudera Connectors page.
As always, we welcome your feedback. Please send your comments and suggestions to the user group or through our community forums. You can also file bugs through our external JIRA projects on issues.cloudera.org.
11-29-2016
10:40 AM
3 Kudos
Cloudera is excited to announce the general availability of Cloudera Enterprise 5.9! Cloudera Enterprise 5.9 contains a long list of new features, quality enhancements, bug fixes, and other improvements across the stack. Here is a partial list of those improvements; see the Release Notes for a full list:
What's New In CDH 5.9.0
Apache Hadoop
You can use temporary credentials to log in to Amazon S3 and obtain temporary credentials from Amazon's Security Token Service (STS).
Apache HBase
A tool has been added to dump existing replication peers, configurations, and queues when using HBase replication. For more information, see Class DumpReplicationQueues.
Metrics have been added that expose the amount of replayed work occurring in the HBase replication system. For more information, see Replication Metrics in the Apache HBase Reference Guide.
Apache Hive
HIVE-14270: Added parameters to optimize write performance for Hive tables and partitions that are stored on Amazon S3. See Optimizing Hive Write Performance on Amazon S3.
Hue
HUE-2915: Integrates Hue with Amazon S3. You can now access both S3 and HDFS in the File Browser, create tables from files in S3, and save query results in S3. See how to Enable S3 Cloud Storage.
HUE-4039: Improves SQL Autocompleter. The new Autocompleter understands Hive and Impala SQL dialects and provides smart suggestions based on statement structure and cursor position. See how to manually Enable and Disable Autocompleter.
HUE-3877: Adds support for Amazon RDS. You can now deploy Hue against an Amazon RDS database instance with MySQL, PostgreSQL, and Oracle engines.
Rebase of Hue on upstream Hue 3.11.
Apache Impala (incubating)
Performance improvements:
[IMPALA-3206] Speedup for queries against DECIMAL columns in Avro tables.
[IMPALA-3674] Improved efficiency in LLVM code generation can reduce codegen time, especially for short queries.
[IMPALA-2979] Improvements to scheduling on worker nodes, enabled by the REPLICA_PREFERENCE query option. See REPLICA_PREFERENCE Query Option (CDH 5.9 or higher only) for details.
[IMPALA-1683] The REFRESH statement can be applied to a single partition, rather than the entire table (see the sketch at the end of this post). See REFRESH Statement and Refreshing a Single Partition for details.
Improvements to the Impala web user interface:
[IMPALA-2767] You can now force a session to expire by clicking a link in the web UI, on the /sessions tab.
[IMPALA-3715] The /memz tab includes more information about Impala memory usage.
[IMPALA-3716] The Details page for a query now includes a Memory tab.
[IMPALA-3499] Scalability improvements to the catalog server.
[IMPALA-3677] You can send a SIGUSR1 signal to any Impala-related daemon to write a Breakpad minidump. See Breakpad Minidumps for Impala (CDH 5.8 or higher only) for details about the Breakpad minidump feature.
[IMPALA-3687] The schema reconciliation rules for Avro tables have changed slightly for CHAR and VARCHAR columns. See Creating Avro Tables for details about column definitions in Avro tables.
[IMPALA-3575] Some network operations now have additional timeout and retry settings.
Apache Sentry
Sentry adds support for securing data on Amazon RDS. As a result, Sentry can secure URIs with an RDS schema.
SENTRY-1233 - Logging improvements for SentryConfigToolSolr.
SENTRY-1119 - Allow data engines to obtain the ActionFactory directly from the configuration, instead of having hardcoded component-specific classes.
SENTRY-1229 - Added a basic configurable cache to SentryGenericProviderBackend.
Apache Spark
You can now set up AWS credentials for Spark with the Hadoop credential provider, to avoid exposing the AWS secret key in configuration files.
Apache Sqoop
The mainframe import module extension has been added to support data sets on tape.
Cloudera Search
The Solr watchdog is now configured to use the fully qualified domain name (FQDN) of the host on which the Solr process is running (instead of 127.0.0.1). You can override this configuration by setting the SOLR_HOSTNAME environment variable to an appropriate value.
Cloudera Search adds support for index snapshots. For more information, see Backing Up and Restoring Cloudera Search.
What's New in Cloudera Manager 5.9.0
You can create virtual images of Cloudera Manager and cluster hosts. See Creating Virtual Images of Cluster Hosts.
Security
External/Cloud account configuration in Cloudera Manager. Account configuration for access to Amazon Web Services is now available through the centralized UI menu External Accounts.
Key Trustee Server rolling restart. Key Trustee Server now supports rolling restart.
Backup and Disaster Recovery
You can now replicate HDFS files and Hive data to and from an Amazon S3 instance. See HDFS Replication to Amazon S3 and Hive Replication To and From Amazon S3.
You can now download performance data about HDFS replication jobs from the Replication Schedules and Replication History pages. See Monitoring the Performance of HDFS Replications.
Hive replication now stores Hive UDFs in the Hive metastore. See Replication of Impala and Hive User Defined Functions (UDFs).
YARN jobs now include the BDR schedule ID that launched the job so you can connect logs with existing schedules, if multiple schedules exist.
Resource Management
You can create custom Cluster Utilization reports that you can export data from. See Creating a Custom Cluster Utilization Report.
When Cloudera Manager manages multiple clusters, Historical Applications by User and Historical Queries by User show applications per user and per pool.
Directory usage reports can be exported as a CSV file.
Cloudera Manager API
Added the update_user() method to the Python API client api_client.py.
New API endpoints have been added that allow users to add, list and remove Watched Directories in the HDFS service.
Logging
Kafka log4j log files now include the hostname in the format kafka-broker-${host}.log. Similarly, MirrorMaker logs now include the hostname in the format kafka-mirrormaker-${host}.log.
Cloudera Manager displays History and Rollback support for the Cloudera Manager Settings. This helps Cloudera Support provide better service when certain Cloudera Manager administrative settings are modified.
Diagnostic Bundles
You can specify information to be redacted in the diagnostic bundle in the UI using Administration > Settings > Redaction Parameters for Diagnostic Bundles.
Upgrade
Informs you when a simple restart is performed instead of a rolling restart on a service because rolling restart is not available.
Oozie
The Actions menu in the Oozie service has two new commands, Dump Database and Load Database, which make it easier to migrate an Oozie database to another database supported by Oozie.
The Install Oozie ShareLib command assigns correct permissions to the uploaded libraries. This prevents breaking Oozie workflows with a custom umask setting.
Configuration Changes
Added the zkClientTimeout parameter for ZooKeeper.
Added a new option for setting the file format used by an ApplicationMaster when generating the .jhist file.
Adds graceful decommission on YARN NodeManager roles. The NodeManager is not assigned new containers, and it waits for any currently running applications to finish before being decommissioned, unless a timeout occurs. Configure the timeout using the Node Manager Graceful Decommission Timeout configuration property in the YARN Service.
stdout and stderr log links are now shown in the UI when a failure occurs while deploying client configurations.
Added the configuration parameter, Extra Space Ratio for Indexing, to Reports Manager. Use the parameter to increase indexing speed by allocating additional memory.
The default amount of time that HBase Indexer roles attempt to connect to ZooKeeper has been increased from 30 to 60 seconds.
Cloudera Manager can identify whether or not a customer is using the embedded PostgreSQL database. If Cloudera Manager is configured to use the embedded PostgreSQL database, a yellow banner appears in the UI, recommending that you upgrade to a supported external database.
When Impala uses SSL, TLS connection to the Catalog Server is now supported.
You can enable replication for any Impala UDFs/Metadata (in Hive Replication).
When running wizards from the Cloudera Manager Admin Console that add a cluster, add a service, perform an upgrade, and other tasks, steps do not display when they are not reachable or do not apply to the current configuration.
Improved Cloudera Manager provisioning performance on AWS.
Added support for resetting the Cloudera Manager GUID/UUID by checking the UUID file.
Over the next few weeks, we will publish blog posts that cover some of these features in detail. In the meantime:
Download Cloudera Enterprise 5.9
Explore documentation
As always, we value your feedback; please provide any comments and suggestions through our community forums. You can also file bugs through issues.cloudera.org.
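As promised above, here is a minimal sketch of the partition-level REFRESH added in CDH 5.9 (IMPALA-1683); it uses the impyla client, and the host, table, and partition spec are hypothetical.

```python
# Minimal sketch (assumes the impyla client; host, table, and partition spec are hypothetical).
from impala.dbapi import connect

conn = connect(host="impalad.example.com", port=21050)
cur = conn.cursor()

# Before CDH 5.9 the whole table had to be refreshed after new files landed in HDFS;
# now a single partition can be refreshed, which is far cheaper on wide tables.
cur.execute("REFRESH web_logs PARTITION (year=2016, month=11)")

cur.close()
conn.close()
```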
11-09-2016
09:00 PM
We are pleased to announce the general availability of the Cloudera Connector for Netezza 1.4c5.

New Features in Cloudera Connector for Netezza 1.4c5
Added the escapeColName method to handle, for example, column names with lowercase characters.

For more details on new features and usage of Cloudera Connector for Netezza, see:
Release Notes Cloudera Connector for Netezza Version 1.4c5
Cloudera Connector for Netezza User Guide, version 1.4c5

As always, we welcome your feedback. Please send your comments and suggestions through our new community forums. You can also file bugs in the CDH project at issues.cloudera.org.
- Tags:
- connectors
- netezza
10-13-2016
02:12 PM
We are pleased to announce the general availability of the Cloudera Connector Powered by Teradata 1.6c5.

New Features in Cloudera Connector Powered by Teradata Version 1.6c5
Upgrades the JDBC driver to version 15.10.00.22 and the TDCH library to version 1.5.0. These libraries contain several bug fixes and improvements.
Adds the --schema argument, used to override the <td-instance> value in the connection string of the Sqoop command. For example, if the connection string in the Sqoop command is jdbc:teradata://<td-host>/DATABASE=database1, but you specify --schema database2, your data is imported from database2 and not database1. If the connection string does not contain the DATABASE parameter (for example, jdbc:teradata://<td-host>/CHARSET=UTF8), you can also use the --schema databasename argument to have Sqoop behave as if you had specified the jdbc:teradata://<td-host>/DATABASE=databasename,CHARSET=UTF8 connection string. See the sketch after this announcement.

For more details on new features and usage of Cloudera Connector Powered by Teradata, see:
Download Cloudera Connector Powered by Teradata version 1.6c5
Release Notes Cloudera Connector Powered by Teradata version 1.6c5
Cloudera Connector Powered by Teradata User Guide, version 1.6c5

As always, we welcome your feedback. Please send your comments and suggestions through our community forums. You can also file bugs in issues.cloudera.org.
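For illustration only, the sketch below drives a Teradata import through Sqoop's Java entry point and passes --schema after the -- separator to override the database named in the connection string. It assumes the Cloudera Connector Powered by Teradata is installed and registered with Sqoop on the classpath; the host, credentials, table, and target directory are placeholders.

```java
import org.apache.sqoop.Sqoop;

public class TeradataSchemaOverrideExample {
    public static void main(String[] args) {
        // The connection string names database1, but the trailing --schema argument
        // (after the "--" separator) redirects the import to database2.
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:teradata://td-host.example.com/DATABASE=database1,CHARSET=UTF8",
            "--username", "etl_user",
            "--password", "etl_password",
            "--table", "orders",
            "--target-dir", "/user/etl/orders",
            "--",
            "--schema", "database2"
        };
        int exitCode = Sqoop.runTool(sqoopArgs);
        System.out.println("Sqoop exit code: " + exitCode);
    }
}
```

The same arguments can of course be passed on the sqoop command line; the programmatic form is shown only to keep the example self-contained.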
10-11-2016
07:14 AM
5 Kudos
Cloudera is pleased to announce a new minor release of our supported packaging of Apache Accumulo for use on CDH 5. This release adds support for:
Kerberos authentication for Accumulo clients
RHEL 7 / CentOS 7 support
Running Accumulo on top of Apache HDFS Transparent Encryption
AccumuloStorageHandler for Apache Hive access to data stored in Accumulo
Apache Spark processing of data stored in Accumulo through AccumuloInputFormat and AccumuloOutputFormat (a sketch follows this post)
Users are cautioned to read the installation guide for known issues and unsupported upstream features. Existing users should see a new parcel available for Accumulo 1.7.2-cdh5.5.0. This release is supported with Cloudera Manager and CDH versions 5.5.0 or later. Cloudera recommends using the latest version of Cloudera Manager available.
Docs: http://tiny.cloudera.com/accumulo-docs
Cloudera Manager 5.8.2 Download: http://tiny.cloudera.com/cm-5.8.2
Accumulo 1.7.2 on CDH 5 (CDH 5.5.0 or later): http://archive.cloudera.com/accumulo-c5/
List of JIRAs included in addition to Apache Accumulo 1.7.2 release: http://tiny.cloudera.com/accumulo-1.7.2-cdh5.5.0-release-notes
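As a sketch of the Spark integration mentioned above, the example below scans an Accumulo table into a Spark RDD via AccumuloInputFormat. The instance name, ZooKeeper quorum, table, and credentials are placeholders, and the configuration calls should be checked against the Accumulo 1.7 MapReduce API before use.

```java
import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class AccumuloSparkScanExample {
    public static void main(String[] args) throws Exception {
        // Configure AccumuloInputFormat with placeholder connection details.
        Job job = Job.getInstance();
        AccumuloInputFormat.setZooKeeperInstance(job,
            ClientConfiguration.loadDefault()
                .withInstance("accumulo")
                .withZkHosts("zk1.example.com:2181"));
        AccumuloInputFormat.setConnectorInfo(job, "reader", new PasswordToken("reader-password"));
        AccumuloInputFormat.setInputTableName(job, "events");

        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("accumulo-scan-example"));

        // Each record is an Accumulo Key/Value pair read directly from the table.
        JavaPairRDD<Key, Value> entries = sc.newAPIHadoopRDD(
            job.getConfiguration(), AccumuloInputFormat.class, Key.class, Value.class);
        System.out.println("Entries scanned: " + entries.count());
        sc.stop();
    }
}
```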
10-04-2016
04:26 PM
We are pleased to announce the release of Cloudera Enterprise 5.7.4 (CDH 5.7.4, Cloudera Manager 5.7.4, Cloudera Navigator 2.6.4, and Key Trustee KMS 5.7.4). This release fixes key bugs and includes the following:

CDH fixes for the following issues:
HADOOP-8436 - NPE in getLocalPathForWrite(path, conf) when the required context item is not configured
HDFS-8269 - getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
HDFS-9781 - FsDatasetImpl#getBlockReports can occasionally throw NullPointerException
MAPREDUCE-6628 - Potential memory leak in CryptoOutputStream
HBASE-16284 - Unauthorized client can shutdown the cluster
HIVE-14436 - Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error"
HIVE-14743 - ArrayIndexOutOfBoundsException - HBASE-backed views query with JOINs
IMPALA-3682 - Do not retry unrecoverable socket creation errors
IMPALA-4020 - Handle external conflicting changes to HMS gracefully
SPARK-8428 - Fix integer overflows in TimSort
For a full list of upstream JIRAs fixed in CDH 5.7.4, see the issues fixed section of the Release Notes.

Cloudera Manager fixes for the following issues:
Agent orphan cleanup removes process dir from in flight process
Host inspector incorrectly warns about kernel version "2.6.32-504.16.2"
If total_space_bytes is really big, heartbeats fail
Oozie points to older sharelib even after running sharelib install command
OOMKiller script does not work for Impala Catalog
For a full list of issues fixed in Cloudera Manager 5.7.4, see the issues fixed section of the Release Notes.

Issues Fixed in Key Trustee KMS
KMS does not fail over to the Passive Key Trustee Server in some network failure scenarios: In some situations, if the Active Key Trustee Server is unreachable on the network, Key Trustee KMS does not fail over to the Passive Key Trustee Server.
For a full list of issues fixed in Key Trustee KMS 5.7.4, see the issues fixed section of the Release Notes.

We look forward to you trying it, using the information below:
Download Cloudera Enterprise
View the documentation
As always, we are happy to hear your feedback. Please send your comments and suggestions to the user group or through our community forums. You can also file bugs through our external JIRA projects on issues.cloudera.org.
09-30-2016
04:33 PM
We are pleased to announce the release of the Impala JDBC v2.5.34 driver. This release has the following fixes and enhancements:

Enhancements and new features:
Authentication method and transport protocol. You can now specify whether SASL should be used in conjunction with the 'Username and Password' authentication mechanism (AuthMech=3) by using the Use Sasl connection property. Previously this was done through the transport mode connection property, which has now been removed. See the example after this announcement.
See the Release Notes for JDBC for details on all of the fixes.

Getting Started with the Cloudera Drivers
Read the Cloudera JDBC 2.5.34 Driver for Impala Release Notes and Installation Guide.
Download the connector from the Cloudera Connectors page.

As always, we welcome your feedback. Please send your comments and suggestions to the user group or through our community forums. You can also file bugs through our external JIRA projects on issues.cloudera.org.
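As a rough sketch of the new property, the snippet below opens a connection with AuthMech=3 (username/password) and SASL enabled via a UseSasl setting in the connection URL. The host, port, credentials, table name, and the exact property spelling and driver class are assumptions drawn from typical Cloudera JDBC driver usage; verify them against the 2.5.34 Installation Guide.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcSaslExample {
    public static void main(String[] args) throws Exception {
        // Placeholder host and credentials; AuthMech=3 selects username/password
        // authentication and UseSasl=1 wraps it in SASL (replacing the removed
        // transport-mode property).
        String url = "jdbc:impala://impala-host.example.com:21050/default;"
                   + "AuthMech=3;UseSasl=1;UID=impala_user;PWD=secret";

        Class.forName("com.cloudera.impala.jdbc41.Driver");
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM sample_07")) {
            while (rs.next()) {
                System.out.println("Row count: " + rs.getLong(1));
            }
        }
    }
}
```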
Labels:
- Impala
09-20-2016
11:51 AM
Cloudera is happy to announce the availability of parcels and packages for Kudu 1.0.0. After approximately a year of beta releases, Apache Kudu has reached version 1.0.0. This version number signifies that the development team feels that Kudu is stable enough for usage in production environments. This is the first non-beta release of the Apache Kudu project. (Although because Kudu is not currently integrated into CDH, it is not yet an officially supported CDH component.) Kudu 1.0.0 delivers a number of new features, bug fixes, and optimizations. We are also releasing a refresh of the Impala Kudu parcel. To upgrade Kudu to 1.0.0, see Upgrade Parcels or Upgrade Packages. In addition, you can read more about Kudu 1.0.0 in the vision blog post we just published.

New Features in Kudu 1.0.0
Removal of multiversion concurrency control (MVCC) history is now supported. This is known as tablet history GC. This allows Kudu to reclaim disk space, where previously Kudu would keep a full history of all changes made to a given table since the beginning of time. Previously, the only way to reclaim disk space was to drop a table. Kudu will still keep historical data, and the amount of history retained is controlled by setting the configuration flag --tablet_history_max_age_sec, which defaults to 15 minutes (expressed in seconds). The timestamp represented by the current time minus tablet_history_max_age_sec is known as the ancient history mark (AHM). When a compaction or flush occurs, Kudu will remove the history of changes made prior to the ancient history mark. This only affects historical data; currently-visible data will not be removed. A specialized maintenance manager background task to remove existing "cold" historical data that is not in a row affected by the normal compaction process will be added in a future release.
Most of Kudu's command line tools have been consolidated under a new top-level kudu tool. This reduces the number of large binaries distributed with Kudu and also includes much-improved help output.
The Kudu Flume Sink now supports processing events containing Avro-encoded records, using the new AvroKuduOperationsProducer.
Administrative tools, including kudu cluster ksck, now support running against multi-master Kudu clusters. The output of the ksck tool is now colorized and much easier to read.
The C++ client API now supports writing data in AUTO_FLUSH_BACKGROUND mode. This can provide higher throughput for ingest workloads.

Performance
The performance of comparison predicates on dictionary-encoded columns has been substantially optimized. Users are encouraged to use dictionary encoding on any string or binary columns with low cardinality, especially if these columns will be filtered with predicates.
The Java client is now able to prune partitions from scanners based on the provided predicates. For example, an equality predicate on a hash-partitioned column will now only access those tablets that could possibly contain matching data. This is expected to improve performance for the Spark integration as well as applications using the Java client API (a sketch appears at the end of this post).
The performance of compaction selection in the tablet server has been substantially improved. This can increase the efficiency of the background maintenance threads and improve overall throughput of heavy write workloads.
The policy by which the tablet server retains write-ahead log (WAL) files has been improved so that it takes into account other replicas of the tablet.
This should help mitigate the spurious eviction of tablet replicas on machines that temporarily lag behind the other replicas.

Wire protocol compatibility
Kudu 1.0.0 maintains client-server wire-compatibility with previous releases. Applications using the Kudu client libraries may be upgraded either before, at the same time, or after the Kudu servers.
Kudu 1.0.0 does not maintain server-server wire compatibility with previous releases. Therefore, rolling upgrades between earlier versions of Kudu and Kudu 1.0.0 are not supported.

Incompatible Changes in Kudu 1.0.0
Command line tools
The kudu-pbc-dump tool has been removed. The same functionality is now implemented as kudu pbc dump.
The kudu-ksck tool has been removed. The same functionality is now implemented as kudu cluster ksck.
The cfile-dump tool has been removed. The same functionality is now implemented as kudu fs cfile dump.
The log-dump tool has been removed. The same functionality is now implemented as kudu wal dump and kudu local_replica dump wals.
The kudu-admin tool has been removed. The same functionality is now implemented within kudu table and kudu tablet.
The kudu-fs_dump tool has been removed. The same functionality is now implemented as kudu fs dump.
The kudu-ts-cli tool has been removed. The same functionality is now implemented within kudu master, kudu remote_replica, and kudu tserver.
The kudu-fs_list tool has been removed and some similar useful functionality has been moved under kudu local_replica.

Configuration flags
Some configuration flags are now marked as "unsafe" and "experimental". Such flags are disallowed by default. Users may access these flags by enabling the additional flags --unlock_unsafe_flags and --unlock_experimental_flags. Usage of such flags is not recommended, as the flags may be removed or modified with no deprecation period and without notice in future Kudu releases.

Client APIs (C++/Java/Python)
The TIMESTAMP column type has been renamed to UNIXTIME_MICROS in order to reduce confusion between Kudu's timestamp support and the timestamps supported by other systems such as Apache Hive and Apache Impala (incubating). Existing tables will automatically be updated to use the new name for the type. Clients upgrading to the new client libraries must move to the new name for the type. Clients using old client libraries will continue to operate using the old type name, even when connected to clusters that have been upgraded. Similarly, if clients are upgraded before servers, existing timestamp columns will be available using the new type name.
KuduSession methods in the C++ library are no longer advertised as thread-safe, in order to have one set of semantics for both the C++ and Java Kudu client libraries.
The KuduScanToken::TabletServers method in the C++ library has been removed. The same information can now be found in the KuduScanToken::tablet method.

Apache Flume Integration
The KuduEventProducer interface used to process Flume events into Kudu operations for the Kudu Flume Sink has changed, and has been renamed KuduOperationsProducer. The existing KuduEventProducers have been updated for the new interface, and have been renamed similarly.

Known Issues and Limitations of Kudu 1.0.0
Schema and Usage Limitations
Kudu is primarily designed for analytic use cases. You are likely to encounter issues if a single row contains multiple kilobytes of data.
The columns which make up the primary key must be listed first in the schema.
Key columns cannot be altered. You must drop and recreate a table to change its keys.
Key columns must not be null.
Columns with DOUBLE, FLOAT, or BOOL types are not allowed as part of a primary key definition.
Type and nullability of existing columns cannot be changed by altering the table.
A table's primary key cannot be changed.
Dropping a column does not immediately reclaim space. Compaction must run first. There is no way to run compaction manually, but dropping the table will reclaim the space immediately.

Partitioning Limitations
Tables must be manually pre-split into tablets using simple or compound primary keys. Automatic splitting is not yet possible. Range partitions may be added or dropped after a table has been created. See Schema Design for more information.
Data in existing tables cannot currently be automatically repartitioned. As a workaround, create a new table with the new partitioning and insert the contents of the old table.

Replication and Backup Limitations
Kudu does not currently include any built-in features for backup and restore. Users are encouraged to use tools such as Spark or Impala to export or import tables as necessary.

Impala Limitations
To use Kudu with Impala, you must install a special release of Impala called Impala_Kudu. Obtaining and installing a compatible Impala release is detailed in Installing and Using Apache Impala (incubating) With Apache Kudu. To use Impala_Kudu alongside an existing Impala instance, you must install using parcels.
Updates, inserts, and deletes via Impala are non-transactional. If a query fails part of the way through, its partial effects will not be rolled back.
All queries will be distributed across all Impala hosts which host a replica of the target table(s), even if a predicate on a primary key could correctly restrict the query to a single tablet. This limits the maximum concurrency of short queries made via Impala.
There is no TIMESTAMP or DECIMAL type support. (The underlying Kudu type formerly known as TIMESTAMP has been renamed to UNIXTIME_MICROS; currently, there is no Impala-compatible TIMESTAMP type.)
The maximum parallelism of a single query is limited to the number of tablets in a table. For good analytic performance, aim for 10 or more tablets per host or use large tables.
Impala is only able to push down predicates involving =, <=, >=, or BETWEEN comparisons between any column and a literal value, and < and > for integer columns only. For example, for a table with an integer key ts and a string key name, the predicate WHERE ts >= 12345 will convert into an efficient range scan, whereas WHERE name > 'lipcon' will currently fetch all data from the table and evaluate the predicate within Impala.

Security Limitations
Authentication and authorization features are not implemented.
Data encryption is not built in. Kudu has been reported to run correctly on systems using local block device encryption (e.g. dm-crypt).

Client and API Limitations
ALTER TABLE is not yet fully supported via the client APIs. More ALTER TABLE operations will become available in future releases.

Other Known Issues
The following are known bugs and issues with the current release of Kudu. They will be addressed in later releases. Note that this list is not exhaustive, and is meant to communicate only the most important known issues.
If the Kudu master is configured with the -log_fsync_all option, tablet servers and clients will experience frequent timeouts, and the cluster may become unusable.
If a tablet server has a very large number of tablets, it may take several minutes to start up.
It is recommended to limit the number of tablets per server to 100 or fewer. Consider this limitation when pre-splitting your tables. If you notice slow start-up times, you can monitor the number of tablets per server in the web UI.
Due to a known bug in Linux kernels prior to 3.8, running Kudu on ext4 mount points may cause a subsequent fsck to fail with errors such as Logical start <N> does not match logical start <M> at next level. These errors are repairable using fsck -y, but may impact server restart time. This affects RHEL/CentOS 6.8 and below. A fix is planned for RHEL/CentOS 6.9. RHEL 7.0 and higher are not affected. Ubuntu 14.04 and later are not affected. SLES 12 and later are not affected.

Issues Fixed in Kudu 1.0.0
See Issues resolved for Kudu 1.0.0 and Git changes between 0.10.0 and 1.0.0. For a complete list of new features, changes, bug fixes, and known issues, see the Kudu 1.0.0 Release Notes.

As always, your feedback is appreciated. For general Kudu questions, visit the community page. If you have any questions related to the Kudu packages provided by Cloudera, including installation or configuration using Cloudera Manager, visit the Cloudera Community Forum.
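To make the Java-client predicate pruning described above concrete, here is a minimal sketch that scans a Kudu table with an equality predicate on a hash-partitioned column, so only tablets that can contain matching rows are contacted. The master address, table, and column names are placeholders; the calls follow the org.apache.kudu.client API introduced with the 1.0.0 package rename.

```java
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduPredicate;
import org.apache.kudu.client.KuduScanner;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.RowResult;

public class KuduPredicateScanExample {
    public static void main(String[] args) throws Exception {
        // Placeholder master address and table/column names.
        KuduClient client =
            new KuduClient.KuduClientBuilder("kudu-master.example.com:7051").build();
        try {
            KuduTable table = client.openTable("metrics");

            // Equality predicate on a hash-partitioned key column: the client can
            // prune tablets that cannot contain matching rows.
            KuduPredicate hostEquals = KuduPredicate.newComparisonPredicate(
                table.getSchema().getColumn("host"),
                KuduPredicate.ComparisonOp.EQUAL,
                "host-0001");

            KuduScanner scanner = client.newScannerBuilder(table)
                .addPredicate(hostEquals)
                .build();
            while (scanner.hasMoreRows()) {
                for (RowResult row : scanner.nextRows()) {
                    System.out.println(row.rowToString());
                }
            }
        } finally {
            client.shutdown();
        }
    }
}
```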
- Tags:
- kudu
09-07-2016
04:18 PM
1 Kudo
We are pleased to announce the release of CDH 5.7.3. This release fixes key bugs and includes the following:

CDH fixes for the following issues:
FLUME-2821 - KafkaSourceUtil can log passwords at INFO level; removes logging of security-related data in older releases.
HADOOP-11361 - Fix a race condition in MetricsSourceAdapter.updateJmxCache.
HDFS-8581 - ContentSummary on / skips further counts on yielding lock.
MAPREDUCE-6675 - TestJobImpl.testUnusableNode failed.
YARN-5048 - DelegationTokenRenewer#skipTokenRenewal may throw NPE.
HBASE-14818 - user_permission does not list namespace permissions.
HIVE-11980 - Follow up on HIVE-11696; an exception is thrown from CTAS when the table-level serde is Parquet while the partition-level serde is JSON.
HIVE-13749 - Memory leak in Hive Metastore.
HUE-4477 - [security] Select All is not filtering out the non-visible roles from the selection.
IMPALA-3441, IMPALA-3659 - Check for malformed Avro data.
SENTRY-1252 - grantServerPrivilege and revokeServerPrivilege should treat "*" and "ALL" as synonyms when action is not explicitly specified.
For a full list of upstream JIRAs fixed in CDH 5.7.3, see the issues fixed section of the Release Notes.

We look forward to you trying it, using the information below:
Download CDH
View the documentation
As always, we are happy to hear your feedback. Please send your comments and suggestions to the user group or through our community forums. You can also file bugs through our external JIRA projects on issues.cloudera.org.