Member since 10-14-2015
93 posts, 52 kudos received, 5 solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3267 | 04-20-2016 02:37 PM |
05-15-2018
02:56 PM
Cloudera is proud to announce the beta availability of Cloudera Enterprise 6. The new release includes a large number of important upgrades to our open source core components, as well as improvements to our unique innovations. We believe Cloudera Enterprise 6 will make your experience more productive and efficient. Read on to learn about the new features that make Cloudera Enterprise 6 a must-have release.
Customer Benefits
Gain better insights from structured and unstructured data with Solr 7 integrated search. The vast majority of the data being created is unstructured, and tapping into that data has been cumbersome. Fitting that data into the existing structured-data paradigm has also required a time-consuming normalization process. Solr 7 both provides a deeper level of analysis and opens up the unstructured data universe to traditional BI tools through a SQL interface. For Cloudera Enterprise 6.0 we support the new JSON Facet Module and Nested Documents; other new query interfaces are planned for later 6.x releases.
Realize machine learning and analytics performance gains thanks to Hive vectorization and custom hardware profiles for intensive workloads. YARN custom hardware profiles allow jobs to be scheduled on specialized hardware (e.g., GPUs), where performance gains can range from 5x to 10x for use cases like deep learning. Hive vectorization brings a 20% to 80% performance boost.
Increase the efficiency of cluster administration and protect access to sensitive data and infrastructure with fully automated wire encryption (TLS) and fine-grained, per-cluster access controls for Cloudera Manager users. Administrators can provision secure, multi-cluster deployments of up to 2,500 nodes in minutes with minimal management overhead.
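Hive vectorization gets its speedup by having each operator process a batch of rows per call rather than one row at a time. This reduces per-row interpretation overhead. The Python sketch below only illustrates that idea and is not Hive's implementation. The batch size of 1024 and all function names are assumptions made for the example. Both functions apply the same filter, but the vectorized one invokes the operator once per batch and uses a "selection vector" of surviving row positions.

```python
from typing import List, Sequence, Tuple

BATCH_SIZE = 1024  # illustrative batch size (an assumption for this sketch)


def filter_row_at_a_time(column: Sequence[int], threshold: int) -> Tuple[List[int], int]:
    """Evaluate `value > threshold` one row at a time; return (rows kept, operator calls)."""
    kept: List[int] = []
    calls = 0
    for value in column:
        calls += 1  # one operator invocation per row
        if value > threshold:
            kept.append(value)
    return kept, calls


def filter_vectorized(
    column: Sequence[int], threshold: int, batch_size: int = BATCH_SIZE
) -> Tuple[List[int], int]:
    """Evaluate the same predicate one batch at a time using a selection vector."""
    kept: List[int] = []
    calls = 0
    for start in range(0, len(column), batch_size):
        batch = column[start:start + batch_size]
        calls += 1  # one operator invocation per batch
        # Selection vector: positions within the batch that pass the filter.
        selected = [i for i, value in enumerate(batch) if value > threshold]
        kept.extend(batch[i] for i in selected)
    return kept, calls


if __name__ == "__main__":
    data = list(range(5000))
    rows, row_calls = filter_row_at_a_time(data, 10)
    vec, vec_calls = filter_vectorized(data, 10)
    assert rows == vec
    print(row_calls, vec_calls)  # 5000 5
```

Both paths return the same rows. The difference is that the operator overhead is paid 5 times instead of 5000 times. Real engines compound this by running tight loops over column arrays.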
Major Components with Significant Changes in Cloudera Enterprise 6.0
SDX: Cloudera Manager 6.0, Cloudera Director 6.0, Cloudera Navigator 6.0, Cloudera Navigator Key Trustee 6.0, Apache Sentry 2.0, Apache Kafka 1.0
Analytics and Machine Learning Workloads: Apache Solr 7.0, Apache Spark 2.2
Core Platform: Apache Hadoop 3.0, Apache Hive 2.1, Apache HBase 2.0, Apache Oozie 5.0, Apache Avro 1.8, Apache Parquet 1.9
Upgrade Prerequisites
If you plan to upgrade an existing cluster to Cloudera Enterprise 6, it must meet the following prerequisites:
CDH: CDH 5.7 and above
Databases: MySQL 5.7 and above, MariaDB 5.5 and above, PostgreSQL 8.4 and above, Oracle 12c and above
JDK: Oracle JDK 1.8
Operating Systems: RHEL 6.8 and above, RHEL 7.2 and above, SLES 12 SP2 and above, Ubuntu 16 and above
IMPORTANT NOTES:
Upgrading from the Cloudera Enterprise 6 beta to a future Cloudera Enterprise 6 GA release will not be possible.
Cloudera Enterprise 6 betas are not covered by Cloudera Support subscriptions. Beta users can get assistance through the Cloudera Community portal.
You can download Cloudera Enterprise 6 - Beta here.
We believe Cloudera Enterprise 6 is a major leap forward in functionality and enterprise quality, and we hope you enjoy all the benefits it has to offer. Please don’t hesitate to contact us with any feedback.
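The upgrade prerequisites in this announcement can be turned into a quick pre-flight check. The sketch below is hypothetical and is not a Cloudera tool. It assumes you have already collected a version string for each component. It encodes SLES service packs as the second version number ("12 SP2" becomes (12, 2)). It also treats RHEL major releases other than 6 and 7 as unsupported, which is an assumption because the list names only those two lines.

```python
import re
from typing import Dict, List, Tuple

Version = Tuple[int, ...]

# Minimum versions transcribed from the Cloudera Enterprise 6 beta prerequisites.
MINIMUMS: Dict[str, Version] = {
    "cdh": (5, 7),
    "mysql": (5, 7),
    "mariadb": (5, 5),
    "postgresql": (8, 4),
    "oracle": (12,),    # Oracle 12c
    "sles": (12, 2),    # SLES 12 SP2, encoded as (12, 2) in this sketch
    "ubuntu": (16,),
}
# RHEL has a separate floor per major release line; other majors are
# treated as unsupported here (an assumption).
RHEL_MINIMUMS: Dict[int, Version] = {6: (6, 8), 7: (7, 2)}


def parse(version: str) -> Version:
    """Extract the numeric parts: '1.8.0_162' -> (1, 8, 0, 162), '12 SP2' -> (12, 2)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def meets_minimum(component: str, version: str) -> bool:
    v = parse(version)
    if component == "rhel":
        floor = RHEL_MINIMUMS.get(v[0])
        return floor is not None and v >= floor
    if component == "jdk":
        return v[:2] == (1, 8)  # the prerequisites name Oracle JDK 1.8 specifically
    return v >= MINIMUMS[component]


def check(environment: Dict[str, str]) -> List[str]:
    """Return the components in `environment` that fall below the listed minimums."""
    return [name for name, version in environment.items() if not meets_minimum(name, version)]
```

For example, an environment with MySQL 5.6 and RHEL 6.7 would report `["mysql", "rhel"]`, and every other entry would pass.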
04-16-2018
06:17 PM
We are happy to announce CDS 2.3 release 2 Powered by Apache Spark. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads.
This component is generally available and is supported on CDH 5.9 and higher.
A Hive compatibility issue in CDS 2.0 release 1 Powered By Apache Spark affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the Spark 2.0 release 2 or higher parcel to avoid Spark 2 job failures when using Hive functionality.
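The affected ranges in this compatibility note have a different threshold for each CDH minor line, so a quick check can encode them directly. The helper below is hypothetical and written for illustration. It assumes version strings such as "5.10.1" and treats anything outside CDH 5 as out of scope.

```python
import re
from typing import Tuple

# First affected maintenance release per CDH 5 minor line, per the notice:
# 5.7.6+, 5.8.5+, 5.9.2+, 5.10.1+ (later minor lines are affected as well).
FIRST_AFFECTED = {7: 6, 8: 5, 9: 2, 10: 1}


def parse_cdh(version: str) -> Tuple[int, int, int]:
    """'5.10.1' -> (5, 10, 1); a missing maintenance number is treated as 0."""
    major, minor, *rest = (int(p) for p in re.findall(r"\d+", version))
    return major, minor, (rest[0] if rest else 0)


def needs_spark2_release2(cdh_version: str) -> bool:
    """True if this CDH version needs the Spark 2.0 release 2 (or higher) parcel."""
    major, minor, maint = parse_cdh(cdh_version)
    if major != 5 or minor < 7:
        return False  # outside the ranges named in the notice
    if minor > 10:
        return True  # "CDH 5.10.1 and higher" also covers 5.11, 5.12, ...
    return maint >= FIRST_AFFECTED[minor]
```

For instance, CDH 5.9.1 is not in an affected range, but CDH 5.9.2 is.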
There are no new incompatible changes in this release.
What's New in CDS 2.3 release 2 Powered By Apache Spark
Spark lineage support, which can be used with Cloudera Navigator in Cloudera Manager 5.14 for metadata and transformation analysis and for better regulatory compliance.
Vectorized PySpark UDF support, which improves PySpark performance.
History Server scalability: a more scalable UI that can show applications at start or restart much faster than before, even when there are many applications.
Parquet timestamp read-side adjustment, so that Spark can read timestamps written by Impala.
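The Parquet timestamp adjustment above addresses a convention mismatch. Impala writes INT96 timestamps as the writer's local wall-clock time, whereas Hive and Spark normalize them to UTC. A reader that assumes UTC therefore shifts Impala-written values by the writer's offset. The plain-Python sketch below shows the 12-byte INT96 layout (8 bytes of nanoseconds within the day, then a 4-byte Julian day number, both little-endian) and the adjustment. It is not Spark's code. To stay self-contained, it uses a fixed UTC offset rather than a named time zone, and all function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01
EPOCH = datetime(1970, 1, 1)


def decode_int96(raw: bytes) -> datetime:
    """Decode a 12-byte Parquet INT96 timestamp into a naive datetime."""
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # datetime has microsecond resolution, so sub-microsecond nanos are dropped.
    return EPOCH + timedelta(days=days, microseconds=nanos_of_day // 1000)


def encode_int96(value: datetime) -> bytes:
    """Inverse of decode_int96 for a naive datetime (used here to build test data)."""
    delta = value - EPOCH
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1000
    return struct.pack("<qi", nanos_of_day, delta.days + JULIAN_DAY_OF_UNIX_EPOCH)


def read_assuming_utc(raw: bytes) -> datetime:
    """Reader that assumes the writer normalized to UTC (the Hive/Spark convention)."""
    return decode_int96(raw).replace(tzinfo=timezone.utc)


def read_with_writer_adjustment(raw: bytes, writer_offset: timedelta) -> datetime:
    """Reader that treats the bytes as the writer's local wall-clock time (Impala's convention)."""
    wall_clock = decode_int96(raw)
    return wall_clock.replace(tzinfo=timezone(writer_offset)).astimezone(timezone.utc)
```

Suppose an Impala writer running at UTC-8 stores 10:00 wall-clock time. Without adjustment it reads back as 10:00 UTC, but with the adjustment it reads back as 18:00 UTC, which is the instant that was actually recorded.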
Issues Fixed in CDS 2.3 release 2 Powered by Apache Spark
For a full list of fixed issues, see the list here.
Download CDS 2.3 release 2 Powered By Apache Spark.
Read the documentation.
Want to become a pro Spark user? Sign up for Apache Spark Training.
Note: We uncovered a bug while releasing CDS 2.3 release 1, so we replaced it with CDS 2.3 release 2, which includes the fix.
Labels: Spark
01-17-2018
05:45 AM
We are happy to announce Apache Spark 2.2 release 2. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads.
This component is generally available and is supported on CDH 5.8 and higher.
A Hive compatibility issue in Cloudera Distribution of Apache Spark 2.0 release 1 affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the Spark 2.0 release 2 or higher parcel to avoid Spark 2 job failures when using Hive functionality.
What's New in Cloudera Distribution of Apache Spark 2.2 Release 2
This is purely a maintenance release. See Spark 2 Fixed Issues for the list of fixed issues.
Issues Fixed in Cloudera Distribution of Apache Spark 2.2 Release 2
For a full list of fixed issues, see the list here.
Download Cloudera Distribution of Apache Spark 2.2 release 2.
Read the documentation.
Want to become a pro Spark user? Sign up for Apache Spark Training.
Labels: Spark
07-18-2017
03:14 PM
We are happy to announce Apache Spark 2.2 release 1. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads.
This component is generally available and is supported on CDH 5.8 through CDH 5.12.
What's New in Cloudera Distribution of Apache Spark 2.2 Release 1
Support for CDH 5.12 and associated features.
Support for using Spark 2 jobs to read and write data on the Azure Data Lake Store (ADLS) cloud service.
Cloudera Distribution of Apache Spark 2.2 requires JDK 8.
Issues Fixed in Cloudera Distribution of Apache Spark 2.2 Release 1
[SPARK-10364][SQL] Support Parquet logical type TIMESTAMP_MILLIS
[SPARK-10849][SQL] Adds option to the JDBC data source write for user to specify database column type for the create table
[SPARK-12868][SQL] Allow adding jars from HDFS
[SPARK-14503][ML] spark.ml API for FPGrowth
[SPARK-16101][HOTFIX] Fix the build with Scala 2.10 by explicit typed argument
[SPARK-16122][CORE] Add rest api for job environment
For a full list of fixed issues, see the list here.
Download Cloudera Distribution of Apache Spark 2.2 release 1.
Read the documentation.
Want to become a pro Spark user? Sign up for Apache Spark Training.
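SPARK-10849 lets a JDBC write override the database column types that Spark would otherwise choose when it creates the target table. In Spark's JDBC data source this option is named `createTableColumnTypes`, and it takes a string such as "name CHAR(64), comments VARCHAR(1024)". The Python sketch below only models what that option does and is not Spark's code. The default types ("INTEGER", "TEXT") and all function names are hypothetical stand-ins. It splits the option string on top-level commas so that a type like DECIMAL(10, 2) stays intact. It then rejects overrides for columns that are not in the schema.

```python
from typing import Dict, List, Tuple


def split_top_level(option: str) -> List[str]:
    """Split on commas that are not inside parentheses, so DECIMAL(10, 2) stays intact."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in option:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if "".join(current).strip():
        parts.append("".join(current).strip())
    return parts


def parse_column_types(option: str) -> Dict[str, str]:
    """Turn 'name CHAR(64), comments VARCHAR(1024)' into {'name': 'CHAR(64)', ...}."""
    overrides: Dict[str, str] = {}
    for item in split_top_level(option):
        name, column_type = item.split(None, 1)
        overrides[name] = column_type.strip()
    return overrides


def create_table_ddl(table: str, schema: List[Tuple[str, str]], column_types: str = "") -> str:
    """Build CREATE TABLE DDL, letting user-specified types replace the defaults."""
    overrides = parse_column_types(column_types) if column_types else {}
    missing = sorted(set(overrides) - {name for name, _ in schema})
    if missing:
        raise ValueError(f"column(s) not in schema: {missing}")
    columns = ", ".join(f"{name} {overrides.get(name, default)}" for name, default in schema)
    return f"CREATE TABLE {table} ({columns})"
```

With the schema `[("id", "INTEGER"), ("name", "TEXT"), ("comments", "TEXT")]` and the option string above, the sketch keeps `id INTEGER` and emits `name CHAR(64)` and `comments VARCHAR(1024)`.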
Labels: Spark
07-13-2017
11:24 AM
Cloudera is pleased to announce that Cloudera Enterprise 5.12 is now generally available (GA). The release includes enhancements for running in cloud environments (broader ADLS support and improved AWS Spot Instance support), usability and productivity improvements for both data science and analytic workloads, and performance gains and self-service performance management across a range of workloads. As usual, there are also a number of quality enhancements, bug fixes, and other improvements across the stack.
Here is a partial list of what’s included (see the Release Notes for a full list):
Core Platform
Improved AWS Spot Instance Support: Cloudera Director 2.5 makes using AWS Spot Instances much easier and more reliable. Director can now recover when Spot instances disappear during initial cluster spin-up, during grow operations, and in steady state. In addition, Cloudera Manager can be made aware of Spot instances for improved job reliability.
OpenStack Reference Architecture: Cloudera Enterprise now supports clusters running on OpenStack, enabling customers to spin clusters up and down faster and more easily. The new OpenStack Reference Architecture complements the existing VMware Reference Architecture for running Cloudera Enterprise on virtual infrastructure.
Backup and Disaster Recovery Enhancements: BDR now makes it easier to diagnose and fix connectivity issues in Kerberized environments, provides more robust replication of Hive metadata, and supports longer-running replication jobs by automatically renewing Kerberos tickets and Hadoop delegation tokens.
Data Science & Engineering
Cloudera Data Science Workbench enhancements include:
GPU Support: Cloudera Data Science Workbench now enables popular deep learning frameworks to run on GPUs, both on-premises and in the cloud.
Embedded Web UIs: Users can work with the Apache Spark Web UI for Spark sessions. Other interactive web applications like TensorBoard, Shiny, and Plotly now appear directly in the workbench.
Enhanced Job Scheduling: Cloudera Data Science Workbench users can now schedule jobs directly from external schedulers or orchestration systems via the new Jobs API.
Cloudera Altus Workload Analytics: Cloudera Altus users can now access the industry’s first suite of self-service troubleshooting and performance management tools for transient data engineering workloads. End users can diagnose common execution and performance issues without needing to contact an administrator.
Operational DB
Kudu Function/Performance Improvements (also applies to Analytic DB): Support has been added for timestamps, both directly through Apache Kudu and indirectly through Apache Impala. New supportability tools have been developed for correcting under-replicated tablets and for system checks. Performance enhancements to Kudu include improved bulk loading and improved behavior on denser nodes.
HBase Cloud Storage Support and Spark Integration: Apache HBase now has ADLS support and recommendations for Azure deployment. Outside of the cloud, HBase now supports long-lived Spark applications via token renewal.
Analytic DB
Usage-Enriched Query Assistance: Hue now integrates with Navigator Optimizer to provide intelligent recommendations for more efficient SQL query design. SQL developers immediately receive recommendations based on popular usage and access patterns, as well as Impala and Hive best practices for optimized query performance.
Enhanced Analytic Workbench Interface: The updated Hue 4.0 provides a modernized, intuitive experience for SQL users that enables greater productivity and a seamless workflow.
Added Cloud-Native Integrations and Faster SQL Analytics across Environments: Impala now supports Microsoft ADLS for cloud-native analytics and continues to see performance and efficiency gains across all storage options (Amazon S3, Microsoft ADLS, HDFS, Kudu).
The full contents of this release include:
Cloudera Enterprise 5.12 (comprising CDH 5.12, Cloudera Manager 5.12, and Cloudera Navigator 2.11)
Cloudera Director 2.5
Apache Kudu 1.4
Cloudera Data Science Workbench 1.1
Cloudera Distribution of Kafka (CDK) 2.2
Cloudera Navigator Optimizer Updates
Over the next few weeks, we’ll publish blog posts that cover some of these features in detail. In the meantime, the following links provide additional information:
Download Cloudera Enterprise 5.12
Explore documentation
As always, we value your feedback; please provide any comments and suggestions through our community forums. You can also file bugs via issues.cloudera.org.
04-07-2017
02:35 PM
We are happy to announce Apache Spark 2.1 release 1. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads.
Cloudera Distribution of Apache Spark 2.1 release 1 is compatible with the following CDH versions: CDH 5.7, CDH 5.8, CDH 5.9, and CDH 5.10.
What's New in Cloudera Distribution of Apache Spark 2.1 Release 1
New direct connector to Kafka that uses the new Kafka consumer API. See Spark 2 Kafka Integration for details.
Issues Fixed in Cloudera Distribution of Apache Spark 2.1 Release 1
[SPARK-19554][UI,YARN] Allow SHS URL to be used for tracking in YARN RM.
[SPARK-16554][CORE] Automatically Kill Executors and Nodes when they are Blacklisted
[SPARK-16654][CORE] Add UI coverage for Application Level Blacklisting
[SPARK-8425][CORE] Application Level Blacklisting
[SPARK-18117][CORE] Add test for TaskSetBlacklist
[SPARK-18949][SQL][BACKPORT-2.1] Add recoverPartitions API to Catalog
[SPARK-19459][SQL][BRANCH-2.1] Support for nested char/varchar fields in ORC
[SPARK-19611][SQL] Introduce configurable table schema inference
Download Cloudera Distribution of Apache Spark 2.1 release 1.
Read the documentation.
Want to become a pro Spark user? Sign up for Apache Spark Training.
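SPARK-8425 adds application-level blacklisting. An executor that accumulates too many task failures stops receiving tasks, and a node with too many blacklisted executors is excluded as well. The Python class below is a simplified model of that bookkeeping, not Spark's scheduler code. It ignores details such as blacklist expiry and task-set-level tracking. Its defaults of 2 mirror the defaults of Spark's `spark.blacklist.application.maxFailedTasksPerExecutor` and `spark.blacklist.application.maxFailedExecutorsPerNode` settings, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


class ApplicationBlacklist:
    """Simplified model of application-level executor/node blacklisting."""

    def __init__(self, max_failed_tasks_per_executor: int = 2,
                 max_failed_executors_per_node: int = 2) -> None:
        self.max_failed_tasks_per_executor = max_failed_tasks_per_executor
        self.max_failed_executors_per_node = max_failed_executors_per_node
        self.failures: Dict[str, int] = defaultdict(int)  # task failures per executor
        self.executor_node: Dict[str, str] = {}           # executor id -> host
        self.blacklisted_executors: Set[str] = set()
        self.blacklisted_nodes: Set[str] = set()

    def record_task_failure(self, executor: str, node: str) -> None:
        """Count a failed task; blacklist the executor, and possibly its node, at the thresholds."""
        self.executor_node[executor] = node
        self.failures[executor] += 1
        if self.failures[executor] >= self.max_failed_tasks_per_executor:
            self.blacklisted_executors.add(executor)
            bad_on_node = sum(
                1 for e in self.blacklisted_executors if self.executor_node[e] == node
            )
            if bad_on_node >= self.max_failed_executors_per_node:
                self.blacklisted_nodes.add(node)

    def can_schedule(self, executor: str, node: str) -> bool:
        """A task may run only if neither the executor nor its node is blacklisted."""
        return executor not in self.blacklisted_executors and node not in self.blacklisted_nodes
```

In this model, two failures on one executor blacklist that executor, and a second blacklisted executor on the same host blacklists the whole host. SPARK-16554 goes a step further in Spark itself by optionally killing blacklisted executors.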
03-27-2017
12:31 PM
We are pleased to announce the release of the Impala JDBC v2.5.37 and Impala ODBC v2.5.37 drivers. This release has the following fixes and enhancements:
Cloudera JDBC Driver for Impala 2.5.37
Enhancements & New Features
Specify asynchronous exec poll interval. You can now specify the time in milliseconds between each poll that the driver makes for the query execution status. Specify the number of milliseconds in the Async Exec Poll Interval field of the Advanced Options dialog box, or in the AsyncExecPollInterval configuration option.
Support for Impala 2.8. The driver now supports Impala versions 1.0.1 through 2.8.
Resolved Issues
Update/Delete statements require a semicolon at the end.
Conflicting information in documentation regarding CDH.
Cloudera ODBC Driver for Impala 2.5.37
Enhancements & New Features
Specify asynchronous exec poll interval. You can now specify the time in milliseconds between each poll that the driver makes for the query execution status. Specify the number of milliseconds in the Async Exec Poll Interval field of the Advanced Options dialog box, or in the AsyncExecPollInterval configuration option.
Support for Impala 2.8. The driver now supports Impala versions 1.0.1 through 2.8.
Specify Kerberos hostname canonicalization. By default, if you specify a Kerberos realm, the Kerberos layer canonicalizes the host FQDN in the server’s service principal name. You can disable this behavior by disabling the Canonicalize Principal FQDN option, or by setting the ServicePrincipalCanonicalization connection property to 0.
Configure SSL certificate revocation check. You can now configure the driver to check whether a TLS/SSL certificate stored in the Windows Trust Store has been revoked. By default, the driver checks for revocation. To disable the revocation check, clear the Check Certificate Revocation check box, or set the CheckCertRevocation key to 0.
Simplified MIT Kerberos configuration. When using MIT Kerberos to access the Impala service from a Kerberos realm that is different from the realm the user belongs to, the user is no longer required to add the Impala service's network domain to the Kerberos realm mapping in the Kerberos configuration file on the client machine.
Upgraded OpenSSL library. The driver now uses OpenSSL 1.0.2. Previously, the driver used OpenSSL 1.0.1l.
Resolved Issues
Conflicting information in documentation regarding CDH.
Incorrect driver version verification instructions for macOS in documentation.
Segmentation fault in Driver Manager detection on Linux and Solaris SPARC platforms.
Queries using DISTINCT run correctly in Hue but not via ODBC.
Driver converts COALESCE function to less efficient Impala CASE statement.
When attempting to use the Windows trust store on Windows Server 2016, an access violation exception occurs.
Getting Started with the Cloudera Drivers
Read the Cloudera JDBC 2.5.37 Driver for Impala Release Notes and Installation Guide.
Read the Cloudera ODBC 2.5.37 Driver for Impala Release Notes and Installation Guide.
Download the connector from the Cloudera Connectors page.
As always, we welcome your feedback. Please send your comments and suggestions to the user group or through our community forums. You can also file bugs through our external JIRA projects on issues.cloudera.org.
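The new driver options described in these release notes are set as key=value connection properties. The Python helpers below only assemble the strings and do not connect to anything. The JDBC URL follows the `jdbc:impala://host:port/schema;Key=Value` form. Port 21050 is the port Impala commonly exposes to JDBC/ODBC clients. The host `impala.example.com` is a placeholder, the helper names are hypothetical, and passing AsyncExecPollInterval in the URL is an assumption based on how other driver properties are supplied. The ODBC driver name depends on how the driver was installed. CheckCertRevocation applies to the Windows Trust Store.

```python
from typing import Dict


def impala_jdbc_url(host: str, port: int = 21050, schema: str = "default", **properties: str) -> str:
    """Build a URL of the form jdbc:impala://host:port/schema;Key=Value;..."""
    url = f"jdbc:impala://{host}:{port}/{schema}"
    for key, value in properties.items():
        url += f";{key}={value}"
    return url


def odbc_connection_string(settings: Dict[str, str]) -> str:
    """Join ODBC key=value pairs with semicolons, preserving insertion order."""
    return ";".join(f"{key}={value}" for key, value in settings.items())


# Poll the query execution status every 100 ms (JDBC).
jdbc_url = impala_jdbc_url("impala.example.com", AsyncExecPollInterval="100")

# Disable principal FQDN canonicalization and the certificate revocation check (ODBC).
odbc_string = odbc_connection_string({
    "Driver": "Cloudera ODBC Driver for Impala",  # installation-specific name (assumption)
    "Host": "impala.example.com",
    "Port": "21050",
    "ServicePrincipalCanonicalization": "0",
    "CheckCertRevocation": "0",
})
```

Both options default to the driver's standard behavior when omitted, so they only need to be set when the defaults do not fit your environment.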
02-24-2017
07:20 PM
We are happy to announce Spark 2.0 release 2. You can download the parcel and apply it directly to provisioned clusters without disrupting your currently running Spark workloads.
Release 2 addresses a Hive compatibility issue that affects CDH 5.10.1 and higher, CDH 5.9.2 and higher, CDH 5.8.5 and higher, and CDH 5.7.6 and higher. If you are using one of these CDH versions, you must upgrade to the Spark 2.0 release 2 parcel to avoid Spark 2 job failures when using Hive.
Release 2 is based on Apache Spark 2.0.2.
Issues Fixed in Cloudera Distribution of Apache Spark 2.0 Release 2
[SPARK-4563] [CORE] Allow driver to advertise a different network address
[SPARK-18993] Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags
[SPARK-19314] Do not allow sort before aggregation in Structured Streaming plan
[SPARK-18762] Web UI should be http:4040 instead of https:4040
[SPARK-18745] java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)
[SPARK-18703] Insertion/CTAS against Hive Tables: Staging Directories and Data Files Not Dropped Until Normal Termination of JVM
[SPARK-18091] Deep if expressions cause Generated SpecificUnsafeProjection code to exceed JVM code size limit
Download Cloudera Distribution of Apache Spark 2.0 release 2.
Read the documentation.
Want to become a pro Spark user? Sign up for Apache Spark Training.
Labels: Spark
01-31-2017
09:24 AM
Cloudera is proud to announce that Cloudera Enterprise 5.10 is now generally available (GA). The highlights of this release include the GA of the new columnar storage engine Apache Kudu, improved cloud performance and cost optimizations, and cloud-native data governance for Amazon S3. As usual, there are also a number of quality enhancements and bug fixes (learn more about our multi-dimensional hardening/QA process) and other improvements across the stack.
Here is a partial list of what’s included (see the Release Notes for a full list):
GA of Apache Kudu - Unleash Cloudera’s new storage engine to enable fast analytics on fast-changing data with the first generally available release of Kudu. Kudu is purpose-built to enable use cases for time series data, machine data analytics, and online reporting, as part of a complete analytic or operational database.
Improved Cloud Performance - Run Cloudera workloads on public cloud infrastructure more efficiently and cost-effectively than ever before. Specific enhancements in this release include:
Deploying new clusters in cloud environments faster using Cloudera Director.
Running transient batch processing jobs up to 2x faster compared to previous releases.
Reducing AWS instance costs by leveraging Amazon Spot Block instances.
Big Data Governance for the Hybrid Cloud - Cloudera Navigator now provides cataloging, metadata management, and comprehensive lineage for data in Amazon S3, making it the only big data management and governance solution for data stored both on-premises and in the cloud. This release also includes policy-based business metadata assignment and validation, major performance optimizations, and a refreshed look-and-feel for increased data stewardship productivity.
Expanded Recommendations for Active Data Optimization - Cloudera Navigator Optimizer now provides expanded recommendations and risk alerts, making it even easier for architects and DBAs to understand, migrate, and manage workloads on Hadoop.
Added Efficiencies and Design Assistance for SQL Developers - Increase SQL developer productivity with the latest version of Hue, which provides improved exploration and table sampling (including over Amazon S3), better support for viewing and interacting with Parquet files, and faster loading of documents.
Continued Security & Compliance Improvements - Increase overall platform and application security and compliance by taking advantage of new cloud access key management controls, Kafka authorization via Sentry, and new data encryption techniques.
The full contents of this release include:
Cloudera Enterprise 5.10 (comprising CDH 5.10, Cloudera Manager 5.10, and Cloudera Navigator 2.9)
Cloudera Director 2.3
Cloudera Navigator Optimizer Updates
Kafka 2.1
Kudu 1.2
Over the next few weeks, we’ll publish blog posts that cover some of these features in detail. In the meantime:
Download Cloudera Enterprise 5.10
Explore documentation
As always, we value your feedback; please provide any comments and suggestions through our community forums. You can also file bugs via issues.cloudera.org.
12-12-2016
04:27 PM
Apache Spark is a core component of the Cloudera Enterprise platform. It is the de facto processing engine for Hadoop and the modern analytics engine for an increasing number of workloads. Organizations use Apache Spark to reduce churn, implement predictive maintenance, and perform complex risk modeling and analysis. IT professionals use Spark to accelerate data processing, train large-scale machine learning models, and perform exploratory data science. Taneja reports that for the most critical Spark workloads, 57% of users choose to partner with Cloudera because of the quality of support and breadth of training and services.
The Apache Spark ecosystem continues to grow at a fast pace, and Cloudera delivers the newest, most desired features with reliability and performance at scale. We are happy to announce support for Apache Spark version 2.0. CDH users can download the parcel and apply it directly to provisioned clusters, so you can adopt Spark 2.0 without disrupting your currently running Spark workloads.
Spark 2.0 capabilities include the following:
Combined API - A unified API for batch and streaming jobs.
Machine learning persistence - The ability to save and load ML models via MLlib persistence.
Structured streaming - The first streaming API running on top of Spark SQL.
Improved performance.
Download Cloudera Distribution of Apache Spark 2.0 Release 1.
Read the documentation and our blog.
Want to become a pro Spark user? Sign up for Apache Spark Training.