Apache Spark is a core component of the Cloudera Enterprise platform. It is the de facto processing engine for Hadoop and the analytics engine for a growing number of workloads. Organizations use Apache Spark to reduce churn, implement predictive maintenance, and perform complex risk modeling and analysis. IT professionals use it to accelerate data processing, train large-scale machine learning models, and perform exploratory data science.
Taneja reports that for the most critical Spark workloads, 57% of users choose to partner with Cloudera because of the quality of support and breadth of training and services. The Apache Spark ecosystem continues to grow at a fast pace, and Cloudera delivers the newest, most desired features with reliability and performance at scale.
We are happy to announce support for Apache Spark version 2.0. CDH users can download the parcel and apply it directly to provisioned clusters, so you can adopt Spark 2.0 without disrupting your currently running Spark workloads. Spark 2.0 capabilities include the following:
- Combined API - A unified API for batch and streaming jobs.
- Machine learning persistence - The ability to save and load ML models via MLlib persistence.
- Structured streaming - The first streaming API running on top of Spark SQL.
- Improved performance.
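
To give a feel for the machine learning persistence feature listed above, here is a minimal PySpark sketch. It fits a small pipeline, saves it, and loads it back. The function name `train_save_reload`, the toy rows, the column names, and the `/tmp/spark2-lr-pipeline` path are all assumptions made for illustration; they are not part of the Cloudera release, and the example presumes a working Spark 2.0 installation with PySpark available.

```python
def train_save_reload(spark, model_dir):
    """Fit a tiny pipeline, persist it, and reload it from model_dir.

    `spark` is an existing SparkSession; `model_dir` is a hypothetical
    path chosen by the caller.
    """
    # Imported here so the module can be loaded even where PySpark is absent.
    from pyspark.ml import Pipeline, PipelineModel
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    # Toy training data (made up for this sketch): two features and a label.
    rows = [
        (0.0, 1.0, 0.0),
        (1.0, 0.0, 1.0),
        (0.5, 0.9, 0.0),
        (0.9, 0.1, 1.0),
    ]
    train = spark.createDataFrame(rows, ["x1", "x2", "label"])

    # Assemble the feature columns into a vector, then fit a classifier.
    assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
    lr = LogisticRegression(maxIter=10)
    model = Pipeline(stages=[assembler, lr]).fit(train)

    # Save the fitted pipeline, replacing any earlier copy at the same path.
    model.write().overwrite().save(model_dir)

    # Load it back, as a separate job or application could do later.
    reloaded = PipelineModel.load(model_dir)
    return reloaded.transform(train).select("label", "prediction")


if __name__ == "__main__":
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ml-persistence-sketch").getOrCreate()
    train_save_reload(spark, "/tmp/spark2-lr-pipeline").show()
    spark.stop()
```

Because the saved pipeline includes both the feature assembly step and the fitted model, the reloading side only needs the path, not the original training code.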
Download Cloudera Distribution of Apache Spark 2.0 Release 1
Read the documentation and our blog
Want to become a pro Spark user? Sign up for Apache Spark Training.