Member since 07-31-2024 | 4 Posts | 9 Kudos Received | 0 Solutions
02-26-2025
08:00 AM
Accessing Apache Spark from Your Favorite IDE

Working with Apache Spark just got easier. The recent release of Cloudera Data Engineering 1.23 introduces External IDE Connectivity, powered by Spark Connect. You can now interact with Spark clusters directly from your preferred local coding environments like Jupyter Notebook, VS Code, and PyCharm. This capability helps to streamline your data engineering workflows, enabling faster development and better collaboration, all while preserving enterprise-grade security.

This 6-minute demo will walk you through how to access Spark from your favorite IDE with Cloudera Data Engineering:

Key Benefits

External IDE Connectivity brings a new level of flexibility for data engineering, allowing users to work with remote data from their local environment via secure, automated continuous integration and continuous delivery (CI/CD) pipelining. This approach offers several advantages, including:

1. Develop Locally, Compute Seamlessly
Developers can connect to Spark from local notebooks like Jupyter or VS Code while keeping data secure and governed. External IDE Connectivity allows teams to extract data from the open data lakehouse with Spark, analyze or test it locally, and run workloads at scale, all within a unified environment.

2. Iterative Development with Flexible CI/CD Pipelining
This capability integrates Spark workflows seamlessly into DevOps processes. With External IDE Connectivity and Git-based version control, teams can automate testing, monitor changes, and accelerate the deployment of data pipelines.

3. Hybrid, Open, and Enterprise-Ready by Design
External IDE Connectivity is available on both Cloudera Data Engineering on cloud and on premises. Cloudera Data Engineering’s built-in multi-tenancy allows teams to securely run multiple workloads while optimizing resource usage and governance.
With native, best-in-class Apache Spark and Apache Iceberg integration, Cloudera Data Engineering ensures high performance, optimized total cost of ownership (TCO), and support for open data architectures.

Ready to Learn More?
- View Product Release Notes, Compatibility and Runtime Components for Cloudera Data Engineering
- Start your 5-day trial for Cloudera Data Engineering
- Download the new datasheet for Cloudera Data Engineering
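To make the idea of connecting to Spark from a local IDE more concrete, here is a minimal sketch using the open-source PySpark Spark Connect client. It is not Cloudera's documented procedure: Cloudera Data Engineering may require its own client package, session setup, and credentials, so check the product documentation. The helper name `spark_connect_url`, the host, and the token placeholder are assumptions made for illustration. The `sc://host:port/;key=value` connection string and `SparkSession.builder.remote(...)` come from open-source Spark Connect (PySpark 3.4 and later).

```python
def spark_connect_url(host, port=15002, token=None, use_ssl=True):
    """Build a Spark Connect connection string: sc://host:port/;key=value;..."""
    params = []
    if use_ssl:
        params.append("use_ssl=true")
    if token:
        # The client sends this value as a bearer token for authentication.
        params.append(f"token={token}")
    url = f"sc://{host}:{port}/"
    if params:
        url += ";" + ";".join(params)
    return url


def connect(url):
    """Open a remote session. Needs `pip install "pyspark[connect]"` locally."""
    # Imported lazily so the URL helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()
```

From a notebook or IDE console, a session against a hypothetical endpoint could then be opened with `spark = connect(spark_connect_url("spark-connect.example.com", 443, token="REPLACE_ME"))`, after which DataFrame calls such as `spark.range(5).show()` are executed on the remote cluster rather than on the local machine.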
11-12-2024
09:54 AM
7 Kudos
We are thrilled to share the latest release of Cloudera Data Engineering 1.23 on public cloud. With major enhancements for developer experience and product stability, this release enables you to benefit from increased productivity and collaboration while ensuring enterprise-grade security.

Discover New Benefits & Features

- Boosted Development Productivity: The External IDE Connectivity (Tech Preview) powered by Apache Spark Connect enables practitioners to run large-scale Spark workloads remotely on Cloudera Data Engineering. That means you can access Spark from your favorite IDE and collaborate more efficiently with your team.
- Optimized, Cost-Effective Workflows: Support for Apache Airflow 2.9 and Python 3.11 expands access to the latest libraries and frameworks, equipping you with tools to meet your dynamic needs for building scalable data pipelines. Apache Iceberg 1.5, together with the support of Apache Spark 3.5, delivers better overall performance and cost management. For example, in use cases such as Change Data Capture (CDC), improvements in row-level deletes with Merge-on-Read enable more efficient query processing. This results in faster query execution and reduced resource consumption, leading to potential cost savings.
- Improved Administrative Capabilities: We strengthened stability with In-Place Upgrade and Backup & Restore to ensure a reliable and smooth upgrade experience.

Ready to Learn More?
- View Product Release Notes, Compatibility and Runtime Components for Cloudera Data Engineering
- Start your 5-day trial for Cloudera Data Engineering on AWS
- Download the new datasheet for Cloudera Data Engineering
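As an illustration of the Merge-on-Read point above, the sketch below builds the Spark SQL statement that switches an Apache Iceberg table's row-level operations to merge-on-read. The property names (`write.delete.mode`, `write.update.mode`, `write.merge.mode`) are standard Iceberg table properties. The helper name and the table `db.orders_cdc` are hypothetical, and the sketch assumes the table already uses Iceberg format version 2, which row-level delete files require. Whether merge-on-read is the right choice depends on the workload's balance of writes and reads.

```python
# Iceberg table properties that control how row-level changes are written.
MOR_PROPERTIES = {
    "write.delete.mode": "merge-on-read",
    "write.update.mode": "merge-on-read",
    "write.merge.mode": "merge-on-read",
}


def merge_on_read_ddl(table):
    """Return an ALTER TABLE statement enabling merge-on-read for `table`."""
    props = ", ".join(f"'{key}'='{value}'" for key, value in MOR_PROPERTIES.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({props})"


# Hypothetical CDC target table, used only for illustration.
statement = merge_on_read_ddl("db.orders_cdc")
print(statement)
```

The resulting string can be passed to `spark.sql(...)` in a session that has the Iceberg catalog configured. With merge-on-read, deletes and updates write small delete files instead of rewriting whole data files, which is why frequent CDC-style changes tend to become cheaper to apply.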
- Tags:
- Data Engineering
08-07-2024
02:14 PM
1 Kudo
We are thrilled to announce that Cloudera Data Engineering now supports AWS Graviton3 instances. This marks a significant milestone in our commitment to delivering high-performance, cost-effective solutions for your data engineering needs.

AWS Graviton is a family of 64-bit ARM-based CPUs designed to support improved compute performance, memory bandwidth, and machine learning capabilities with less energy consumption, all essential criteria for modern data engineering workloads. This blog walks you through how you can take advantage of these powerful processors to deliver stronger results.

15%+ Cost Savings and Performance Enhancements

One of the key benefits of switching to AWS Graviton3 instances is reducing costs while achieving comparable performance. Cloudera customers may reduce their costs by 15-20% compared to alternative instance types.

Get Started with Graviton in Cloudera Data Engineering

Create AWS Graviton instances for Cloudera Data Engineering in six steps:

1. Log in to the Cloudera Data Engineering Console: Ensure you have the necessary permissions to create and manage services.
2. Navigate to the Administration Page: Go to the Administration page, where you can create and manage your Cloudera Data Engineering services.
3. Create a New Cloudera Data Engineering Service: Click on "Create New Service."
4. Choose Graviton Instance Types: In the "Workload Type" drop-down menu, select one of the supported Graviton instance types. Here is the list of supported Graviton instance types:
   - r7gd
   - r7g
   - m7gd
   - m7g
5. Configure Service Details: Set up the details of your new Cloudera Data Engineering service according to your requirements, including network settings, storage configurations, and any additional options.
6. Review and Creation: Review your service configuration and click on "Create" to start your new Cloudera Data Engineering service using Graviton3 instances.

You can check here for the compatibility matrix of Graviton in Cloudera Data Engineering.
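If service creation is scripted or reviewed in code, a small pre-check against the supported families listed above can catch a wrong instance type early. The following is a minimal sketch, not part of any Cloudera tooling: the function name is hypothetical, and it assumes instance types are written in the usual AWS `family.size` form (for example `r7g.2xlarge`).

```python
# Graviton instance families listed as supported in this post.
SUPPORTED_GRAVITON_FAMILIES = {"r7gd", "r7g", "m7gd", "m7g"}


def is_supported_graviton(instance_type):
    """Return True if the instance type's family is in the supported list."""
    # "r7g.2xlarge" -> family "r7g"; a bare family name is accepted as well.
    family = instance_type.split(".", 1)[0].strip().lower()
    return family in SUPPORTED_GRAVITON_FAMILIES


for candidate in ("r7g.2xlarge", "m7gd.4xlarge", "m5.xlarge"):
    print(candidate, is_supported_graviton(candidate))
```

The compatibility matrix remains the authoritative source; the set in this sketch only mirrors the four families named in step 4 and would need updating if that list changes.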
Learn more about how Cloudera Data Engineering can help you orchestrate and secure data pipelines. Explore our community page for the latest discussions and documentation for detailed best practices. The Cloudera Data Engineering team looks forward to your feedback and is excited to see how you take advantage of this new capability. If you need assistance, our support team is here to help, as always.
07-31-2024
07:29 AM
1 Kudo
We are excited to announce a series of new updates on Cloudera Data Engineering for the public cloud. Explore how you can benefit from greater performance, cost-efficiency, and versatility with new features for modern data engineering workloads.

Discover New Benefits

- 15%+ Cost Savings with AWS Graviton: We now support AWS ARM-based processors (Graviton). Switching to Graviton instances can reduce infrastructure costs by 15-20% while achieving comparable performance.
- Upgraded Performance: Support for Apache Spark 3.5 and Apache Iceberg 1.4 boosts performance and provides access to more practitioner-focused capabilities, including Spark Connect and enhanced PySpark and SQL capabilities. This dual upgrade also enables multi-table commits and enhances consistency in your data operations. You can find our compatibility matrix here.
- Productivity Boost with Flexible Practitioner Tooling: Take advantage of iterative development with Interactive Sessions. Develop, test, and refine your data workflows in a highly dynamic and flexible environment.

Ready to Learn More?
- Start your 5-day Trial for Cloudera Data Engineering on Public Cloud
- View full release notes
- Join our Data Engineering Community
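To put the 15-20% Graviton figure above into concrete terms, the short sketch below applies that range to a monthly infrastructure bill. The function name and the 10,000 example figure are hypothetical, and the result is only the arithmetic of the quoted range, not a prediction for any particular workload.

```python
def graviton_savings_range(monthly_cost, low_pct=15, high_pct=20):
    """Return (low, high) estimated monthly savings for the quoted 15-20% range."""
    return (monthly_cost * low_pct / 100, monthly_cost * high_pct / 100)


# Hypothetical monthly compute spend, in any currency unit.
low, high = graviton_savings_range(10_000)
print(f"Estimated savings: {low:.0f} to {high:.0f} per month")
```

For a spend of 10,000 per month, the range works out to between 1,500 and 2,000 per month. Actual savings depend on instance mix, utilization, and pricing in the chosen region.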