Cloudera Data Engineering (CDE) is an integrated, purpose-built experience for data engineers. It delivers a streamlined service for scheduling, monitoring, debugging, and promoting data pipelines quickly and securely across the enterprise at scale.
The key to the experience is a centralized interface that simplifies the job management life cycle, from scheduling and deployment through monitoring, debugging, and promotion. This alleviates many of the challenges of running Spark jobs in production at scale. Like CML and CDW, CDE is cloud native and built on Kubernetes, so platform admins can quickly provision virtual compute clusters with strong isolation, capacity auto-scaling, and quotas for cost management.
For Platform Admins:
- Managed Spark service running on Kubernetes, with mixed-version Spark deployments, that accelerates data engineering workflows with zero setup. One-click provisioning of new workloads with guardrails for CPU and memory.
- Data governance and management through integration with SDX, providing security, visibility, and automatic lineage capture without any code changes.
- Monitoring of system services and utilization metrics through Grafana.
- CDP security integration that includes SSO with FreeIPA, Kerberos, Ranger, Knox, and Istio.
For Data Engineers:
- Easy job deployment with configuration management, dependency artifacts, and Spark tuning parameters.
- Apache Airflow-based scheduling service for orchestration of complex data pipelines with job dependencies.
- Self-service visual troubleshooting and performance tuning of Spark jobs.
- Rich API support for CI/CD and other automation use cases, accessible through a CLI and a REST API.
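
As a rough illustration of the REST-based automation mentioned above, the following minimal sketch builds (but does not send) an HTTP request that a CI/CD step might use to register a Spark job. The base URL, the `/jobs` path, the bearer-token header, and every payload field name are assumptions made for this example, not the documented CDE API; consult the API reference of your virtual cluster for the real schema.

```python
import json
import urllib.request


def build_create_job_request(base_url, token, job_name, app_file, resource_name):
    """Build a POST request describing a Spark job.

    The endpoint path and the payload fields below are hypothetical
    placeholders chosen for illustration only.
    """
    payload = {
        "name": job_name,                             # hypothetical field
        "type": "spark",                              # hypothetical field
        "mounts": [{"resourceName": resource_name}],  # hypothetical dependency artifact reference
        "spark": {
            "file": app_file,                         # hypothetical application file
            "executorMemory": "2g",                   # example Spark tuning parameter
            "executorCores": 1,                       # example Spark tuning parameter
        },
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/jobs",           # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",       # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Build the request with placeholder values; nothing is sent over the network.
request = build_create_job_request(
    base_url="https://cde.example.com/api/v1",        # placeholder host
    token="dummy-token",
    job_name="nightly-etl",
    app_file="etl.py",
    resource_name="etl-resources",
)
print(request.get_method(), request.full_url)
```

In a pipeline, a request like this would be passed to `urllib.request.urlopen` once the real endpoint and credentials are known. Keeping request construction separate from sending makes the job definition easy to review and test.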
For more information: