Looking for help with Cloudera Data Platform (CDP) Spark job optimization. We're running large-scale ETL jobs that are timing out during the shuffle phase, consuming excessive cluster resources, and spilling memory to disk.
The jobs process ~500GB datasets but execution times have increased 3x after migrating to CDP. Need someone experienced with Spark tuning on Cloudera and YARN resource management to identify bottlenecks.
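For context on the kind of tuning involved, the settings below are a hedged sketch of Spark properties commonly adjusted for shuffle-heavy ETL on YARN; the values shown are illustrative placeholders, not our actual configuration, and the right numbers depend on our cluster and data skew:

```properties
# Illustrative starting points only -- actual values must be derived from the cluster.
spark.sql.shuffle.partitions=2000          # raise partition count so each shuffle block stays small
spark.sql.adaptive.enabled=true            # let AQE coalesce/resize shuffle partitions at runtime
spark.executor.memory=16g                  # JVM heap per executor
spark.executor.memoryOverhead=4g           # off-heap headroom to avoid YARN container kills
spark.memory.fraction=0.6                  # share of heap for execution + storage
spark.network.timeout=600s                 # longer timeout for large shuffle fetches
spark.shuffle.service.enabled=true         # external shuffle service for dynamic allocation on YARN
spark.dynamicAllocation.enabled=true       # scale executors with the workload
```

A successful engagement would likely involve validating choices like these against the Spark UI's shuffle read/write and spill metrics rather than applying them blindly.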
Seeking 3-4 hours of remote performance analysis to optimize job configuration and cluster settings. Must be resolved by Tuesday to meet our data pipeline SLA.