Introduction
When working with CDE (Cloudera Data Engineering) in CDP Public Cloud, there may be a need to allocate fractions of a CPU to our Spark jobs without losing parallelism. The following are some real scenarios:
A Spark application that reads from HBase and performs CPU-light processing, causing significant I/O wait. Reducing the number of executors or cores per executor is not optimal, because the parallel reads from HBase scale linearly, so reducing them increases the job duration.
A Spark application that needs high parallelism for the data processing, but coalesces the partitions to a smaller number before writing to HDFS to avoid creating many small files. The job therefore uses all the assigned cores during processing and fewer cores during the output phase, leaving some CPUs idle.
Steps
To allocate fractions of a CPU to Spark executors in CDE, we need to set the 'spark.kubernetes.executor.request.cores' configuration. It accepts values such as 0.1, 500m, 1.5, or 5. More details are available in the official Spark documentation on running on Kubernetes.
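As a quick illustration, the property is passed like any regular Spark configuration entry. The snippet below is a minimal PySpark sketch (the chosen value is just an example) showing the value formats Kubernetes accepts:

```python
from pyspark import SparkConf

# The Kubernetes CPU request is an ordinary Spark configuration entry.
# Accepted values include fractional CPUs ("0.1", "1.5"), millicores ("500m"),
# and whole CPUs ("5").
conf = SparkConf().set("spark.kubernetes.executor.request.cores", "500m")
print(conf.get("spark.kubernetes.executor.request.cores"))  # -> 500m
```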
So, let's define a simple job in CDE, specifying 16 executors, each with the following resources (a configuration sketch follows the list):
2 CPUs
6 GB RAM
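The configuration above could look like the following PySpark sketch; the application name and the sample workload are placeholders, and in CDE the same values can be supplied in the job definition instead of in code:

```python
from pyspark.sql import SparkSession

# Sketch of the first job: 16 executors, each with 2 cores and 6 GB of memory.
spark = (
    SparkSession.builder
    .appName("cde-fractional-cpu-baseline")   # placeholder application name
    .config("spark.executor.instances", "16")
    .config("spark.executor.cores", "2")
    .config("spark.executor.memory", "6g")
    .getOrCreate()
)

# A small CPU-bound workload with enough partitions to keep every core busy,
# so the Spark UI shows 2 concurrent tasks per executor.
rdd = spark.sparkContext.parallelize(range(10_000_000), numSlices=64)
print(rdd.map(lambda x: x * x).sum())

spark.stop()
```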
Running this first job, we can see that 31 CPUs were used in total:
(Image: job 1 without fractional CPUs)
Looking at the Spark UI, 2 cores and 2 tasks are reported for each executor, as expected:
(Image: Spark UI without fractional CPUs)
Let's now launch the same job, but adding the property spark.kubernetes.executor.request.cores = 0.5 while keeping the number of cores per executor at 2 (spark.executor.cores).
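Sticking with the sketch above, the only change is the additional Kubernetes request property; everything else, including spark.executor.cores, stays the same:

```python
# Same job as before, but each executor pod now requests only half a CPU from
# Kubernetes, while spark.executor.cores stays at 2.
spark = (
    SparkSession.builder
    .appName("cde-fractional-cpu")            # placeholder application name
    .config("spark.executor.instances", "16")
    .config("spark.executor.cores", "2")
    .config("spark.executor.memory", "6g")
    .config("spark.kubernetes.executor.request.cores", "0.5")
    .getOrCreate()
)
```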
As you can see, about a quarter of the cores previously used are now allocated (16 executors × 0.5 CPU = 8 CPUs requested, plus the driver, compared to 31 before):
(Image: Spark job with the fractional CPU option)
From the Spark UI, instead, there are still 2 tasks for each executor, confirming that the setting does not interfere with or override the parallelism controlled by spark.executor.cores = 2:
(Image: Spark UI with the fractional CPU option)
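If you want to double-check from inside the running job that the two settings are indeed independent, a quick sanity check is to read them back from the Spark configuration (expected values below assume the setup shown in this post):

```python
# Read back the effective settings: the task-slot count per executor and the
# Kubernetes CPU request are separate configurations.
conf = spark.sparkContext.getConf()
print(conf.get("spark.executor.cores"))                     # expected: 2
print(conf.get("spark.kubernetes.executor.request.cores"))  # expected: 0.5
```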
Conclusion
This post provided an example of allocating fractions of a CPU to our Spark jobs in CDE without losing parallelism.