Created 07-28-2016 03:39 PM
I want to set up a Spark Streaming cluster with dynamic allocation. I tested dynamic allocation by submitting the SparkPi application, and it works fine.
Then I tried my own application: it gets its input from another server, and I use socketTextStream to receive the input data.
The application is very simple:
final JavaReceiverInputDStream<String> stream =
        ssc.socketTextStream(host, port, StorageLevels.MEMORY_AND_DISK_SER);
final JavaDStream<MyObject> mapStream = stream.map(...);
mapStream.foreachRDD(new VoidFunction<JavaRDD<MyObject>>() {
    @Override
    public void call(final JavaRDD<MyObject> stringJavaRDD) throws Exception {
        stringJavaRDD.collect();
    }
});
The map function is the core of my application; it needs a few milliseconds of computation time per event.
When I increase the number of events, the cluster allocates new containers, and when the event rate slows down, the number of containers decreases. But the newly allocated containers are never used: I checked the number of completed tasks per executor, and it is always 0. Because these containers are never used, Spark deallocates them after executorIdleTimeout and immediately allocates new ones, since the workload is still very high. Only the first 2 containers from the application start do any work.
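One thing I thought about trying (I am not sure this is the right fix): since the receiver pins all incoming blocks to the executor it runs on, repartitioning the stream before the map should shuffle the work to the other executors. A minimal sketch of what I mean:

import org.apache.spark.api.java.StorageLevels;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;

// The socket receiver keeps all received blocks on one executor.
final JavaReceiverInputDStream<String> stream =
        ssc.socketTextStream(host, port, StorageLevels.MEMORY_AND_DISK_SER);

// repartition() forces a shuffle, so the subsequent map tasks can run
// on every executor, not only the receiver's one. defaultParallelism
// just picks a partition count matching the currently available cores.
final JavaDStream<String> distributed =
        stream.repartition(ssc.sparkContext().defaultParallelism());

The map(...) from above would then be applied to distributed instead of directly to stream.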
I also thought it might help to use Kafka so that the events are distributed to all containers. (I don't know if this is the right way to solve my problem.)
With Kafka I ran into another problem: Spark didn't allocate more containers than the number of partitions I have set for my topic.
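If I understand the direct Kafka approach correctly, each Kafka partition becomes exactly one Spark partition, so a repartition() would be needed to involve more executors than there are Kafka partitions. A sketch of what I tried, using the Spark 1.6 spark-streaming-kafka API (broker address and topic name are placeholders):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;

Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "broker1:9092"); // placeholder
Set<String> topics = Collections.singleton("events");    // placeholder

// The direct approach creates one Spark partition per Kafka partition.
JavaPairInputDStream<String, String> kafkaStream = KafkaUtils.createDirectStream(
        ssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, topics);

// repartition() breaks the 1:1 mapping so that more executors than
// Kafka partitions can take part in the map work; 12 is illustrative.
JavaDStream<String> values = kafkaStream
        .map(new Function<Tuple2<String, String>, String>() {
            @Override
            public String call(Tuple2<String, String> t) {
                return t._2();
            }
        })
        .repartition(12);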
I use HDP 2.4 on my 3-node cluster. Each node has 4 cores and 8 GB RAM.
Can you tell me which option is the right one to solve my problem, and how I can fix it?
Thank you so much for helping me 🙂
Created 07-28-2016 03:43 PM
Hi
@Rene Can you also share the values of the following properties? And how much memory and how many cores are your nodes contributing to YARN? (You can see the values in Resource Manager UI -> Nodes.)
yarn.scheduler.minimum-allocation-mb
yarn.nodemanager.resource.memory-mb
mapreduce.map.memory.mb
mapreduce.map.java.opts.max.heap
mapreduce.map.cpu.vcores
Created 07-28-2016 04:10 PM
Each node contributes 4 GB RAM and 4 cores to YARN. I submit the application with:
--driver-memory 640mb --executor-memory 640mb
With the overhead, each container uses 1 GB RAM, so with 3 x 4 GB = 12 GB available, 12 containers are possible in total.
yarn.scheduler.minimum-allocation-mb=512mb
yarn.nodemanager.resource.memory-mb=4096mb
mapreduce.map.memory.mb=1.5GB
The following properties are not defined:
mapreduce.map.java.opts.max.heap
mapreduce.map.cpu.vcores
Created 07-28-2016 04:22 PM
Please keep in mind (from the HDP 2.4 docs):
Dynamic Resource Allocation does not work with Spark Streaming.
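A common workaround on Spark 1.6 (my suggestion, not something the docs prescribe): disable dynamic allocation for the streaming job and size it statically. A minimal sketch:

import org.apache.spark.SparkConf;

// Turn dynamic allocation off for the streaming job and pin its size
// instead, e.g. via --num-executors on spark-submit.
SparkConf conf = new SparkConf()
        .set("spark.dynamicAllocation.enabled", "false");
// spark-submit --num-executors 12 --executor-memory 640m ...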
Created 07-28-2016 05:15 PM
Oh no... I didn't see this before. Is this only an HDP 2.4 problem? According to the CDH docs it seems to be possible.
Created 07-28-2016 06:17 PM
For background see https://mail-archives.apache.org/mod_mbox/spark-user/201511.mbox/%3CCA+AHuKksNyOxq8NazZA_mbHb+J1Ry-h...
Pinging @vshukla to see if he has any updates.
Created 07-29-2016 12:44 AM
Dynamic Resource Allocation for Spark Streaming is a new feature in Spark 2.0 (https://issues.apache.org/jira/browse/SPARK-12133), so it is not yet available in either HDP or CDH.
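For whoever lands here later: once on Spark 2.0, the streaming-specific allocator from SPARK-12133 has its own switch. The property names below are taken from that implementation, not from HDP documentation, so treat this as an unverified sketch:

import org.apache.spark.SparkConf;

// Sketch only: verify these keys against your Spark 2.x build.
SparkConf conf = new SparkConf()
        .setAppName("streaming-with-dra")
        // the streaming variant refuses to start if core dynamic
        // allocation is also enabled
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.streaming.dynamicAllocation.enabled", "true")
        .set("spark.streaming.dynamicAllocation.minExecutors", "2")
        .set("spark.streaming.dynamicAllocation.maxExecutors", "12");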
Created 07-25-2017 04:12 PM
Hello, HDP 2.6.1 ships Spark 2; is dynamic resource allocation for streaming jobs working now?