
Spark-Streaming and dynamic allocation

Contributor

I want to set up a Spark Streaming cluster with dynamic allocation. I tested dynamic allocation by submitting the SparkPi application, and it works fine.

Then I tried my own application: it gets its input from another server, and I use socketTextStream to receive the input data.

The application is very simple:

// Receive lines from the remote server via a single socket receiver
final JavaReceiverInputDStream<String> stream =
    ssc.socketTextStream(host, port, StorageLevels.MEMORY_AND_DISK_SER);
// map(...) holds the expensive per-event computation
final JavaDStream<MyObject> mapStream = stream.map(...);
mapStream.foreachRDD(new VoidFunction<JavaRDD<MyObject>>() {
    @Override
    public void call(final JavaRDD<MyObject> rdd) throws Exception {
        rdd.collect(); // materialize each batch on the driver
    }
});

The map function is the core of my application and needs a few milliseconds of computation time per event.

When I increase the number of events, the cluster allocates new containers, and when the event rate slows down, the number of containers decreases. But the newly allocated containers are never used: I checked the number of completed tasks per executor, and for the new executors it is always 0. Because those containers are never used, Spark deallocates them after executorIdleTimeout and, since the workload is still very high, immediately allocates new ones. Only the first 2 containers from the application start do any work.
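If I understand the receiver model correctly, socketTextStream runs a single receiver on one executor and the received blocks stay local to it, so the downstream tasks tend to run only where the receiver lives. A sketch of spreading the records before the expensive map (the partition count 8 is just an example value, not something I have tested):

final JavaDStream<String> spread = stream.repartition(8); // shuffle records across executors
final JavaDStream<MyObject> mapStream = spread.map(...);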

I thought it might help to use Kafka, so that the events are distributed to all containers. (I don't know if that is the right way to solve my problem.)

With Kafka I ran into another problem: Spark doesn't allocate more containers than the number of partitions I have set for my topic.
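From what I read, the direct Kafka stream (KafkaUtils.createDirectStream) creates exactly one RDD partition per Kafka partition, so unless the stream is shuffled afterwards the parallelism is capped by the topic layout. A sketch of fanning it out (12 is again just an example count; kafkaStream stands for the direct stream I created):

// shuffle beyond the Kafka partition count before the expensive map
final JavaPairDStream<String, String> fanned = kafkaStream.repartition(12);
final JavaDStream<MyObject> mapStream = fanned.map(...);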

I use HDP 2.4 to set up my 3-node cluster with:

  • Zookeeper: 3.4
  • HDFS, YARN, MapReduce2: 2.7
  • Spark: 1.6
  • Kafka: 0.9

Each node has 4 cores and 8 GB RAM.

Can you tell me which option is the right one to solve my problem, and how I can fix it?

Thank you so much for helping me 🙂

1 ACCEPTED SOLUTION

Super Collaborator

Please keep in mind (from the HDP 2.4 docs):

Important: Dynamic Resource Allocation does not work with Spark Streaming.


REPLIES

Expert Contributor

Hi @Rene,

Can you also share the values of the following properties? And how much memory and how many cores are your nodes contributing to YARN? (You can see the values in the Resource Manager UI -> Nodes.)

yarn.scheduler.minimum-allocation-mb

yarn.nodemanager.resource.memory-mb

mapreduce.map.memory.mb

mapreduce.map.java.opts.max.heap

mapreduce.map.cpu.vcores

Contributor

Each node contributes 4 GB RAM and 4 cores to YARN. I submit the application with:

--driver-memory 640mb
--executor-memory 640mb

With the overhead, each container uses 1 GB RAM. Altogether, 12 containers are possible.
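For reference, my math (assuming the Spark 1.6 default YARN overhead of max(384 MB, 10% of the executor memory)):

640 MB executor memory + 384 MB overhead = 1024 MB per container
3 nodes * 4096 MB for YARN = 12288 MB -> 12 containers of 1 GB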

yarn.scheduler.minimum-allocation-mb=512mb
yarn.nodemanager.resource.memory-mb=4096mb
mapreduce.map.memory.mb=1.5GB

The following properties are not defined:

mapreduce.map.java.opts.max.heap
mapreduce.map.cpu.vcores

Super Collaborator

Please keep in mind (from the HDP 2.4 docs):

Important: Dynamic Resource Allocation does not work with Spark Streaming.

Contributor

Oh no... I didn't see this before. Is this only an HDP 2.4 limitation? According to the CDH docs it seems to be possible.

Super Collaborator

Dynamic Resource Allocation for Spark Streaming is a new feature in Spark 2.0 (https://issues.apache.org/jira/browse/SPARK-12133), so it is not yet available in either HDP or CDH.
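For completeness: once you are on a Spark 2.x build that includes SPARK-12133, streaming dynamic allocation has its own switch, separate from the regular spark.dynamicAllocation.* settings. The property names below are from the JIRA and the source; they are not on the standard configuration page, so please verify them against your exact version. Note that regular dynamic allocation must be disabled for the streaming variant to be used, and scalingInterval is in seconds:

spark.dynamicAllocation.enabled=false
spark.streaming.dynamicAllocation.enabled=true
spark.streaming.dynamicAllocation.minExecutors=1
spark.streaming.dynamicAllocation.maxExecutors=10
spark.streaming.dynamicAllocation.scalingInterval=60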

New Contributor

Hello, HDP 2.6.1 has Spark 2. Is dynamic resource allocation for streaming jobs working now?
