
ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(9,WrappedArray())

New Contributor

Scenario:

HDInsight cluster with HDP 3.6

2 head nodes (8 CPU/node, 56 GB/node) and 2 worker nodes (8 CPU/node, 56 GB/node)

Container Size: 512 MB

There are around 800,000 files that need to be merged into one single file, estimated at 2 GB. On average, each file contains 2 KB of data (a single event per file).

We are loading the event files (JSON) through Spark using the flow below.

app submit command:

/usr/bin/spark-submit --deploy-mode cluster --queue default --verbose --num-executors 25 --executor-memory 2G --executor-cores 2 --driver-memory 6G --conf spark.shuffle.consolidateFiles=true --conf spark.driver.maxResultSize=5G --conf spark.yarn.executor.memoryOverhead=52m --conf spark.yarn.driver.memoryOverhead=300m --jars /usr/lib/customhivelibs/json-serde-1.3.9-SNAPSHOT-jar-with-dependencies.jar --py-files /home/sshuser/smartRefinery_wo.properties PythonScriptProdInt_wo.py cont_log

rdd = sc.textFile(events_file)

rdd.repartition(1).saveAsTextFile('merged_eventsFile')
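Side note on the approach: repartition(1) runs a full shuffle just to produce one output file (coalesce(1) would skip the shuffle, but funnels all upstream work through a single task). When the job is purely concatenation, the standard HDFS shell command hadoop fs -getmerge produces the single file without a Spark job at all. A minimal local sketch of that same concatenation idea, stdlib-only, using hypothetical paths and tiny sample events:

```python
import os
import shutil
import tempfile

def merge_small_files(src_dir, dest_path):
    # Concatenate every file in src_dir, in name order, into dest_path.
    # This is the local equivalent of `hadoop fs -getmerge <dir> <file>`.
    with open(dest_path, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            with open(os.path.join(src_dir, name), "rb") as part:
                shutil.copyfileobj(part, out)

# Build three tiny "event" files, one JSON event per file, as in the question.
src = tempfile.mkdtemp()
for i, line in enumerate(['{"e":1}\n', '{"e":2}\n', '{"e":3}\n']):
    with open(os.path.join(src, "part-%05d.json" % i), "w") as f:
        f.write(line)

# Write the merged file to a separate directory so it is not re-read as input.
merged = os.path.join(tempfile.mkdtemp(), "merged_events.json")
merge_small_files(src, merged)
```

This is just a sketch of the idea, not our production path: at 800,000 HDFS files, getmerge (or a single-reducer MapReduce job) sidesteps the shuffle and the driver pressure entirely.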

Issue: repartition takes around 50 minutes to merge into a single file. Once the repartitioning is complete, the application log throws the error below continuously for another hour (more than 6,000 lines of errors):

18/02/28 11:23:37 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(9,WrappedArray())
18/02/28 11:23:37 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(8,WrappedArray())
18/02/28 11:23:37 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(19,WrappedArray())
18/02/28 11:23:38 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(1,WrappedArray())
18/02/28 11:23:38 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(16,WrappedArray())
18/02/28 11:23:38 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(11,WrappedArray())

Workaround tried: set the parameter spark.scheduler.listenerbus.eventqueue.size to 200000, but we are still receiving the error.
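For reference, this is how we passed the property (a trimmed spark-submit, not the full command; note that in Spark 2.3+ the property was renamed to spark.scheduler.listenerbus.eventqueue.capacity, so the old name is silently ignored there):

```shell
# Enlarge the listener bus event queue (Spark <= 2.2 property name shown;
# use spark.scheduler.listenerbus.eventqueue.capacity on Spark 2.3+)
/usr/bin/spark-submit \
  --deploy-mode cluster \
  --conf spark.scheduler.listenerbus.eventqueue.size=200000 \
  PythonScriptProdInt_wo.py cont_log
```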

Any help would be greatly appreciated!

2 REPLIES

Super Collaborator

Hi @Manjunath Patel,

SparkListenerBus has already stopped!

is due to the program being interrupted without a proper shutdown of the Spark context; it implies the program died before notifying all the other executors on the platform.

This occurs if you handle errors by terminating the program with sys.exit, so that the context's JVM dies without notifying the other agents.

The best you can do is stop the context gracefully (sc.stop() or spark.stop()) before you terminate the JVM, which also makes it easier to debug any other errors in the program.

Over-committing resources (memory) without swap may also cause this, as the OS can abruptly kill the JVM.
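One way to guarantee the graceful stop is a try/finally around the job: sys.exit raises SystemExit, so the finally clause still runs and the context is stopped before the process exits. A minimal sketch (FakeContext is a hypothetical stand-in for a SparkContext, just to keep the snippet self-contained):

```python
import sys

class FakeContext:
    """Hypothetical stand-in for a SparkContext, to keep the sketch self-contained."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

sc = FakeContext()

def main():
    try:
        # ... job logic; error handling may call sys.exit(1) anywhere here ...
        sys.exit(1)
    finally:
        # sys.exit raises SystemExit, so this finally block still runs
        # and the context shuts down before the process terminates.
        sc.stop()

try:
    main()
except SystemExit:
    pass

print(sc.stopped)  # True
```

With a real SparkContext the same try/finally shape applies; the point is only that an explicit exit does not bypass the finally clause.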

Hope this helps !!

New Contributor

Hi @bkosaraju,

I am calling sc.stop() in the main where I start the SparkContext, and I am also ensuring that sc.stop() runs before sys.exit(). On the other hand, the resources are not over-committed, as I can see YARN is still 30% free.

The first error I see is:

18/03/01 08:13:14 ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.

This is followed by warnings almost every minute:

18/03/01 08:13:14 WARN LiveListenerBus: Dropped 1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
18/03/01 08:14:14 WARN LiveListenerBus: Dropped 4643 SparkListenerEvents since Thu Mar 01 08:13:14 UTC 2018
18/03/01 08:15:14 WARN LiveListenerBus: Dropped 4759 SparkListenerEvents since Thu Mar 01 08:14:14 UTC 2018
18/03/01 08:16:15 WARN LiveListenerBus: Dropped 4869 SparkListenerEvents since Thu Mar 01 08:15:14 UTC 2018
18/03/01 08:17:15 WARN LiveListenerBus: Dropped 4885 SparkListenerEvents since Thu Mar 01 08:16:15 UTC 2018
18/03/01 08:18:15 WARN LiveListenerBus: Dropped 4653 SparkListenerEvents since Thu Mar 01 08:17:15 UTC 2018
18/03/01 08:19:17 WARN LiveListenerBus: Dropped 4549 SparkListenerEvents since Thu Mar 01 08:18:15 UTC 2018
18/03/01 08:20:17 WARN LiveListenerBus: Dropped 4344 SparkListenerEvents since Thu Mar 01 08:19:17 UTC 2018
18/03/01 08:21:17 WARN LiveListenerBus: Dropped 4559 SparkListenerEvents since Thu Mar 01 08:20:17 UTC 2018
18/03/01 08:22:18 WARN LiveListenerBus: Dropped 4367 SparkListenerEvents since Thu Mar 01 08:21:17 UTC 2018
18/03/01 08:23:18 WARN LiveListenerBus: Dropped 4525 SparkListenerEvents since Thu Mar 01 08:22:18 UTC 2018
18/03/01 08:24:18 WARN LiveListenerBus: Dropped 4442 SparkListenerEvents since Thu Mar 01 08:23:18 UTC 2018
18/03/01 08:25:18 WARN LiveListenerBus: Dropped 4404 SparkListenerEvents since Thu Mar 01 08:24:18 UTC 2018

Not sure if I am missing anything...