Spark job hangs when run on Zeppelin

Expert Contributor

Hi,

When I run a Spark job through Zeppelin I get the following output and the job just hangs and never returns. Does anyone have any idea how I could debug and address this problem?

I'm running Spark 1.6 and HDP 2.4.

Thanks,

Mike

	 INFO [2016-04-13 18:01:17,746] ({Thread-65} Logging.scala[logInfo]:58) - Block broadcast_4 stored as values in memory (estimated size 305.3 KB, free 983.8 KB)
	 INFO [2016-04-13 18:01:17,860] ({Thread-65} Logging.scala[logInfo]:58) - Block broadcast_4_piece0 stored as bytes in memory (estimated size 25.9 KB, free 1009.7 KB)
	 INFO [2016-04-13 18:01:17,876] ({dispatcher-event-loop-0} Logging.scala[logInfo]:58) - Added broadcast_4_piece0 in memory on 148.88.72.84:56438 (size: 25.9 KB, free: 511.0 MB)
	 INFO [2016-04-13 18:01:17,893] ({Thread-65} Logging.scala[logInfo]:58) - Created broadcast 4 from textFile at NativeMethodAccessorImpl.java:-2
	 INFO [2016-04-13 18:01:18,162] ({Thread-65} FileInputFormat.java[listStatus]:249) - Total input paths to process : 1
	 INFO [2016-04-13 18:01:18,279] ({Thread-65} Logging.scala[logInfo]:58) - Starting job: count at <string>:3
	 INFO [2016-04-13 18:01:18,317] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Got job 2 (count at <string>:3) with 2 output partitions
	 INFO [2016-04-13 18:01:18,321] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Final stage: ResultStage 2 (count at <string>:3)
	 INFO [2016-04-13 18:01:18,322] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Parents of final stage: List()
	 INFO [2016-04-13 18:01:18,325] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Missing parents: List()
	 INFO [2016-04-13 18:01:18,333] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Submitting ResultStage 2 (PythonRDD[8] at count at <string>:3), which has no missing parents
	 INFO [2016-04-13 18:01:18,366] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Block broadcast_5 stored as values in memory (estimated size 6.2 KB, free 1015.9 KB)
	 INFO [2016-04-13 18:01:18,406] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Block broadcast_5_piece0 stored as bytes in memory (estimated size 3.7 KB, free 1019.6 KB)
	 INFO [2016-04-13 18:01:18,407] ({dispatcher-event-loop-1} Logging.scala[logInfo]:58) - Added broadcast_5_piece0 in memory on 148.88.72.84:56438 (size: 3.7 KB, free: 511.0 MB)
	 INFO [2016-04-13 18:01:18,410] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Created broadcast 5 from broadcast at DAGScheduler.scala:1006
	 INFO [2016-04-13 18:01:18,416] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Submitting 2 missing tasks from ResultStage 2 (PythonRDD[8] at count at <string>:3)
	 INFO [2016-04-13 18:01:18,417] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Adding task set 2.0 with 2 tasks
	 INFO [2016-04-13 18:01:18,428] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Added task set TaskSet_2 tasks to pool default
	 WARN [2016-04-13 18:01:23,225] ({Timer-0} Logging.scala[logWarning]:70) - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
	 WARN [2016-04-13 18:01:38,225] ({Timer-0} Logging.scala[logWarning]:70) - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
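
(For context, the paragraph itself is essentially just a text-file read followed by a count, along the lines of the sketch below; the HDFS path here is only a placeholder, not the real one.)

    %pyspark
    # Minimal stand-in for the paragraph that hangs: read a text file and count its lines.
    # "hdfs:///tmp/sample.txt" is a placeholder path, not the actual dataset.
    lines = sc.textFile("hdfs:///tmp/sample.txt")
    print(lines.count())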
1 ACCEPTED SOLUTION

Guru

@mike harding

This looks like YARN is not able to allocate containers for the executors. When you look at the YARN ResourceManager UI, is there a job from Zeppelin in the ACCEPTED state? If so, how much memory is available for YARN to allocate (it should be on the same UI)? If the job is in the ACCEPTED state and there is not enough memory available, it will not start until YARN frees up resources. If this is the case, try adding more memory for YARN in Ambari.
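
If you'd rather check this from a terminal than the UI, the ResourceManager exposes the same numbers over its REST API. A rough sketch (Python 2, and "your-rm-host" is a placeholder for your ResourceManager web address, usually port 8088):

    import json
    import urllib2  # Python 2, which HDP 2.4 / Spark 1.6 era clusters typically use

    # Placeholder: point this at your ResourceManager web UI.
    rm = "http://your-rm-host:8088"

    # Cluster-wide scheduling headroom: memory and vcores YARN can still hand out.
    metrics = json.load(urllib2.urlopen(rm + "/ws/v1/cluster/metrics"))["clusterMetrics"]
    print("available MB:     %s" % metrics["availableMB"])
    print("available vcores: %s" % metrics["availableVirtualCores"])
    print("apps pending:     %s" % metrics["appsPending"])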


8 REPLIES

Expert Contributor

Mike - did you try running the same job from spark-shell? Does it succeed there and only fail when run from Zeppelin?

Expert Contributor

I can run the same job from the pyspark shell with no problems; it executes immediately.

Guru

@mike harding

This looks like YARN is not able to allocate containers for the executors. When you look at the YARN ResourceManager UI, is there a job from Zeppelin in the ACCEPTED state? If so, how much memory is available for YARN to allocate (it should be on the same UI)? If the job is in the ACCEPTED state and there is not enough memory available, it will not start until YARN frees up resources. If this is the case, try adding more memory for YARN in Ambari.

Expert Contributor

User: zeppelin | Name: Zeppelin | Application Type: SPARK | Queue: default | StartTime: Wed Apr 13 17:41:12 +0100 2016 | FinishTime: N/A | State: RUNNING | FinalStatus: UNDEFINED

It says that it's still running and is using 66% of the queue/cluster memory.

Guru

@mike harding

What about cores? The YARN RM UI should show the number of cores that YARN has available to allocate. Are there any cores still available? Are there any other jobs in the running state? If you click on the application master link on the YARN RM UI, it should take you to the Spark UI; is it showing any jobs as incomplete?
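
(The same REST API mentioned above can also list the running applications and what each one is currently holding, if the UI is awkward to reach; again, "your-rm-host" is a placeholder:)

    import json
    import urllib2

    rm = "http://your-rm-host:8088"  # placeholder ResourceManager address

    # Applications currently in the RUNNING state and the memory/cores each one holds.
    data = json.load(urllib2.urlopen(rm + "/ws/v1/cluster/apps?states=RUNNING"))
    for app in (data.get("apps") or {}).get("app", []):
        print("%s  %s  %s MB  %s vcores" % (
            app["name"], app["state"], app["allocatedMB"], app["allocatedVCores"]))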

Expert Contributor

I think this was the issue - Ambari had auto-configured only one vCore. When I increased this it seemed to solve the problem.
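
(For anyone who hits the same thing: after changing the setting and restarting, a throwaway paragraph like the one below is a quick way to confirm executors are actually registered and tasks get scheduled; nothing in it is specific to my notebook.)

    %pyspark
    # Quick post-change sanity check: if executors registered, this returns promptly
    # instead of sitting at "Initial job has not accepted any resources".
    print(sc.defaultParallelism)
    print(sc.parallelize(range(1000)).count())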

New Contributor

I'm having similar issues. Which parameter did you change in Ambari? For me it seems to fail partway through, often at the last stage.


Another option you can try is clicking the "Interpreters" link at the top of the Zeppelin page, finding the "spark" interpreter, and clicking the "restart" button on the right-hand side. Next, make sure that your notebook page shows "Connected" with a green dot, meaning it is talking successfully with the Spark driver.
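
Once it shows "Connected", a trivial paragraph like the one below is enough to confirm the interpreter can actually reach the Spark driver (just a sanity check, not tied to any particular notebook):

    %pyspark
    # If the restarted interpreter is healthy, this comes back immediately.
    print(sc.version)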