Spark Job hangs when run on zeppelin

Contributor

hi,

When I run a Spark job through Zeppelin I get the output below, and the job just hangs and never returns. Does anyone have any idea how I could debug and address this problem?

I'm running spark 1.6 and HDP 2.4.

Thanks,

Mike

	 INFO [2016-04-13 18:01:17,746] ({Thread-65} Logging.scala[logInfo]:58) - Block broadcast_4 stored as values in memory (estimated size 305.3 KB, free 983.8 KB)
	 INFO [2016-04-13 18:01:17,860] ({Thread-65} Logging.scala[logInfo]:58) - Block broadcast_4_piece0 stored as bytes in memory (estimated size 25.9 KB, free 1009.7 KB)
	 INFO [2016-04-13 18:01:17,876] ({dispatcher-event-loop-0} Logging.scala[logInfo]:58) - Added broadcast_4_piece0 in memory on 148.88.72.84:56438 (size: 25.9 KB, free: 511.0 MB)
	 INFO [2016-04-13 18:01:17,893] ({Thread-65} Logging.scala[logInfo]:58) - Created broadcast 4 from textFile at NativeMethodAccessorImpl.java:-2
	 INFO [2016-04-13 18:01:18,162] ({Thread-65} FileInputFormat.java[listStatus]:249) - Total input paths to process : 1
	 INFO [2016-04-13 18:01:18,279] ({Thread-65} Logging.scala[logInfo]:58) - Starting job: count at <string>:3
	 INFO [2016-04-13 18:01:18,317] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Got job 2 (count at <string>:3) with 2 output partitions
	 INFO [2016-04-13 18:01:18,321] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Final stage: ResultStage 2 (count at <string>:3)
	 INFO [2016-04-13 18:01:18,322] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Parents of final stage: List()
	 INFO [2016-04-13 18:01:18,325] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Missing parents: List()
	 INFO [2016-04-13 18:01:18,333] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Submitting ResultStage 2 (PythonRDD[8] at count at <string>:3), which has no missing parents
	 INFO [2016-04-13 18:01:18,366] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Block broadcast_5 stored as values in memory (estimated size 6.2 KB, free 1015.9 KB)
	 INFO [2016-04-13 18:01:18,406] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Block broadcast_5_piece0 stored as bytes in memory (estimated size 3.7 KB, free 1019.6 KB)
	 INFO [2016-04-13 18:01:18,407] ({dispatcher-event-loop-1} Logging.scala[logInfo]:58) - Added broadcast_5_piece0 in memory on 148.88.72.84:56438 (size: 3.7 KB, free: 511.0 MB)
	 INFO [2016-04-13 18:01:18,410] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Created broadcast 5 from broadcast at DAGScheduler.scala:1006
	 INFO [2016-04-13 18:01:18,416] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Submitting 2 missing tasks from ResultStage 2 (PythonRDD[8] at count at <string>:3)
	 INFO [2016-04-13 18:01:18,417] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Adding task set 2.0 with 2 tasks
	 INFO [2016-04-13 18:01:18,428] ({dag-scheduler-event-loop} Logging.scala[logInfo]:58) - Added task set TaskSet_2 tasks to pool default
	 WARN [2016-04-13 18:01:23,225] ({Timer-0} Logging.scala[logWarning]:70) - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
	 WARN [2016-04-13 18:01:38,225] ({Timer-0} Logging.scala[logWarning]:70) - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
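
For what it's worth, the paragraph itself is nothing unusual - just a textFile() followed by a count(), along the lines of the sketch below (the path here is a placeholder, not my actual dataset):

    %pyspark
    # Simplified sketch of the hanging paragraph; the HDFS path is a placeholder.
    lines = sc.textFile("hdfs:///tmp/sample.txt")
    print(lines.count())  # the count() job is submitted but never gets executors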

8 REPLIES

Rising Star

Mike - did you try running the same job from spark-shell? Does it succeed there and only fail when run from Zeppelin?

Contributor

I can run the same job from the pyspark shell with no problems; it executes immediately.
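
(I'm not certain my shell session was going through YARN at all; to force a like-for-like comparison with Zeppelin's yarn-client mode, the shell can be started with explicit, small resource requests - the sizes below are just illustrative.)

    # illustrative only: run the same count() from a YARN-backed pyspark shell
    pyspark --master yarn-client --num-executors 2 --executor-memory 512m --executor-cores 1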

Guru (accepted solution)

@mike harding

This looks like YARN not being able to allocate containers for the executors. When you look at the YARN ResourceManager UI, is there a job from Zeppelin in the ACCEPTED state? If so, how much memory is available for YARN to allocate (it should be shown on the same UI)? If the job is in the ACCEPTED state and there is not enough memory available, it will not start until YARN gets resources freed up. If that is the case, try giving YARN more memory in Ambari.
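
For a quick check from the command line, and for the memory settings Ambari exposes for YARN (the values below are examples only, not recommendations for your cluster):

    # list applications that are stuck waiting for resources
    yarn application -list -appStates ACCEPTED

    # YARN memory knobs in Ambari (yarn-site.xml); example values only
    yarn.nodemanager.resource.memory-mb = 8192     # memory each NodeManager offers to YARN
    yarn.scheduler.maximum-allocation-mb = 4096    # largest single container YARN will grant
    yarn.scheduler.minimum-allocation-mb = 512     # smallest container / allocation increment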

Contributor
From the YARN RM UI: User: zeppelin | Name: Zeppelin | Application Type: SPARK | Queue: default | StartTime: Wed Apr 13 17:41:12 +0100 2016 | FinishTime: N/A | State: RUNNING | FinalStatus: UNDEFINED

It says that it's still running and is using 66% of the queue/cluster memory.

Guru

@mike harding

What about cores? The YARN RM UI should show the number of cores YARN has available to allocate. Are any cores still available? Are there any other jobs in the RUNNING state? If you click the ApplicationMaster link on the YARN RM UI, it should take you to the Spark UI; is that showing any jobs as incomplete?

Contributor

I think this was the issue - Ambari had auto-configured only one vCore. When I increased this, it seemed to solve the problem.
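
(For anyone hitting the same thing: these are the vCore-related properties under YARN in Ambari. With only one vCore available, the Zeppelin application master likely holds the only core, so executor containers can never be scheduled - which matches the "Initial job has not accepted any resources" warning. The values below are examples, not tuning advice.)

    # YARN vCore settings surfaced in Ambari (yarn-site.xml); example values only
    yarn.nodemanager.resource.cpu-vcores = 4        # cores each NodeManager offers to YARN
    yarn.scheduler.maximum-allocation-vcores = 4    # most cores a single container may request
    yarn.scheduler.minimum-allocation-vcores = 1    # fewest cores a container may request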

New Contributor

I'm having similar issues. Which parameter did you change in Ambari? For me it seems to fail part way through, often at the last stage.

Another option you can try is clicking the "Interpreters" link at the top of the Zeppelin page, finding the "spark" interpreter, and clicking the "restart" button on the right-hand side. Then make sure your notebook page shows "Connected" with a green dot, meaning it is talking successfully with the Spark driver.
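
While you're in the interpreter settings, it's also worth checking the resource-related properties of the spark interpreter, since those are what each notebook will request from the cluster. The first two below appear in the default settings; spark.executor.instances is a standard Spark-on-YARN property you can add (the values are illustrative only):

    # spark interpreter properties in Zeppelin; values illustrative only
    master = yarn-client            # driver runs locally, executors run in YARN containers
    spark.executor.memory = 512m    # memory requested per executor
    spark.executor.instances = 2    # number of executors requested on YARN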