Spark Python job via Oozie on the Sandbox HDP2.5 (Docker instance) hangs at 'Initial job has not accepted any resources'

New Contributor

I have the HDP 2.5 sandbox running on my MacBook as a Docker container. I am trying to run a simple Python script (calculating Pi) on Spark via Oozie. For this I created a workflow with a single Spark action.

The result is that the Spark job keeps hanging while logging:

WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I am pretty sure the necessary resources are available, as I assigned 8 GB of memory and 4 cores to the Docker container.

If I run the Spark Pi example from the command line it works fine.

I tried to reduce the resources the job requests by specifying minimal resources in the Spark action configuration:

<spark xmlns="uri:oozie:spark-action:0.2">
  <job-tracker>sandbox.hortonworks.com:8050</job-tracker>
  <name-node>hdfs://sandbox.hortonworks.com:8020</name-node>
  <master>yarn-cluster</master>
  <name>SparkPi</name>
  <jar>pi.py</jar>
  <spark-opts>--driver-memory 1g --executor-memory 128m --num-executors 2 --executor-cores 1</spark-opts>
  <arg>10</arg>
  <file>/palma/pi.py</file>
  <configuration />
</spark>
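
For reference, the Spark action above is the only step in the workflow; the enclosing workflow.xml is along these lines (the wrapper element and node names here are just a sketch, not a verbatim copy of my file):

<workflow-app name="SparkPiWorkflow" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-pi"/>
  <action name="spark-pi">
    <!-- the <spark> action shown above goes here -->
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>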

I also set spark.dynamicAllocation.enabled to false (just to make sure):

conf.set('spark.dynamicAllocation.enabled','false')

But although the Spark job starts, it keeps hanging/waiting while logging:

WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Can somebody point me in the direction of what might be missing or wrong in my setup?

For completeness, here is the Python code I am trying to run:

	from __future__ import print_function
	import sys
	from random import random
	from operator import add
	from pyspark import SparkContext, SparkConf

	print("Started calculations")

	"""
	    Usage: pi [partitions]
	"""

	conf = SparkConf()
	conf.setAppName('PythonPi')
	# Explicitly disable dynamic allocation so only the executors requested
	# in the Oozie spark-opts are used.
	conf.set('spark.dynamicAllocation.enabled', 'false')
	sc = SparkContext(conf=conf)

	partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
	n = 100 * partitions
	print("n = %d" % n)

	def f(_):
	    # Sample a random point in the unit square; return 1 if it falls
	    # inside the unit circle, 0 otherwise.
	    x = random() * 2 - 1
	    y = random() * 2 - 1
	    return 1 if x ** 2 + y ** 2 < 1 else 0

	# The fraction of points inside the circle approximates pi/4.
	count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
	print("Pi is roughly %f" % (4.0 * count / n))
	sc.stop()
1 REPLY

New Contributor

Did you solve this problem?
