I have the HDP 2.5 sandbox running on my MacBook as a Docker container. I am trying to run a simple Python script (calculating Pi) on Spark via Oozie. For this I created a workflow with one Spark step.
The result is that the Spark job keeps hanging while logging:
WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I am pretty sure the necessary resources are available, as I assigned 8 GB and 4 cores to the Docker container.
If I run the Spark Pi example from the command line it works fine.
I tried to cut down the necessary resources for the job by defining minimum resources in the job config:
<spark-opts>--driver-memory 1g --executor-memory 128m --num-executors 2 --executor-cores 1</spark-opts>
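As a rough sanity check on what these options actually ask YARN for, here is a minimal sketch of the container arithmetic. It assumes the documented Spark-on-YARN default overhead of max(384 MB, 10% of the requested memory) and that YARN rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb. The helper name container_mb and the MIN_ALLOC_MB value of 256 are hypothetical; the real value has to be read from the cluster's YARN config.

```python
import math


def container_mb(requested_mb, min_alloc_mb,
                 overhead_factor=0.10, min_overhead_mb=384):
    """Estimate the YARN container size (MB) for a Spark driver or executor.

    Assumes overhead = max(min_overhead_mb, overhead_factor * requested_mb)
    and that YARN rounds the total up to a multiple of min_alloc_mb.
    """
    overhead = max(min_overhead_mb, int(requested_mb * overhead_factor))
    total = requested_mb + overhead
    return int(math.ceil(total / float(min_alloc_mb))) * min_alloc_mb


# Hypothetical value: replace with yarn.scheduler.minimum-allocation-mb.
MIN_ALLOC_MB = 256

driver = container_mb(1024, MIN_ALLOC_MB)    # --driver-memory 1g
executor = container_mb(128, MIN_ALLOC_MB)   # --executor-memory 128m
spark_total = driver + 2 * executor          # --num-executors 2

print("driver container:   %d MB" % driver)
print("executor container: %d MB" % executor)
print("Spark job total:    %d MB" % spark_total)
```

Under these assumptions the request is noticeably larger than the raw 1g + 2 x 128m, because the fixed overhead dominates small executors. The sketch also does not model the Oozie launcher job, which typically holds YARN containers of its own while the Spark job is waiting, so the total has to be compared against what YARN (not Docker) has available.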
I also set the dynamicAllocation to false (just to make sure):
But although the Spark job is started, it keeps hanging/waiting while logging:
Can somebody point me to what might be missing or wrong in my setup?
For completeness, here is the Python code I am trying to run:
from __future__ import print_function
import sys
from random import random
from operator import add
from pyspark import SparkContext, SparkConf

# Usage: pi [partitions]
if __name__ == "__main__":
    conf = SparkConf()
    sc = SparkContext(conf=conf)
    # The first command-line argument is the partition count (default 2)
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100 * partitions
    print("n = %d" % n)

    def f(_):
        # Sample a point in [-1, 1) x [-1, 1); return 1 if it lies in the unit circle
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))
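To rule out the script itself, the same sampling logic can be run without Spark or YARN. This is only a sketch: the function names inside_unit_circle and estimate_pi and the seed parameter are hypothetical and not part of the job above.

```python
from random import Random


def inside_unit_circle(rng):
    """Sample a point in [-1, 1) x [-1, 1); return 1 if it is in the unit circle."""
    x = rng.random() * 2 - 1
    y = rng.random() * 2 - 1
    return 1 if x ** 2 + y ** 2 < 1 else 0


def estimate_pi(n, seed=None):
    """Monte Carlo estimate of Pi from n samples, computed locally."""
    rng = Random(seed)
    count = sum(inside_unit_circle(rng) for _ in range(n))
    return 4.0 * count / n


print("Pi is roughly %f" % estimate_pi(100000, seed=42))
```

If this prints a plausible value locally, the hang is more likely a scheduling or resource issue on the cluster side than a bug in the script.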
Did you solve this problem?