Support Questions
Find answers, ask questions, and share your expertise

Running pyspark with oozie - Unknown configuration problem


I'm trying to run an Oozie example from the command line, but it fails with the error: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [2]


When I try to view the error logs in HUE, all I get is: Error getting logs at node6.agatha-cluster:8041
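For what it's worth, I can sometimes pull the aggregated container logs straight from YARN when HUE fails (assuming log aggregation is enabled); the application id below is just a placeholder, the real one comes from the ResourceManager UI:

```shell
# Fetch the aggregated logs of the failed Oozie launcher job.
# application_... is a placeholder; take the real id from the ResourceManager UI.
yarn logs -applicationId application_1500000000000_0001 | less
```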


Given the files below, does anyone know what is happening? And how can I get more detailed error logs?


Thank you!


  • (in local mode)


  • workflow.xml (stored in hdfs ${nameNode}/user/darguelles/${examplesRoot}/apps/pyspark)
<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkPythonPi'>

    <start to='spark-node' />

    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <spark-opts>--executor-memory 20G --num-executors 5</spark-opts>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>

    <kill name="fail">
        <message>Workflow failed, error message [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name='end' />
</workflow-app>
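In case a comparison helps: as far as I understand the spark-action:0.1 schema, the spark element also needs job-tracker, name-node, master, name and jar children, which my workflow above does not have (the pi.py file name here is my assumption):

```xml
<action name='spark-node'>
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>SparkPythonPi</name>
        <!-- pi.py is an assumed name for the script shown below -->
        <jar>${nameNode}/user/darguelles/${examplesRoot}/apps/pyspark/pi.py</jar>
        <spark-opts>--executor-memory 20G --num-executors 5</spark-opts>
    </spark>
    <ok to="end" />
    <error to="fail" />
</action>
```

Could the missing elements be what makes SparkMain exit with code 2?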



  • (stored in hdfs ${nameNode}/user/darguelles/${examplesRoot}/apps/pyspark)
import sys
from random import random
from operator import add

from pyspark import SparkContext

if __name__ == "__main__":
    """Usage: pi [partitions]"""
    sc = SparkContext(appName="Python-Spark-Pi")
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))
    sc.stop()
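As a sanity check (separate from the Oozie problem itself), the same Monte Carlo estimator can be run without Spark to confirm the script body is fine; this is plain Python with the same f, plus a seed I added so the run is repeatable:

```python
from functools import reduce
from operator import add
from random import random, seed


def f(_):
    # One dart at the 2x2 square centred on the origin;
    # returns 1 if it lands inside the unit circle, else 0.
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 < 1 else 0


def estimate_pi(n, rng_seed=0):
    # rng_seed is my addition for repeatability; it is not in the original script.
    seed(rng_seed)
    count = reduce(add, map(f, range(n)))
    return 4.0 * count / n


print("Pi is roughly %f" % estimate_pi(200000))
```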