
Running pyspark with Oozie - Unknown configuration problem


Hello,

 

I'm trying to run an Oozie example from the command line, but I'm getting the error: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [2]

 

When I try to view the error logs in HUE, all I get is: Error getting logs at node6.agatha-cluster:8041

 

Given the files below, does anyone know what is happening? And how can I get more accurate error logs?
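
(For what it's worth, I know the aggregated YARN container logs can be fetched with the yarn logs CLI once the application finishes, assuming log aggregation is enabled on the cluster. A minimal sketch of pulling them from Python; the application ID below is a placeholder, not my real one:)

import subprocess

# Placeholder ID: substitute the launcher application ID shown in the
# Oozie console or the YARN ResourceManager UI.
app_id = "application_0000000000000_0000"

# "yarn logs" prints the aggregated container logs once the application
# has finished and yarn.log-aggregation-enable is true on the cluster.
result = subprocess.run(
    ["yarn", "logs", "-applicationId", app_id],
    capture_output=True, text=True,
)
print(result.stdout)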

 

Thank you!

 

  • job.properties (stored on the local filesystem)
nameNode=hdfs://node10:8020
jobTracker=node10:8032
queueName=default
examplesRoot=oozie_examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/darguelles/${examplesRoot}/apps/pyspark
master=yarn-client
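
For reference, submission is just the standard Oozie CLI run against this file. A sketch of the call, wrapped in Python for consistency with the rest of the post; the Oozie server URL is an assumption (node10, default port 11000), adjust for the real cluster:

import os
import subprocess

# Assumption: the Oozie server runs on node10 on the default port 11000.
os.environ["OOZIE_URL"] = "http://node10:11000/oozie"

# Standard submission: read job.properties, create the job, and start it.
subprocess.run(
    ["oozie", "job", "-config", "job.properties", "-run"],
    check=True,
)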

 

  • workflow.xml (stored in HDFS at ${nameNode}/user/darguelles/${examplesRoot}/apps/pyspark)
<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkPythonPi'>

    <start to='spark-node' />

    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>Python-Spark-Pi</name>
            <jar>pi.py</jar> 
            <spark-opts>--executor-memory 20G --num-executors 5</spark-opts>
            <arg>10</arg>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>

    <kill name="fail">
        <message>Workflow failed, error message [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name='end' />

</workflow-app>
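
(One way to rule out malformed XML, in case it helps anyone reading, is Oozie's built-in validator; a sketch, assuming the oozie client is on the PATH and workflow.xml is present locally:)

import subprocess

# "oozie validate" checks a workflow definition against the Oozie XML
# schemas and reports violations without submitting anything.
subprocess.run(["oozie", "validate", "workflow.xml"], check=True)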

 

  • pi.py (stored in HDFS at ${nameNode}/user/darguelles/${examplesRoot}/apps/pyspark)
import sys
from random import random
from operator import add

from pyspark import SparkContext


if __name__ == "__main__":
    """
        Usage: pi [partitions]
    """
    sc = SparkContext(appName="Python-Spark-Pi")
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))

    sc.stop()
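
For what it's worth, the estimator logic itself can be sanity-checked without Spark at all; a quick local sketch using the same sampling as pi.py:

from random import random

def f(_):
    # Same test as in pi.py: uniform point in [-1, 1]^2, inside the unit circle?
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 < 1 else 0

n = 100000 * 2  # same sample count as the default two partitions
count = sum(f(i) for i in range(1, n + 1))
print("Pi is roughly %f" % (4.0 * count / n))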