Support Questions

ebeb · ‎10-27-2017

Hi,

I am getting an error while running a Python script through a shell action in Hue/Oozie. The error and my workflow XML are given below. Any ideas? Thanks.

from pyspark import SparkContext
ImportError: No module named pyspark
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

--------------------------------------------------------------------------------

<workflow-app name="My Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="shell-8cca"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="shell-8cca">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapred.child.env</name>
                    <value>PYTHONPATH=/usr/bin/python</value>
                </property>
                <property>
                    <name>oozie.launcher.mapred.child.env</name>
                    <value>PYSPARK_PYTHON=/usr/bin/pyspark</value>
                </property>
            </configuration>
            <exec>shexample7.sh</exec>
            <env-var>PYTHONPATH=/usr/bin/python</env-var>
            <env-var>PYSPARK_PYTHON=/usr/bin/pyspark</env-var>
            <file>/user/admin/shexample7.sh#shexample7.sh</file>
            <file>/user/admin/pyexample.py#pyexample.py</file>
            <capture-output/>
        </shell>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
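
A note on the environment values in the workflow above. PYTHONPATH is a list of directories or zip archives that Python searches for modules, but here it points at the interpreter binary /usr/bin/python, so it cannot make pyspark importable. PYSPARK_PYTHON is meant to name a Python interpreter, while /usr/bin/pyspark looks like the PySpark launcher script. The two properties also share the name oozie.launcher.mapred.child.env, so probably only one of them takes effect. The sketch below shows how a usable PYTHONPATH could be built. It assumes the usual Spark layout (a python directory plus a py4j-*-src.zip under python/lib); pyspark_pythonpath is a hypothetical helper, and the fallback path /usr/lib/spark is only a placeholder for the real Spark directory on the cluster.

```python
import glob
import os


def pyspark_pythonpath(spark_home):
    """Return a PYTHONPATH string exposing the pyspark package under spark_home.

    Assumes the usual Spark layout: <spark_home>/python holds the pyspark
    package and <spark_home>/python/lib holds a py4j-*-src.zip archive.
    """
    python_dir = os.path.join(spark_home, "python")
    # The py4j version is part of the file name, so match it with a glob.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return os.pathsep.join([python_dir] + py4j_zips)


if __name__ == "__main__":
    # SPARK_HOME and the fallback path are assumptions; use the cluster's Spark directory.
    print(pyspark_pythonpath(os.environ.get("SPARK_HOME", "/usr/lib/spark")))
```

The printed value is what an env-var such as PYTHONPATH=... would need to carry, while PYSPARK_PYTHON would point at the interpreter itself, for example /usr/bin/python.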

EricL · ‎10-29-2017

Can you please share the content of shexample7.sh? I would like to see how you launch the Spark job in the shell script.
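
In the meantime, a quick diagnostic may help narrow this down. The sketch below reports which interpreter is running, what PYTHONPATH and PYSPARK_PYTHON look like inside the Oozie launcher, and whether pyspark can be imported at all. describe_python_env is a hypothetical helper name, and the idea of running it from the shell action before the real script is a suggestion, not something confirmed to fix the problem.

```python
import os
import sys


def describe_python_env():
    """Collect the facts that decide whether `import pyspark` can succeed."""
    try:
        import pyspark  # noqa: F401  (only checking that the import works)
        importable = True
    except ImportError:
        importable = False
    return {
        "executable": sys.executable,
        "PYTHONPATH": os.environ.get("PYTHONPATH", ""),
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON", ""),
        "pyspark_importable": importable,
    }


if __name__ == "__main__":
    # Single-argument print works on both Python 2 and Python 3.
    for key, value in sorted(describe_python_env().items()):
        print("%s = %s" % (key, value))
```

If pyspark_importable comes back False in the launcher's stdout, the environment seen by the shell action is missing the Spark Python directories.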

ebeb · ‎10-29-2017

The good news is that even though the shell script didn't work, I was able to run the same Python script, which uses a Spark HiveContext, with the Spark action in Hue -> Workflow instead of the Shell action.

The shell script is shexample7.sh:

-------------------------------------------------

#!/usr/bin/env bash

export PYTHONPATH=/usr/bin/python
export PYSPARK_PYTHON=/usr/bin/python

echo "starting..."

/usr/bin/spark-submit --master yarn-cluster pyexample.py

The Python script is pyexample.py:

-----------------------------------------------

#!/usr/bin/env python

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext("local", "pySpark Hive App")
# Create a Hive Context
hive_context = HiveContext(sc)

print "Reading Hive table..."

mytbl = hive_context.sql("SELECT * FROM xyzdb.testdata1")

print "Registering DataFrame as a table..."
mytbl.show() # Show first rows of dataframe
mytbl.printSchema()

The Python job successfully displays the data, but the final status comes back as KILLED even though the script ran and returned data from Hive in stdout.
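
One possible contributor to the KILLED status (an assumption; nothing in this thread confirms it) is a master mismatch: pyexample.py passes "local" to SparkContext, while shexample7.sh submits with --master yarn-cluster. Spark's configuration documentation states that values set in the application take precedence over spark-submit flags, which in turn take precedence over spark-defaults.conf. The sketch below is only a toy model of that ordering; effective_master is a hypothetical helper, not a Spark API.

```python
def effective_master(in_code=None, on_command_line=None, in_defaults=None):
    """Toy model of Spark's documented precedence for a setting such as the master."""
    # The first source that supplies a value wins: code, then flags, then defaults.
    for candidate in (in_code, on_command_line, in_defaults):
        if candidate is not None:
            return candidate
    return None


# The setup from this thread: "local" in the script, yarn-cluster on the command line.
print(effective_master(in_code="local", on_command_line="yarn-cluster"))

# With no master in the script, the spark-submit flag applies.
print(effective_master(on_command_line="yarn-cluster"))
```

Under this reading, creating the context without a hardcoded master, for example SparkContext(appName="pySpark Hive App"), would let the --master flag decide where the job runs.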
