
Running pyspark (w/ anaconda ) in oozie via Hue


Explorer

Hi,

I'm trying to run a simple Python script on Oozie using Hue. Anaconda is installed as a parcel, so I have also added the following to the Spark configuration in Cloudera Manager (Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh):

if [ -z "${PYSPARK_PYTHON}" ]; then
export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
fi

When I run the job, I get a Python error:

ImportError: No module named pandas.io.json

which suggests that PYSPARK_PYTHON is not picking up the Anaconda interpreter.
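One way to confirm which interpreter the job really uses is to submit a small check script through the same Spark action. The sketch below is only an illustration: `describe_interpreter` and `main` are hypothetical names, and it assumes the script is launched by spark-submit (or the Oozie Spark action), so that the master is supplied from outside. It reports the interpreter path on the driver and on the executors, and whether pandas can be imported there.

```python
import sys


def describe_interpreter(module_name="pandas"):
    """Return (interpreter path, whether module_name imports) for this process."""
    try:
        __import__(module_name)
        found = True
    except ImportError:
        found = False
    return (sys.executable, found)


def main():
    # Driver side: the process that runs this script.
    print("driver: %s" % (describe_interpreter(),))
    try:
        # Imported lazily so the helper above also works without Spark.
        from pyspark import SparkContext
    except ImportError:
        print("pyspark is not importable here; skipping the executor check")
        return
    sc = SparkContext(appName="interpreter-check")
    # Executor side: run the same check inside a few tasks and deduplicate.
    results = (sc.parallelize(range(8), 4)
                 .map(lambda _: describe_interpreter())
                 .distinct()
                 .collect())
    for item in results:
        print("executor: %s" % (item,))
    sc.stop()


if __name__ == "__main__":
    main()
```

If the paths printed from Oozie differ from the ones printed when the same script is run with spark-submit on the CLI, the environment variable is probably not reaching the containers launched by Oozie.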

 

I've tried adding an argument with PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python to the Spark action in Hue, but that doesn't seem to work either.

If I run the script from the CLI with spark-submit, it works.

Other Python scripts that don't use Anaconda packages also run fine on Oozie via Hue.

What am I missing? :/

2 REPLIES

Re: Running pyspark (w/ anaconda ) in oozie via Hue

New Contributor
I had the same problem. I solved it by adding the Python path to the Spark action properties in the Oozie workflow. Otherwise it picks up the default Python.
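The reply does not say which properties were added. One plausible reading, offered here as an assumption, is to pass Spark's documented `spark.yarn.appMasterEnv.PYSPARK_PYTHON` and `spark.executorEnv.PYSPARK_PYTHON` settings as `--conf` flags in the Spark action's options (the `<spark-opts>` element of the workflow). The helper below, with the hypothetical name `anaconda_spark_opts`, just assembles that options string so it can be pasted into the action.

```python
def anaconda_spark_opts(python_path):
    """Build --conf flags pointing the YARN driver and executors at python_path."""
    settings = [
        # Environment for the application master (the driver in yarn-cluster mode).
        ("spark.yarn.appMasterEnv.PYSPARK_PYTHON", python_path),
        # Environment for the executor processes.
        ("spark.executorEnv.PYSPARK_PYTHON", python_path),
    ]
    return " ".join("--conf %s=%s" % (key, value) for key, value in settings)


# Path taken from the question; adjust it to the parcel location on your cluster.
print(anaconda_spark_opts("/opt/cloudera/parcels/Anaconda/bin/python"))
```

Whether both flags are needed depends on the deploy mode. In yarn-client mode the driver runs inside the Oozie launcher, so its environment may have to be set separately.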

Re: Running pyspark (w/ anaconda ) in oozie via Hue

New Contributor