Running pyspark (w/ anaconda ) in oozie via Hue

Explorer

Hi,

I'm trying to run a simple Python script on Oozie using Hue. The Anaconda parcel is installed, so I've also added the following in Cloudera Manager, in the Spark configuration (Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh):

if [ -z "${PYSPARK_PYTHON}" ]; then
export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
fi

When running the job, I get a Python error:

ImportError: No module named pandas.io.json

which suggests that PySpark is not using the Anaconda interpreter set in PYSPARK_PYTHON.

 

I've tried adding PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python as an argument on the Spark action in Hue, but that doesn't seem to work.

 

If I run the script via the CLI with spark-submit, it works.

If I run other Python scripts on Oozie via Hue (ones that don't use Anaconda packages), they work.

 

What am I missing ? :/

 

 

2 REPLIES

Re: Running pyspark (w/ anaconda ) in oozie via Hue

New Contributor
I had the same problem. I solved it by setting the Python path in the Spark action properties of the Oozie workflow. Otherwise it picks up the default Python.
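
The reply does not say which properties were set, so the following is only a sketch of one common way to do it, not necessarily what the poster used. It assumes the job runs on YARN and relies on the Spark properties spark.yarn.appMasterEnv.PYSPARK_PYTHON and spark.executorEnv.PYSPARK_PYTHON; the variable names ANACONDA_PYTHON and SPARK_OPTS are made up for illustration, and the interpreter path is the one from the question.

```shell
# Assumed interpreter path, copied from the question; adjust to your cluster.
ANACONDA_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python

# spark.yarn.appMasterEnv.* sets an environment variable for the YARN
# application master (which hosts the driver in cluster mode);
# spark.executorEnv.* sets it for the executors.
SPARK_OPTS="--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=${ANACONDA_PYTHON} --conf spark.executorEnv.PYSPARK_PYTHON=${ANACONDA_PYTHON}"

# Print the option string so it can be pasted into the Spark action's
# options field in Hue (the <spark-opts> element of the workflow XML).
echo "${SPARK_OPTS}"
```

Passing PYSPARK_PYTHON as a plain argument, as tried in the question, probably only hands the text to the script as a command-line argument rather than setting an environment variable, which would explain why it had no effect.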

Re: Running pyspark (w/ anaconda ) in oozie via Hue

New Contributor