
Run a Python script containing Spark commands

Rising Star

Hello, I would like to know how I can run a Python script that contains Spark commands.

Here is the Python script that I would like to run in a Python environment:

#!/usr/bin/python2.7

from pyspark import SparkContext
from pyspark.sql import HiveContext

# Unlike the pyspark shell, spark-submit does not provide a ready-made `sc`,
# so the SparkContext must be created explicitly.
sc = SparkContext()
hive_context = HiveContext(sc)

qvol1 = hive_context.table("table")
qvol2 = hive_context.table("table")

qvol1.registerTempTable("qvol1_temp")
qvol2.registerTempTable("qvol2_temp")

df = hive_context.sql("request")
df.show()

7 Replies

Re: Run a Python script containing Spark commands

Rising Star

You can simply use spark-submit, which is in the bin folder of your Spark client installation. The documentation for it is here: http://spark.apache.org/docs/latest/submitting-applications.html
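As a sketch, assuming the script from the question is saved locally as return.py (the path, master, and queue name below are placeholders for your environment), a submit call could look like this; the command is held in a variable here purely for illustration:

```shell
# Sketch: submit a local PySpark script with spark-submit.
# The script path, master, and queue are placeholder assumptions.
SUBMIT_CMD="spark-submit --master yarn-client --queue DES ./return.py"
# On a real cluster you would execute the command directly instead of echoing it:
echo "$SUBMIT_CMD"
```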

Re: Run a Python script containing Spark commands

Rising Star

Thank you. I managed to run it, but only with a local file: when I specify the path of a file on the cluster (HDFS), I get an error:

bash-4.1$ spark-submit --master yarn-client --queue DES hdfs:///dev/datalake/app/des/dev/script/return.py
Error: Only local python files are supported:
Parsed arguments:
  master                  yarn-client
  deployMode              client
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /usr/hdp/current/spark-client/conf/spark-defaults.conf
  driverMemory            null
  driverCores             null
  driverExtraClassPath    /usr/hdp/current/share/lzo/0.6.0/lib/hadoop-lzo-0.6.0.jar:/usr/local/jdk-hadoop/ojdbc7.jar:/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar:/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/metrics-core-2.2.0.jar
  driverExtraLibraryPath  /usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/
  driverExtraJavaOptions  null
  supervise               false
  queue                   DES
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               null
  primaryResource         hdfs:///dev/datalake/app/des/dev/script/return.py
  name                    return.py
  childArgs               []
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 false


Re: Run a Python script containing Spark commands

Rising Star

What is the problem with using a local file? That is in fact what you have to do; there is no reason to point spark-submit at a path on HDFS.

Re: Run a Python script containing Spark commands

@alain TSAFACK

I think you need the --files option to pass the Python script to all executor instances. For example:

./bin/spark-submit --class my.main.Class \
    --master yarn-cluster \
    --jars my-other-jar.jar,my-other-other-jar.jar \
    --files return.py \
    my-main-jar.jar \
    app_arg1 app_arg2
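The command above follows the JAR-application pattern; for a pure Python job, the .py file itself is the primary resource (the last argument) and must be a local path. A sketch under that assumption, with placeholder master, queue, and path:

```shell
# Sketch: for a Python application the script is the primary resource
# (the final argument) and must be local, not an hdfs:// path.
# Master, queue, and the script path are placeholder assumptions.
SUBMIT_CMD="spark-submit --master yarn-cluster --queue DES /local/path/return.py"
echo "$SUBMIT_CMD"
```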

Re: Run a Python script containing Spark commands

Rising Star

Hello Paul Hargis

Here is the command that I ran with the --files parameter, but it generates an error:

bash-4.1$ spark-submit --master yarn-cluster --queue DES --files hdfs://dev/datalake/app/des/dev/script/return.py

Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output

Many thanks.

Re: Run a Python script containing Spark commands

I think you want to unit test this Python script. To do so, just launch the pyspark shell, which gives you a Python REPL where you can run each line one by one to test it.

Re: Run a Python script containing Spark commands

Rising Star

Thank you.

I had already done this step, but I needed to handle multiple files. This is now solved, thank you.
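For reference, when a job depends on several Python files, spark-submit can ship them alongside the main script with a comma-separated --py-files list. A sketch, where all file names are hypothetical:

```shell
# Sketch: ship extra Python modules to the executors with --py-files
# (comma-separated); the main script stays the local primary resource.
# All file names, master, and queue here are placeholder assumptions.
SUBMIT_CMD="spark-submit --master yarn-client --queue DES --py-files helpers.py,utils.py ./main_job.py"
echo "$SUBMIT_CMD"
```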
