
Submitting a Spark SQL job


Does anyone have an example of how to submit a Spark SQL job to a cluster? I am comfortable using spark-shell, but I want to understand how to productionize a Spark SQL job.

6 REPLIES

Super Guru

This is a good reference for creating and submitting a Spark SQL job: https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter... Hope it helps.
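For example, a minimal PySpark Spark SQL script along those lines might look like the sketch below. The file name, table, and query are just placeholders, and it assumes Hive is configured on your cluster:

# minimal_sql_job.py -- hypothetical script name
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

# Build a SparkContext and a HiveContext for running SQL
conf = SparkConf().setAppName("minimal_sql_job")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# Run a query against an existing Hive table (the table name is a placeholder)
df = sqlContext.sql("SELECT COUNT(*) AS row_count FROM my_database.my_table")
df.show()

sc.stop()

You would then run it on the cluster with something like $SPARK_HOME/bin/spark-submit --master yarn-client minimal_sql_job.py, adjusting the master and other flags for your setup.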


See https://github.com/vinayshukla/SparkDemo1 for an example of a Spark app with Maven packaging, built against HDP. You can change the Spark version in the pom to the Spark version you want to use.

You can submit the Spark App with spark-submit to run on your cluster.
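For example, once mvn package has produced the application jar, the submit step looks roughly like this (the class name and jar path are placeholders, and the exact flags depend on your cluster):

$SPARK_HOME/bin/spark-submit --class <your_main_class> --master yarn-client <path_to_your_app_jar>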

Super Guru

@Artem Ervits How can this be an answer? It doesn't provide any info on Spark SQL.

Master Mentor

Fair enough, I unaccepted the answer.


It is not even related to the question.


Here is some working PySpark sample code using the HiveContext. Run the code below on YARN with the following command:

$SPARK_HOME/bin/spark-submit --queue <queue name, if you have queues set up> --master yarn-client <name_of_my_python_script>

code:

from pyspark.sql import *
from pyspark import SparkConf, SparkContext, SQLContext
from pyspark.sql import HiveContext
import os  # needed for the os.system() calls below

# Configure the application name and executor resources
conf = (SparkConf()
        .setAppName("bmathew_ucs_data_profiling")
        .set("spark.executor.instances", "40")
        .set("spark.executor.cores", 4)
        .set("spark.executor.memory", "5g"))

sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# Recreate the HDFS directory that backs the external table
cmd = "hadoop fs -rm -r /tmp/spatton_cdc_test1"
os.system(cmd)
cmd = "hadoop fs -mkdir /tmp/spatton_cdc_test1"
os.system(cmd)
cmd = "hadoop fs -chmod 777 /tmp/spatton_cdc_test1"
os.system(cmd)

# Drop and recreate the external table over that directory
sqlContext.sql("DROP TABLE IF EXISTS spatton_cdc_test1")

sql = """
CREATE EXTERNAL TABLE spatton_cdc_test1 (
    widget_key bigint,
    widget_content_key bigint,
    field_name string,
    field_value string
)
LOCATION '/tmp/spatton_cdc_test1'
"""
sqlContext.sql(sql)

# Populate the table by joining the widget staging tables
sql = """
INSERT INTO TABLE spatton_cdc_test1
SELECT w.widget_key, wc.widget_content_key, new.field_name, new.field_value
FROM bloodhound_uniq_widget_content new
JOIN bloodhound_widget w
  ON w.name = new.widget_name
JOIN bloodhound_widget_content wc
  ON wc.widget_key = w.widget_key
 AND wc.name = new.name
 AND wc.type = new.type
 AND wc.position = new.position
"""
sqlContext.sql(sql)