Created 05-17-2016 05:07 AM
Does anyone have an example of how to submit a Spark SQL job to a cluster? I am comfortable using spark-shell, but I want to understand how to productionize a Spark SQL job.
Created 05-17-2016 05:24 AM
This is a good reference for creating and submitting a Spark SQL job: https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter... Hope it helps.
Created 05-17-2016 06:44 PM
See https://github.com/vinayshukla/SparkDemo1 for an example of a Spark app with Maven packaging, built against HDP. You can change the Spark version in the pom to the Spark version you want to use.
You can then submit the app with spark-submit to run it on your cluster.
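If you prefer PySpark, a minimal Spark SQL job you could submit the same way might look like the sketch below. This is only an illustration, not code from the repo above; the table name my_table is a placeholder for an existing Hive table.

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

# Build the context; the app name is arbitrary.
conf = SparkConf().setAppName("MinimalSparkSQLJob")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# Run a query against an existing Hive table ("my_table" is a placeholder)
# and print the result to stdout.
sqlContext.sql("SELECT COUNT(*) FROM my_table").show()

sc.stop()

You would then run it with something like $SPARK_HOME/bin/spark-submit --master yarn-client minimal_sparksql_job.py (script name is also a placeholder).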
Created 05-18-2016 11:08 AM
@Artem Ervits, how can this be an answer? It doesn't provide any info on Spark SQL.
Created 05-18-2016 11:18 AM
Fair enough, I unaccepted the answer.
Created 01-01-2017 02:00 PM
It is not even related to the question.
Created 05-23-2016 11:15 PM
Here is some sample working PySpark code using the HiveContext. Run it on YARN with the command below (the --queue flag is optional, for clusters with YARN queues set up):
$SPARK_HOME/bin/spark-submit --queue <queue_name> --master yarn-client <name_of_my_python_script>
code:
import os

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

# Configure the application name and executor resources.
conf = (SparkConf()
        .setAppName("bmathew_ucs_data_profiling")
        .set("spark.executor.instances", "40")
        .set("spark.executor.cores", "4")
        .set("spark.executor.memory", "5g"))
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# Recreate the HDFS directory that will back the external table.
os.system("hadoop fs -rm -r /tmp/spatton_cdc_test1")
os.system("hadoop fs -mkdir /tmp/spatton_cdc_test1")
os.system("hadoop fs -chmod 777 /tmp/spatton_cdc_test1")

# Drop and recreate the external table over that directory.
sqlContext.sql("DROP TABLE IF EXISTS spatton_cdc_test1")
sql = """
CREATE EXTERNAL TABLE spatton_cdc_test1 (
    widget_key bigint,
    widget_content_key bigint,
    field_name string,
    field_value string
)
LOCATION '/tmp/spatton_cdc_test1'
"""
sqlContext.sql(sql)

# Populate the table by joining the new data to the widget tables.
sql = """
INSERT INTO TABLE spatton_cdc_test1
SELECT w.widget_key, wc.widget_content_key, new.field_name, new.field_value
FROM bloodhound_uniq_widget_content new
JOIN bloodhound_widget w
  ON w.name = new.widget_name
JOIN bloodhound_widget_content wc
  ON wc.widget_key = w.widget_key
 AND wc.name = new.name
 AND wc.type = new.type
 AND wc.position = new.position
"""
sqlContext.sql(sql)
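As a quick sanity check, you could add a verification query at the end of the script (a small addition of mine, assuming the INSERT above completed):

# Verify the load by counting the rows just inserted.
sqlContext.sql("SELECT COUNT(*) FROM spatton_cdc_test1").show()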