Created 09-05-2016 02:32 PM
Created 09-05-2016 03:50 PM
# Check Commands
# --------------

# Spark Scala
# -----------

# Optionally export Spark Home
export SPARK_HOME=/usr/hdp/current/spark-client

# Spark submit example in local mode
spark-submit --class org.apache.spark.examples.SparkPi --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10

# Spark submit example in client mode
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10

# Spark submit example in cluster mode
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10

# Spark shell with YARN client
spark-shell --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1

# PySpark
# -------

# Optionally export Hadoop Conf and PySpark Python
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYSPARK_PYTHON=/path/to/bin/python

# PySpark submit example in local mode
spark-submit --verbose /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100

# PySpark submit example in client mode
spark-submit --verbose --master yarn-client /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100

# PySpark submit example in cluster mode
spark-submit --verbose --master yarn-cluster /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100

# PySpark shell with YARN client
pyspark --master yarn-client
@jigar.patel
Created 09-05-2016 02:36 PM
You may want to launch spark-shell, check the version with 'sc.version', confirm that the contexts/session were instantiated, and run some SQL queries.
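For example, those checks can be run interactively in spark-shell. This is a sketch assuming an HDP-era Spark 1.x shell, where `sc` and `sqlContext` are created for you; it needs a working cluster to run:

```scala
// Inside spark-shell -- sc and sqlContext are provided by the shell itself
sc.version                            // check the Spark version
sc.parallelize(1 to 100).count()      // tiny distributed job; confirms executors work
sqlContext.sql("SHOW TABLES").show()  // confirms the SQL context is usable
```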
Created 09-05-2016 02:57 PM
A good way to sanity check Spark is to start Spark shell with YARN (spark-shell --master yarn) and run something like this:
val x = sc.textFile("some hdfs path to a text file or directory of text files")
x.count()
This will basically do a distributed line count.
If that looks good, another sanity check is for Hive integration. Run spark-sql (spark-sql --master yarn) and try to query a table that you know can be queried via Hive.
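As a concrete illustration, the query can also be passed non-interactively with `-e` (the table name below is a placeholder for one you know exists in Hive):

```shell
# Run a single query through the Spark SQL CLI on YARN and exit
spark-sql --master yarn -e "SELECT COUNT(*) FROM my_hive_table"
```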
Created 09-05-2016 02:57 PM
The Spark version will be displayed in the console log output...