Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to check that Spark is installed correctly (i.e., whether Spark is working as expected)?

New Member
 
1 ACCEPTED SOLUTION

Rising Star
# Check Commands
# --------------

# Spark Scala
# -----------

# Optionally export Spark Home
export SPARK_HOME=/usr/hdp/current/spark-client

# Spark submit example in local mode (no --master given, so Spark runs locally unless spark-defaults.conf sets spark.master)
spark-submit --class org.apache.spark.examples.SparkPi --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10

# Spark submit example in client mode
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10

# Spark submit example in cluster mode
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10


# Spark shell with yarn client
spark-shell --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1


# Pyspark
# -------


# Optionally export the Hadoop conf dir and the PySpark Python interpreter
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYSPARK_PYTHON=/path/to/bin/python


# PySpark submit example in local mode (no --master given, so Spark runs locally unless spark-defaults.conf sets spark.master)
spark-submit --verbose /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100


# PySpark submit example in client mode
spark-submit --verbose --master yarn-client /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100


# PySpark submit example in cluster mode
spark-submit --verbose --master yarn-cluster /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100


# PySpark shell with yarn client
pyspark --master yarn-client


@jigar.patel
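For scripted checks, the SparkPi runs above can be wrapped so that success is decided by the "Pi is roughly" line the example prints, rather than by eye. This is only a sketch, assuming bash; `has_pi_estimate` and `check_spark_pi` are hypothetical helper names, not part of Spark. It uses yarn-client because in yarn-cluster mode the driver runs on the cluster, so that line ends up in the YARN application logs instead of the local console.

```shell
#!/usr/bin/env bash
# Sketch: run a SparkPi command and confirm that it printed an estimate of pi.

# Hypothetical helper: succeeds if stdin contains a "Pi is roughly <number>" line.
has_pi_estimate() {
  grep -Eq 'Pi is roughly [0-9]+\.[0-9]+'
}

# Hypothetical helper: run the given command, capture stdout and stderr,
# and report whether the pi estimate showed up.
check_spark_pi() {
  local output
  if ! output="$("$@" 2>&1)"; then
    echo "FAIL: command exited with a non-zero status" >&2
    return 1
  fi
  if printf '%s\n' "$output" | has_pi_estimate; then
    echo "OK: $(printf '%s\n' "$output" | grep -E 'Pi is roughly' | head -n 1)"
  else
    echo "FAIL: no 'Pi is roughly' line in the output" >&2
    return 1
  fi
}

# Only call Spark when spark-submit is actually on the PATH.
if command -v spark-submit >/dev/null 2>&1; then
  SPARK_HOME="${SPARK_HOME:-/usr/hdp/current/spark-client}"
  check_spark_pi spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-client --num-executors 3 --driver-memory 512m \
    --executor-memory 512m --executor-cores 1 \
    "$SPARK_HOME"/lib/spark-examples*.jar 10
fi
```

The same wrapper should work for the pi.py commands, since the Python example prints the same "Pi is roughly" line.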


4 REPLIES


You may want to launch a spark-shell, check the version with 'sc.version', check that the contexts/session are instantiated, and run some SQL queries.
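If you would rather not type these by hand, the statements can be fed to spark-shell on standard input. A rough sketch, assuming bash and that `sc` is bound by the shell as usual; `smoke_test_statements` is a hypothetical helper name, and the exact console output around the printed lines may differ between Spark versions.

```shell
#!/usr/bin/env bash
# Sketch: pipe a few Scala statements into spark-shell instead of typing them.

# Hypothetical helper: prints the Scala statements to run in the shell.
smoke_test_statements() {
  cat <<'EOF'
println("SPARK_VERSION=" + sc.version)
println("SUM=" + sc.parallelize(1 to 100).sum())
EOF
}

# Only call Spark when spark-shell is actually on the PATH.
if command -v spark-shell >/dev/null 2>&1; then
  # Keep only the two marker lines; everything else is startup logging.
  smoke_test_statements | spark-shell --master yarn-client 2>&1 \
    | grep -E '^(SPARK_VERSION|SUM)='
fi
```

A working SparkContext should report the version and SUM=5050.0; if neither marker line appears, the context probably failed to start and the full shell output is worth reading.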

Super Collaborator

A good way to sanity check Spark is to start Spark shell with YARN (spark-shell --master yarn) and run something like this:

val x = sc.textFile("some hdfs path to a text file or directory of text files")

x.count()

This performs a distributed line count.

If that works, the next sanity check is Hive integration. Run spark-sql (spark-sql --master yarn) and query a table that you know can be queried via Hive.
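Both checks can also be run non-interactively. The following is a sketch under assumptions: `HDFS_PATH` and `HIVE_TABLE` are placeholder environment variables you would set yourself, `extract_count` is a hypothetical helper, and the `-e` option of spark-sql is used to pass the query on the command line.

```shell
#!/usr/bin/env bash
# Sketch: scripted versions of the line-count and Hive checks.

# Hypothetical helper: pull the number out of a "COUNT=<n>" line on stdin.
extract_count() {
  sed -n 's/^COUNT=\([0-9][0-9]*\)$/\1/p' | head -n 1
}

# Distributed line count. HDFS_PATH is a placeholder for a text file or
# directory of text files on HDFS; nothing runs unless it is set.
if command -v spark-shell >/dev/null 2>&1 && [ -n "${HDFS_PATH:-}" ]; then
  echo "println(\"COUNT=\" + sc.textFile(\"$HDFS_PATH\").count())" \
    | spark-shell --master yarn 2>/dev/null | extract_count
fi

# Hive integration. HIVE_TABLE is a placeholder for a table that is already
# known to be queryable via Hive.
if command -v spark-sql >/dev/null 2>&1 && [ -n "${HIVE_TABLE:-}" ]; then
  spark-sql --master yarn -e "SELECT COUNT(*) FROM $HIVE_TABLE"
fi
```

Usage would look like `HDFS_PATH=... HIVE_TABLE=... bash check.sh`, where check.sh is whatever name you save the sketch under.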

Super Collaborator

The Spark version will be displayed in the console log output...
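If you only want the version and not a running job, `spark-submit --version` prints the same banner without starting a shell. A small sketch, assuming bash and that the banner contains a "version <number>" token; `extract_spark_version` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read the Spark version from the startup banner.

# Hypothetical helper: print the first "version <x.y.z>" token found on stdin.
extract_spark_version() {
  sed -n 's/.*version \([0-9][^ ,]*\).*/\1/p' | head -n 1
}

# Only call Spark when spark-submit is actually on the PATH.
# The banner is written to stderr, so merge it into stdout before parsing.
if command -v spark-submit >/dev/null 2>&1; then
  spark-submit --version 2>&1 | extract_spark_version
fi
```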
