Community Articles

rhryniewicz · ‎03-05-2016

Requirements

An HDP 2.3.x cluster, either a multi-node cluster or a single-node HDP Sandbox.

Installing

The Spark 1.6 Technical Preview is provided in RPM and DEB package formats. The following instructions assume RPM packaging:

Download the Spark 1.6 RPM repository:

wget -nv http://private-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.4.1-10/hdp.repo -O /etc/yum.repos.d/HDP-TP.repo

To install on Ubuntu, use the following repository list instead:
http://private-repo-1.hortonworks.com/HDP/ubuntu12/2.x/updates/2.3.4.1-10/hdp.list

Install the Spark Package: Download the Spark 1.6 RPM (and pySpark, if desired) and set it up on your HDP 2.3 cluster:
```
yum install spark_2_3_4_1_10-master -y
```
If you want to use pySpark, install it as follows and make sure that Python is installed on all nodes.
```
yum install spark_2_3_4_1_10-python -y
```
The RPM installer will also download core Hadoop dependencies. It will create “spark” as an OS user, and it will create the /user/spark directory in HDFS.
Set JAVA_HOME and SPARK_HOME: Make sure that you set JAVA_HOME before you launch the Spark shell or the Thrift server.
```
export JAVA_HOME=<path to JDK 1.8>
```
The Spark install creates the directory where Spark binaries are unpacked (/usr/hdp/2.3.4.1-10/spark). Set the SPARK_HOME variable to this directory:
```
export SPARK_HOME=/usr/hdp/2.3.4.1-10/spark/
```
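Before moving on, it can help to confirm that both variables are set and point at real directories. The following is a minimal sketch only; `check_dir_var` and `check_spark_env` are hypothetical helper names, not part of the Spark packages.

```shell
# Hypothetical helpers (not shipped with Spark): sanity-check the
# environment variables exported in the steps above.

# check_dir_var NAME VALUE
# Succeeds only if VALUE is non-empty and is an existing directory.
check_dir_var() {
    if [ -z "$2" ]; then
        echo "$1 is not set" >&2
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo "$1 points to a missing directory: $2" >&2
        return 1
    fi
    echo "$1=$2"
}

# Checks both variables and returns non-zero if either one is wrong.
check_spark_env() {
    rc=0
    check_dir_var JAVA_HOME "${JAVA_HOME:-}" || rc=1
    check_dir_var SPARK_HOME "${SPARK_HOME:-}" || rc=1
    return "$rc"
}
```

Calling `check_spark_env` in the same shell session prints each variable on success and reports any problem on stderr.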

Create hive-site in the Spark conf directory: As user root, create the file $SPARK_HOME/conf/hive-site.xml. Edit the file to contain only the following configuration setting:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Make sure that <value> points to the Hive Metastore URI in your cluster -->
    <value>thrift://sandbox.hortonworks.com:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>
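
If you would rather generate the file from the command line than edit it by hand, something along these lines should work. This is a sketch under assumptions: `write_hive_site` is a hypothetical helper name, and the metastore URI you pass in must be the one for your own cluster.

```shell
# Hypothetical helper: write a minimal hive-site.xml.
#   $1 = destination file
#   $2 = Hive Metastore URI for your cluster (an assumption you must supply)
# Note: this overwrites the destination file if it already exists.
write_hive_site() {
    cat > "$1" <<EOF
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>$2</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>
EOF
}

# Example (run as root, with SPARK_HOME set as described earlier):
# write_hive_site "$SPARK_HOME/conf/hive-site.xml" "thrift://sandbox.hortonworks.com:9083"
```

The example call is left commented out so that nothing is overwritten by accident; the sandbox URI shown is the one from the configuration above.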

Run the Spark Pi Example

To test compute-intensive tasks in Spark, the Pi example calculates pi by “throwing darts” at a circle: it generates random points in the unit square ((0,0) to (1,1)) and counts how many of them fall within the unit circle. The fraction of points that land inside approximates pi/4, so multiplying that fraction by 4 gives an estimate of pi.
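
To make the idea concrete, here is a small single-machine sketch of the same dart-throwing estimate using awk. It is not the SparkPi source, which spreads the sampling across executors; `estimate_pi`, the sample count, and the seed are illustrative assumptions.

```shell
# Illustrative only: a local Monte Carlo estimate of pi, not Spark's code.
#   $1 = number of random points to generate
#   $2 = optional seed for awk's random number generator (default 1)
estimate_pi() {
    awk -v n="$1" -v seed="${2:-1}" 'BEGIN {
        srand(seed)
        inside = 0
        for (i = 0; i < n; i++) {
            # A random point in the unit square (0,0) to (1,1)
            x = rand()
            y = rand()
            # Count it if it lies within distance 1 of the origin
            if (x * x + y * y <= 1) inside++
        }
        # inside / n approximates pi/4, so scale by 4
        printf "%.6f\n", 4 * inside / n
    }'
}

estimate_pi 100000
```

More points give a tighter estimate, which is also why the Spark example benefits from spreading the sampling over several executors.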

Change to your Spark directory and switch to the spark OS user:
```
cd $SPARK_HOME
su spark
```

Run the Spark Pi example in yarn-client mode:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10

Note: The Pi job should complete without any failure messages. It should produce output similar to the following. Note the value of pi near the end of the output.

15/12/16 13:21:05 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 4.313782 s
Pi is roughly 3.139492
15/12/16 13:21:05 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
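
Because the result is buried among INFO log lines, a small filter can pull out just the estimate. The sketch below assumes the result line has the form "Pi is roughly <number>"; `extract_pi` is a hypothetical helper name.

```shell
# Hypothetical helper: read job output on stdin and print only the
# numeric estimate from a line containing "Pi is roughly <number>".
extract_pi() {
    sed -n 's/.*Pi is roughly \([0-9][0-9.]*\).*/\1/p'
}

# Example (run from $SPARK_HOME as the spark user):
# ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
#     --master yarn-client --num-executors 3 --driver-memory 512m \
#     --executor-memory 512m --executor-cores 1 \
#     lib/spark-examples*.jar 10 2>&1 | extract_pi
```

If the filter prints nothing, the job most likely failed before producing a result, and the full output is worth reading.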

zwfalward · ‎03-14-2016

Hi, I couldn't find Spark pre-installed on HDP 2.4. If it is included, how do I enable it?

vrsanaidu · ‎04-03-2016

Where should these commands be run?

Installing Spark 1.6 on HDP 2.3.x
