Created on 03-05-2016 12:32 AM
The Spark 1.6 Technical Preview can be installed on any HDP 2.3.x cluster, whether it is a multi-node cluster or a single-node HDP Sandbox. The preview is provided in RPM and DEB package formats; the following instructions assume RPM packaging. First, download the repository file:
wget -nv http://private-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.4.1-10/hdp.repo -O /etc/yum.repos.d/HDP-TP.repo

For installing on Ubuntu, use the following repository list instead: http://private-repo-1.hortonworks.com/HDP/ubuntu12/2.x/updates/2.3.4.1-10/hdp.list
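Before installing, you can optionally confirm that yum picked up the new repository. A minimal check (the repo id is whatever the downloaded hdp.repo file defines, so the grep pattern here is an assumption):

# List configured repositories and look for the new HDP entry
yum repolist | grep -i hdp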
Then install the Spark package:

yum install spark_2_3_4_1_10-master -y
If you want to use PySpark, install it as follows, and make sure that Python is installed on all nodes:
yum install spark_2_3_4_1_10-python -y
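Because PySpark launches Python worker processes on every node, a quick sanity check on each node is to confirm an interpreter is on the PATH:

# Should print a Python version (Python 2.x on a 2016-era stack)
python --version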
The RPM installer will also download core Hadoop dependencies. It creates a "spark" OS user and the /user/spark directory in HDFS.
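If you want to verify what the installer created, you can check both (the hdfs command assumes an HDFS client is configured on this node):

# Confirm the spark OS user exists
id spark
# Confirm the /user/spark directory exists in HDFS
hdfs dfs -ls /user/spark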
Before launching the Spark shell or Spark applications, make sure JAVA_HOME points to a JDK 1.8 installation:

export JAVA_HOME=<path to JDK 1.8>
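To confirm the variable points at a 1.8 JDK, you can run the java binary beneath it:

# Should report version 1.8.x
$JAVA_HOME/bin/java -version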
The Spark install creates the directory where Spark binaries are unpacked (/usr/hdp/2.3.4.1-10/spark). Set the SPARK_HOME variable to this directory:
export SPARK_HOME=/usr/hdp/2.3.4.1-10/spark/
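As a quick sanity check that SPARK_HOME points at the Technical Preview bits, you can ask Spark for its version:

# Should report Spark 1.6.x
$SPARK_HOME/bin/spark-submit --version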
To let Spark access the Hive Metastore, create a hive-site.xml file in the Spark conf directory ($SPARK_HOME/conf) with the following content:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Make sure that <value> points to the Hive Metastore URI in your cluster -->
    <value>thrift://sandbox.hortonworks.com:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>
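To verify that Spark can actually reach the Metastore through this file, one option (a sketch, assuming the Spark 1.6 shell's built-in Hive-aware sqlContext and the same yarn-client master used in the Pi example below) is to pipe a one-line query into spark-shell:

# Lists Hive tables via the metastore; even an empty list proves connectivity
echo 'sqlContext.sql("show tables").show()' | $SPARK_HOME/bin/spark-shell --master yarn-client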
To test compute-intensive tasks in Spark, run the Pi example. It estimates pi by "throwing darts" at a circle: it generates random points in the unit square ((0,0) to (1,1)) and counts how many fall inside the quarter of the unit circle that lies within the square. That fraction approximates pi/4, so multiplying it by 4 estimates pi.
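In symbols, if N random points are drawn from the unit square and H of them satisfy x^2 + y^2 <= 1 (this is a restatement of the dart-throwing argument above, not output of the job):

\[
\frac{H}{N} \approx \frac{\text{area of quarter circle}}{\text{area of unit square}} = \frac{\pi/4}{1},
\qquad\text{so}\qquad \pi \approx 4\,\frac{H}{N}.
\]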
Change to the Spark directory and become the spark OS user:

cd $SPARK_HOME
su spark
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
Note: The Pi job should complete without any failure messages and produce output similar to the following; the value of pi appears near the end.
15/12/16 13:21:05 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 4.313782 s
Pi is roughly 3.139492
15/12/16 13:21:05 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
Created on 03-14-2016 10:38 AM
Hi, I don't see Spark pre-installed on HDP 2.4. Is it included, and if so, how do I enable it?
Created on 04-03-2016 12:25 PM
Where should these commands be run?