
Basic Question: Install Spark on HDP 2.3 using Ambari

I have a 9-node cluster (6 slaves, 2 masters, and 1 edge node) of HDP 2.3 with Ambari running. Currently only HDFS, YARN, ZooKeeper, and Ambari Metrics are running.

I'd like to install Spark. When I did an install of Spark 1.4.1 via Ambari, it installed a Spark History Server on one node and the Spark client on 2 nodes. I don't see Spark on the other nodes. Do I have to install the Spark client on every node, set the master and slaves configuration, and start Spark manually?

I am not connected to the Internet and there are no proxy servers.

1 ACCEPTED SOLUTION

@Rahul Tikekar

If you want the Spark client on all the nodes, you can install it using Ambari.

You can start the Spark Thrift Server and History Server from Ambari.

You don't have to do anything manually if you are using Ambari to manage Spark.


7 REPLIES

@Neeraj Sabharwal My question is: will Ambari configure the hosts (Spark master and Spark client), or should I do that?

@Rahul Tikekar Good question. Please see this doc and http://spark.apache.org/docs/latest/running-on-yarn.html

When you submit a Spark job, you define the master: the --master parameter is yarn.

In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
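For example, a client-mode submission to YARN from the edge node might look like this (the jar path follows the HDP 2.3 / Spark 1.4.1 layout and the SparkPi example is just an illustration; adjust for your cluster):

```shell
# Submit the SparkPi example to YARN in client mode:
# the driver runs in this process, executors run on the YARN NodeManagers.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  /usr/hdp/current/spark-client/lib/spark-examples*.jar 10
```

No slaves file or standalone master is involved; YARN handles resource allocation.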

@Rahul Tikekar

Go to /usr/hdp/current

ls spark*

You can see all the details related to Spark there.
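Running those commands on a node where Ambari installed the client would look roughly like this (the exact symlink names depend on the HDP build):

```shell
# HDP symlinks the currently active component versions under /usr/hdp/current
cd /usr/hdp/current
ls -d spark*
# typically shows entries such as: spark-client  spark-historyserver  spark-thriftserver
```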

To install Spark Standalone mode, you simply place a compiled version of Spark on each node on the cluster. @Rahul Tikekar

@Neeraj Sabharwal I guess my confusion lies with where the Spark binaries are installed, and that in turn may have to do with the execution mode of Spark. There are three modes: local, standalone, and YARN cluster. I guess the way it is set up now, I can run it as a YARN cluster or locally, but not in standalone mode. If I want to run it in standalone mode, I will need the Spark clients on all nodes and will have to configure the slaves file, etc. Am I correct in this understanding? Thanks.
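For what it's worth, standalone mode would indeed mean managing Spark yourself, outside Ambari, roughly along these lines (SPARK_HOME and the hostnames below are placeholders, not your actual layout):

```shell
# Hypothetical standalone setup -- NOT managed by Ambari.
# SPARK_HOME and all hostnames here are placeholders.
export SPARK_HOME=/opt/spark

# List the worker hosts, one per line, in conf/slaves
cat > "$SPARK_HOME/conf/slaves" <<'EOF'
slave1.example.com
slave2.example.com
EOF

# Start the standalone master and all listed workers (requires passwordless SSH)
"$SPARK_HOME/sbin/start-all.sh"

# Jobs then target the standalone master instead of YARN
"$SPARK_HOME/bin/spark-submit" --master spark://master1.example.com:7077 \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME"/lib/spark-examples*.jar 10
```

On an HDP cluster this duplicates what YARN already provides, which is why the Ambari-managed install only ships the YARN-oriented client.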


I think HDP Spark does not support standalone cluster mode, only YARN mode. Am I right?
