
Basic Question: Install Spark on HDP 2.3 using Ambari

Contributor

I have a 9-node HDP 2.3 cluster (6 slaves, 2 masters, and 1 edge node) managed by Ambari. Currently only HDFS, YARN, ZooKeeper, and Ambari Metrics are running.

I'd like to install Spark. When I installed Spark 1.4.1 via Ambari, it installed the Spark History Server on one node and the Spark client on two nodes. I don't see Spark on the other nodes. Do I have to install the Spark client on every node, set the master and slaves configuration, and start Spark manually?

I am not connected to the Internet and there are no proxy servers.

1 ACCEPTED SOLUTION

Master Mentor
@Rahul Tikekar

If you want the Spark client on all the nodes, you can install it using Ambari.

You can start the Spark Thrift Server and the History Server from Ambari.

You don't have to do anything manually if you are using Ambari to manage Spark.
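If you'd rather script it than click through the UI, the Ambari REST API can add the Spark client to more hosts. This is only a rough sketch: admin:admin, AMBARI_HOST, CLUSTER_NAME and HOST_FQDN are placeholders you need to replace, and the component name SPARK_CLIENT assumes the stock HDP 2.3 Spark service.

curl -u admin:admin -H "X-Requested-By: ambari" -X POST http://AMBARI_HOST:8080/api/v1/clusters/CLUSTER_NAME/hosts/HOST_FQDN/host_components/SPARK_CLIENT

curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"HostRoles":{"state":"INSTALLED"}}' http://AMBARI_HOST:8080/api/v1/clusters/CLUSTER_NAME/hosts/HOST_FQDN/host_components/SPARK_CLIENT

The first call registers the SPARK_CLIENT component on that host; the second asks Ambari to actually install it.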


7 REPLIES

Contributor
@Neeraj Sabharwal My question is: will Ambari configure the hosts (Spark master and Spark client), or do I have to do that myself?

Master Mentor

@Rahul Tikekar Good question. Please see this doc and http://spark.apache.org/docs/latest/running-on-yarn.html

When you submit a Spark job, you define the master: "the --master parameter is yarn."

In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
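For example, a client-mode submission of the SparkPi example that ships with the Spark client might look like the following. The paths assume the default HDP layout under /usr/hdp/current/spark-client, so verify them on your cluster, and on Spark 1.4.1 you may need to write the master as yarn-client instead of yarn with a separate deploy mode.

cd /usr/hdp/current/spark-client
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --num-executors 3 --executor-memory 512m lib/spark-examples*.jar 10

The driver runs on the node you launch this from, while YARN allocates the executors across the cluster.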

Master Mentor

@Rahul Tikekar

Go to /usr/hdp/current

ls spark*

You can find all the details related to Spark there.
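For example, assuming the default spark-client symlink name, something like this shows which Spark components are laid down on the node and which version the client is:

ls -d /usr/hdp/current/spark*
/usr/hdp/current/spark-client/bin/spark-submit --version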

Master Mentor

@Rahul Tikekar To install Spark in standalone mode, you simply place a compiled version of Spark on each node of the cluster.
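For reference, a bare-bones standalone setup with a stock Apache Spark build looks roughly like this; the hostnames are placeholders, and (as noted later in this thread) standalone mode may not be supported on HDP, so treat it as a sketch only.

On the master node, list one worker hostname per line in conf/slaves:

slave1.example.com
slave2.example.com

Then, from the master, start the master and the workers:

sbin/start-master.sh
sbin/start-slaves.sh

The workers register with the master at spark://<master-host>:7077, and jobs are submitted with --master pointed at that URL instead of YARN.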

Contributor
@Neeraj Sabharwal I guess my confusion lies with where the Spark binaries are installed. And that in turn may have to do with how Spark is executed: there are three modes: local, standalone, and YARN cluster. I guess the way it is set up now, I can run it on YARN or locally, but not in standalone mode. If I want to run it in standalone mode, I will need the Spark clients on all nodes and will have to configure the slaves file, etc. Am I correct in this understanding? Thanks.

New Contributor

I think HDP Spark does not support standalone cluster mode, only YARN mode. Am I right?
