Basic Question: Install Spark on HDP 2.3 using Ambari

Contributor

I have a 9-node cluster (6 slaves, 2 masters, and 1 edge node) running HDP 2.3 with Ambari. Currently only HDFS, YARN, ZooKeeper, and Ambari Metrics are running.

I'd like to install Spark. When I installed Spark 1.4.1 via Ambari, it installed a Spark History Server on one node and the Spark client on two nodes. I don't see Spark on the other nodes. Do I have to install the Spark client on every node, set the master and slaves configuration, and start Spark manually?

I am not connected to the Internet and there are no proxy servers.

1 ACCEPTED SOLUTION

Master Mentor

This problem has been solved. To view the detailed solution, log in or register on the community.
7 REPLIES

Contributor
@Neeraj Sabharwal My question is: will Ambari configure the hosts (Spark master and Spark client), or should I do that?

Master Mentor

@Rahul Tikekar Good question. Please see this doc and http://spark.apache.org/docs/latest/running-on-yarn.html

When you submit a Spark job, you define the master: "the --master parameter is yarn."

In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
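For example, a quick YARN client-mode smoke test with spark-submit might look like the sketch below. The path to the examples jar and the executor count are assumptions and will vary by install; on Spark 1.4.x you may need --master yarn-client instead of the newer --master yarn with --deploy-mode client.

# Submit the bundled SparkPi example to YARN in client mode
# (jar path under /usr/hdp/current/spark-client/lib is an assumption; adjust to your install)
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  --num-executors 3 \
  /usr/hdp/current/spark-client/lib/spark-examples*.jar 10

If that runs, the same command with --deploy-mode cluster runs the driver inside the YARN application master instead of on the submitting host.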

Master Mentor

@Rahul Tikekar

Go to /usr/hdp/current

ls spark*

You can see all the details related to Spark there.
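On a typical HDP 2.3 host where Ambari has placed the Spark client, that listing would look something like the sketch below (the exact entries depend on which Spark components Ambari installed on that host):

cd /usr/hdp/current
ls -d spark*
# spark-client  spark-historyserver    <- symlinks into the versioned /usr/hdp/2.3.x.x-xxxx/spark directory

The spark-client link is what provides spark-submit, spark-shell, and the conf directory on that node.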

Master Mentor

@Rahul Tikekar To install Spark in standalone mode, you simply place a compiled version of Spark on each node of the cluster.
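As a rough sketch of what that manual standalone setup would involve (the hostnames and the use of the HDP spark-client path here are assumptions; standalone mode is not managed by Ambari):

# On the node chosen as the standalone master:
/usr/hdp/current/spark-client/sbin/start-master.sh

# On the master, list the worker hostnames (one per line) in conf/slaves, e.g.:
#   slave1
#   slave2

# Start all workers from the master (needs passwordless SSH to each worker):
/usr/hdp/current/spark-client/sbin/start-slaves.sh

# Jobs are then submitted with --master spark://<master-host>:7077

That is exactly the per-node install and slaves-file configuration you asked about; with YARN mode none of it is needed, because YARN acts as the cluster manager.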

Contributor
@Neeraj Sabharwal I guess my confusion lies with where the Spark binaries are installed, and that in turn may have to do with how Spark is executed: there are three modes: local, standalone, and YARN. I guess the way it is set up now, I can run it on YARN or locally, but not in standalone mode. If I want to run it in standalone mode, I will need the Spark client on all nodes and will have to configure the slaves file, etc. Am I correct in this understanding? Thanks.

New Contributor

I think HDP Spark does not support standalone cluster mode, only YARN mode. Am I right?