
What are the different methods to run Spark over Apache Hadoop?


Re: What are the different methods to run Spark over Apache Hadoop?

There are several ways to run Spark on top of Hadoop:
1. spark-submit: package the application as a JAR, then launch its main class from the command line with spark-submit.
2. Spark shell: write and run queries interactively in the Spark shell.
3. Spark from Hive: Hive queries can be translated into Spark jobs and then executed on the cluster.
4. Spark Thrift Server (the Spark counterpart of HiveServer2): it provides a JDBC connection through which one can run Spark SQL queries.
5. Zeppelin/Jupyter: notebooks in which one can choose the appropriate interpreter to run Spark queries.
6. Oozie: given the application JAR and the workflow path, Oozie's Spark action can launch a Spark job on the user's behalf.
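
To make the first option more concrete, here is a minimal sketch of how a spark-submit command line is typically put together. The helper name `build_spark_submit_command`, the JAR name, the class name, and the HDFS path are all made up for illustration. Only the `--class`, `--master`, and `--deploy-mode` flags are actual spark-submit options. The sketch only builds the argument list and does not launch anything.

```python
from typing import List, Optional


def build_spark_submit_command(
    app_jar: str,
    main_class: str,
    master: str = "yarn",
    deploy_mode: str = "cluster",
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble the argument list for a spark-submit invocation.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    command = [
        "spark-submit",
        "--class", main_class,          # entry point inside the JAR
        "--master", master,             # "yarn" hands scheduling to YARN
        "--deploy-mode", deploy_mode,   # "cluster" or "client"
        app_jar,
    ]
    # Anything after the JAR is passed through to the application itself.
    command.extend(app_args or [])
    return command


if __name__ == "__main__":
    # Hypothetical JAR, class, and HDFS path, used only for illustration.
    cmd = build_spark_submit_command(
        "my-app.jar", "com.example.Main", app_args=["hdfs:///data/input"]
    )
    print(" ".join(cmd))
```

On a machine with a Spark client installed, the resulting list could be passed to `subprocess.run`. Building it as a list rather than a single string avoids shell-quoting issues with the arguments.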

Re: What are the different methods to run Spark over Apache Hadoop?

Well, Oozie just executes spark-submit under the hood.

Also, this list leaves out that Spark runs "over" YARN as a resource manager and "over" HDFS as a filesystem.