What are the different methods to run Spark over Apache Hadoop?

New Contributor
 

There are several ways to run Spark on top of Hadoop:
1. spark-submit: Package the application as a jar, then launch its main class from the command line with spark-submit.
2. Spark shell: Write and run queries interactively in the Spark shell.
3. Spark from Hive: Hive queries can be translated to Spark jobs and then executed on the cluster.
4. Spark Thrift Server: It provides a JDBC connection through which one can run Spark SQL queries.
5. Zeppelin/Jupyter: These are notebooks in which one can choose the appropriate interpreter to run Spark queries.
6. Oozie: Given the appropriate jar and workflow path, Oozie's Spark action can spawn a Spark job on the user's behalf.
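As a rough illustration of option 1, here is a minimal Python sketch that only assembles a spark-submit command line and prints it; nothing is executed. The helper name `build_spark_submit`, the jar name, the class name, and the HDFS path are all hypothetical. The flags `--master`, `--deploy-mode`, `--class`, and `--conf` are standard spark-submit options, and the sketch assumes the cluster uses YARN as its resource manager.

```python
import shlex
from typing import Dict, List, Optional


def build_spark_submit(
    app_jar: str,
    main_class: str,
    app_args: Optional[List[str]] = None,
    master: str = "yarn",
    deploy_mode: str = "cluster",
    conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return the argv list for a spark-submit call (nothing is executed)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode}")
    cmd = [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--class", main_class,
    ]
    # Each Spark property is passed as its own --conf key=value pair.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # The application jar comes after all spark-submit options,
    # followed by the arguments handed to the application's main().
    cmd.append(app_jar)
    cmd += list(app_args or [])
    return cmd


# Hypothetical jar, class, and HDFS path, for illustration only.
command = build_spark_submit(
    app_jar="wordcount.jar",
    main_class="com.example.WordCount",
    app_args=["hdfs:///data/input.txt"],
    conf={"spark.executor.memory": "2g"},
)
print(shlex.join(command))
```

To actually launch the job, the same list could be handed to `subprocess.run(command, check=True)` on a machine where spark-submit is on the PATH and the Hadoop client configuration is available.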

Super Collaborator

Well, Oozie just executes spark-submit.

Also, this leaves out that Spark runs "over" YARN as a resource manager, or "over" HDFS as a filesystem.
