What are the different methods to run Spark over Apache Hadoop?

New Contributor

There are several ways to run Spark on top of Hadoop:
1. spark-submit: Package the application as a JAR, then run its main class from the command line with spark-submit.
2. Spark shell: Write and run queries interactively in the Spark shell.
3. Hive on Spark: Hive queries can be translated into Spark jobs and then executed on the cluster.
4. Spark Thrift Server (the Spark counterpart of HiveServer2): It provides a JDBC connection through which one can run Spark SQL queries.
5. Zeppelin/Jupyter: These are notebooks in which one can choose the appropriate interpreter to run Spark queries.
6. Oozie: One provides the application JAR and the workflow path; Oozie has a Spark action that can launch a Spark job on the user's behalf.
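
As a rough illustration of option 1, here is a minimal Python sketch that only assembles a spark-submit command line and prints it, without running anything. The flags `--class`, `--master`, and `--deploy-mode` are standard spark-submit options; the JAR name `my-app.jar`, the class `com.example.Main`, the input path, and the helper `build_spark_submit` are hypothetical placeholders, not something from this thread.

```python
import shlex
from typing import List, Optional


def build_spark_submit(
    app_jar: str,
    main_class: str,
    master: str = "yarn",
    deploy_mode: str = "cluster",
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble a spark-submit argument list (hypothetical helper)."""
    cmd = [
        "spark-submit",
        "--class", main_class,         # main class inside the JAR
        "--master", master,            # cluster manager, e.g. "yarn"
        "--deploy-mode", deploy_mode,  # "client" or "cluster"
        app_jar,
    ]
    # Anything after the JAR is passed through to the application itself.
    cmd.extend(app_args or [])
    return cmd


# Placeholder values, assumed purely for illustration.
command = build_spark_submit(
    "my-app.jar",
    "com.example.Main",
    app_args=["hdfs:///data/input"],
)
print(shlex.join(command))
```

The printed line could be pasted into a shell on a machine that has a Spark client installed and the Hadoop configuration available.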

Super Collaborator

Well, Oozie just executes spark-submit.

Also, this list leaves out that Spark runs "over" YARN as a resource manager, or "over" HDFS as a filesystem.
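
To make the YARN and HDFS point concrete, here is a hedged PySpark sketch: the session asks YARN for resources and reads its input from HDFS. It assumes PySpark is installed and that `HADOOP_CONF_DIR` (or `YARN_CONF_DIR`) points at the cluster configuration. The host name, the port 8020, the file path, and the helpers `hdfs_uri` and `count_lines` are assumptions made for illustration only.

```python
def hdfs_uri(namenode_host: str, path: str, port: int = 8020) -> str:
    """Build an hdfs:// URI (hypothetical helper; port 8020 is an assumed NameNode port)."""
    return f"hdfs://{namenode_host}:{port}/{path.lstrip('/')}"


def count_lines(input_uri: str) -> int:
    """Count the lines of a text file, with YARN as the resource manager."""
    # Imported lazily so the URI helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("line-count-sketch")
        .master("yarn")  # needs HADOOP_CONF_DIR or YARN_CONF_DIR to be set
        .getOrCreate()
    )
    try:
        # spark.read.text yields one row per line of the input file.
        return spark.read.text(input_uri).count()
    finally:
        spark.stop()
```

On a configured cluster, one might call `count_lines(hdfs_uri("namenode.example.com", "/data/input.txt"))`; the same script could also be handed to spark-submit.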
