Support Questions


stream processing runtimes

Super Collaborator

Hi All,

Most batch processing frameworks (MapReduce, Spark) support both a local mode and a distributed mode (standalone, YARN, Mesos) for deployment and execution.

What about stream processing frameworks such as Storm and Spark Streaming? Do they manage distributed mode on their own? Is it even realistic to expect them to work on YARN?

How do I monitor a distributed Spark Streaming job? And do we need to specify the master as yarn to make it distributed?

Thanks,

Avijeet

1 ACCEPTED SOLUTION

Expert Contributor

Hello,

Both Storm and Spark support a local mode.

In Storm you create a LocalCluster instance and submit your topology to it. You can find a description and an example at these links:

http://storm.apache.org/releases/1.0.2/Local-mode.html

https://github.com/apache/storm/blob/1.0.x-branch/examples/storm-starter/src/jvm/org/apache/storm/st...

Spark's approach to local mode is somewhat different. Allocation is controlled through the master setting, which can be set to local, local[*], or local[N] (where N is the number of worker threads). If a local master is specified, the executors are started on your machine.
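For illustration, here is how a streaming job might be launched in local mode with spark-submit (the class and jar names below are placeholders):

```shell
# Run a (hypothetical) streaming job in local mode.
# Spark Streaming needs at least local[2]: one thread for the receiver
# and at least one for processing, so plain "local" would starve the job.
spark-submit \
  --master "local[2]" \
  --class com.example.MyStreamingJob \
  my-streaming-job.jar
```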

Both Storm and Spark have monitoring capabilities through a web interface. You can find details about them here:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_storm-component-guide/content/using-stor...

http://spark.apache.org/docs/latest/monitoring.html

YARN is not a requirement but an option for distributed mode; both Spark and Storm are able to run on their own cluster managers.
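To illustrate (host names, class names, and jar names below are placeholders), both frameworks accept a submission against their own cluster manager with no YARN involved:

```shell
# Spark: submit against a standalone master (7077 is the default port).
spark-submit \
  --master spark://master-host:7077 \
  --class com.example.MyStreamingJob \
  my-streaming-job.jar

# Storm: submit a topology; the nimbus host is taken from conf/storm.yaml.
storm jar my-topology.jar com.example.MyTopology my-topology-name
```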


4 REPLIES 4

avatar
Expert Contributor

(See the accepted solution above.)

Super Collaborator

Thanks @Tibor Kiss - I am looking for more information about distributed mode. Is there a name for the cluster managers in Storm or Spark Streaming?

Expert Contributor

In Storm's nomenclature, 'nimbus' is the cluster manager:

http://storm.apache.org/releases/1.0.1/Setting-up-a-Storm-cluster.html

Spark calls its cluster manager the 'master':

http://spark.apache.org/docs/latest/spark-standalone.html
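As a sketch of bringing up a standalone Spark cluster (the host name is a placeholder; in Spark 1.x/2.x the worker script is called start-slave.sh, renamed start-worker.sh in Spark 3):

```shell
# On the master machine, from the Spark installation directory:
./sbin/start-master.sh        # logs the spark://host:7077 master URL

# On each worker machine, register against that master URL:
./sbin/start-slave.sh spark://master-host:7077
```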

Super Collaborator

That's great @Tibor Kiss - I am trying to run a Spark Streaming job - how do I tell it to run in standalone cluster mode?
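One way this is commonly done (the names below are placeholders): point --master at the standalone master's URL, and optionally add --deploy-mode cluster to run the driver on the cluster rather than on the submitting machine:

```shell
# Submit to a standalone cluster; the driver itself runs on a worker.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --class com.example.MyStreamingJob \
  my-streaming-job.jar
```

Without --deploy-mode cluster, the job runs in the default client mode, where the driver stays on the machine that called spark-submit.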