
stream processing runtimes

Expert Contributor

Hi All,

Most batch processing frameworks (MapReduce, Spark) support both a local mode and a distributed mode (standalone, YARN, Mesos) of deployment and execution.

What about stream processing frameworks such as Storm and Spark Streaming? Do they manage distributed mode on their own? Is it even realistic to expect them to work on YARN?

How do I monitor a distributed Spark Streaming job? And do we need to specify the master as yarn to make it distributed?

Thanks,

Avijeet

1 ACCEPTED SOLUTION

Contributor

Hello,

Both Storm and Spark support local mode.

In Storm you need to create a LocalCluster instance and then submit your job to it. You can find a description and an example at these links:

http://storm.apache.org/releases/1.0.2/Local-mode.html

https://github.com/apache/storm/blob/1.0.x-branch/examples/storm-starter/src/jvm/org/apache/storm/st...

Spark's approach to local mode is somewhat different. Allocation is controlled through the Spark master setting, which can be set to local (or local[*], or local[N], where N is the number of worker threads). If local is specified, the driver and executors run in a single JVM on your machine.

Both Storm and Spark have monitoring capabilities through a web interface. You can find details about them here:
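For example, a minimal sketch of selecting local mode with spark-submit (the class name and JAR file below are placeholders, not from this thread):

```shell
# Run the driver and executors in a single local JVM,
# using as many worker threads as there are cores:
spark-submit \
  --master "local[*]" \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```

Replace local[*] with local[2], for instance, to cap the job at two threads (Spark Streaming needs at least two: one for the receiver, one for processing).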

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_storm-component-guide/content/using-stor...

http://spark.apache.org/docs/latest/monitoring.html
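As a sketch of the Spark side: while an application is running, its UI is served by the driver (port 4040 by default), and the same data is exposed as JSON via the REST API described in the monitoring page above (the driver hostname here is a placeholder):

```shell
# Browse the live UI of a running application:
#   http://driver-host:4040
# Or query the REST monitoring API for the list of applications:
curl http://driver-host:4040/api/v1/applications
```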

YARN is not a requirement but an option for distributed mode; both Spark and Storm are able to function on their own.
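If you do want YARN, only the master (and optionally the deploy mode) changes in the submit command; a hedged sketch, assuming HADOOP_CONF_DIR is configured and using the same placeholder names as before:

```shell
# Submit the same application to a YARN cluster; with
# --deploy-mode cluster the driver also runs inside the cluster:
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```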


4 REPLIES


Expert Contributor

Thanks @Tibor Kiss - I am looking for more information about distributed mode. Is there a name for the cluster managers in Storm or Spark Streaming?

Contributor

In Storm's nomenclature, 'nimbus' is the cluster manager:

http://storm.apache.org/releases/1.0.1/Setting-up-a-Storm-cluster.html
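For illustration: once a cluster with a nimbus node is set up as in that guide, topologies are submitted to nimbus with the storm CLI (the JAR, class, and topology names below are placeholders):

```shell
# Submit a packaged topology to the cluster managed by nimbus;
# the nimbus host is taken from the client's storm.yaml configuration:
storm jar my-topology.jar com.example.MyTopology my-topology-name
```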

Spark calls the cluster manager the 'master':

http://spark.apache.org/docs/latest/spark-standalone.html
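A sketch of the standalone setup from that page (hostnames and application names are placeholders): start a master and workers with the bundled scripts, then point --master at the resulting spark:// URL:

```shell
# On the master node (prints a URL like spark://master-host:7077):
./sbin/start-master.sh
# On each worker node (the script is named start-worker.sh on newer releases):
./sbin/start-slave.sh spark://master-host:7077
# Submit the streaming job against the standalone master:
spark-submit --master spark://master-host:7077 \
  --class com.example.MyStreamingApp my-streaming-app.jar
```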

Expert Contributor

That's great @Tibor Kiss - I am trying to run a Spark Streaming job - how do I tell it to run in standalone cluster mode?