
stream processing runtimes

Expert Contributor

Hi All,

Most batch processing frameworks (MapReduce, Spark) support both a local mode and a distributed mode (standalone, YARN, Mesos) of deployment and execution.

What about stream processing frameworks such as Storm and Spark Streaming? Do they manage distributed mode on their own? Is it even realistic to expect them to work on YARN?

How do I monitor a distributed Spark Streaming job? And do we need to specify the master as yarn to make it distributed?

Thanks,

Avijeet

1 ACCEPTED SOLUTION

Contributor

Hello,

Both Storm and Spark support local mode.

In Storm you need to create a LocalCluster instance and then submit your job to it. You can find a description and an example at these links:

http://storm.apache.org/releases/1.0.2/Local-mode.html

https://github.com/apache/storm/blob/1.0.x-branch/examples/storm-starter/src/jvm/org/apache/storm/st...

Spark's approach to local mode is somewhat different. Allocation is controlled through the Spark master setting, which can be set to local (or local[*], or local[N], where N is the number of worker threads). If local is specified, the driver and executors run in a single JVM on your machine.

Both Storm and Spark have monitoring capabilities through a web interface. You can find details about them here:
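For example, a minimal sketch of selecting local mode with spark-submit (the class name and JAR file below are placeholders, not from this thread):

```shell
# Run the driver and executors in a single local JVM,
# using as many worker threads as there are cores:
spark-submit \
  --master "local[*]" \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```

Replace local[*] with local[2], for instance, to cap the job at two threads (Spark Streaming needs at least two: one for the receiver, one for processing).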

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_storm-component-guide/content/using-stor...

http://spark.apache.org/docs/latest/monitoring.html
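As a sketch of the Spark side: while an application is running, its UI is served by the driver (port 4040 by default), and the same data is exposed as JSON via the REST API described in the monitoring page above (the driver hostname here is a placeholder):

```shell
# Browse the live UI of a running application:
#   http://driver-host:4040
# Or query the REST monitoring API for the list of applications:
curl http://driver-host:4040/api/v1/applications
```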

YARN is not a requirement but an option for distributed mode; both Spark and Storm are able to function on their own.
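If you do want YARN, only the master (and optionally the deploy mode) changes in the submit command; a hedged sketch, assuming HADOOP_CONF_DIR is configured and using the same placeholder names as before:

```shell
# Submit the same application to a YARN cluster; with
# --deploy-mode cluster the driver also runs inside the cluster:
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```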


4 REPLIES


Expert Contributor

Thanks @Tibor Kiss - I am looking for more information about distributed mode. Is there a name for the cluster managers in Storm or Spark Streaming?

Contributor

In Storm's nomenclature, 'nimbus' is the cluster manager:

http://storm.apache.org/releases/1.0.1/Setting-up-a-Storm-cluster.html
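For illustration: once a cluster with a nimbus node is set up as in that guide, topologies are submitted to nimbus with the storm CLI (the JAR, class, and topology names below are placeholders):

```shell
# Submit a packaged topology to the cluster managed by nimbus;
# the nimbus host is taken from the client's storm.yaml configuration:
storm jar my-topology.jar com.example.MyTopology my-topology-name
```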

Spark calls the cluster manager the 'master':

http://spark.apache.org/docs/latest/spark-standalone.html
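A sketch of the standalone setup from that page (hostnames and application names are placeholders): start a master and workers with the bundled scripts, then point --master at the resulting spark:// URL:

```shell
# On the master node (prints a URL like spark://master-host:7077):
./sbin/start-master.sh
# On each worker node (the script is named start-worker.sh on newer releases):
./sbin/start-slave.sh spark://master-host:7077
# Submit the streaming job against the standalone master:
spark-submit --master spark://master-host:7077 \
  --class com.example.MyStreamingApp my-streaming-app.jar
```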

Expert Contributor

That's great @Tibor Kiss - I am trying to run a Spark Streaming job - how do I tell it to run in standalone cluster mode?