Created 02-02-2017 10:42 AM
Hi All,
Most batch processing frameworks (MapReduce, Spark) support both a local mode and a distributed mode (standalone, YARN, Mesos) of deployment and execution.
What about stream processing frameworks such as Storm and Spark Streaming? Do they manage distributed mode on their own? Is it even realistic to expect them to work on YARN?
How do I monitor a distributed Spark Streaming job? And do I need to set the master to yarn to make it distributed?
Thanks,
Avijeet
Created 02-02-2017 11:39 AM
Hello,
Both Storm and Spark support local mode.
In Storm, you create a LocalCluster instance and then submit your topology to it. You can find a description and an example here:
http://storm.apache.org/releases/1.0.2/Local-mode.html
Spark's approach to local mode is somewhat different. Placement is controlled by the master URL (the --master option of spark-submit, or the spark.master property), which can be set to local, local[N] (N worker threads) or local[*] (one worker thread per logical core). With a local master, the driver and the executor run inside a single JVM on your machine.
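As a rough illustration of the master URL forms above, the Python sketch below classifies a master string. describe_master and enough_threads_for_receivers are hypothetical helpers, not Spark APIs. The second one encodes the advice in the Spark Streaming programming guide that, when running receiver-based streams locally, you should use local[n] with n greater than the number of receivers; otherwise the receivers occupy every thread and no thread is left to process the received data.

```python
import os
import re
from typing import Optional, Tuple

# Hypothetical helper (not a Spark API): classifies the master URL forms
# described in the Spark documentation.
_LOCAL_N = re.compile(r"local\[(\d+|\*)\]")


def describe_master(master: str) -> Tuple[str, Optional[int]]:
    """Return (mode, local_threads); local_threads is None for cluster masters."""
    if master == "local":
        return ("local", 1)  # a single worker thread
    match = _LOCAL_N.fullmatch(master)
    if match:
        n = match.group(1)
        if n == "*":
            return ("local", os.cpu_count() or 1)  # one thread per logical core
        return ("local", int(n))
    if master.startswith("spark://"):
        return ("standalone", None)
    if master == "yarn":
        return ("yarn", None)
    if master.startswith("mesos://"):
        return ("mesos", None)
    raise ValueError(f"unrecognised master URL: {master!r}")


def enough_threads_for_receivers(master: str, receivers: int) -> bool:
    """Local mode only: receiver-based streams need local[n] with n > receivers."""
    mode, threads = describe_master(master)
    if mode != "local":
        raise ValueError("this check only applies to local masters")
    return threads > receivers


if __name__ == "__main__":
    print(describe_master("local[4]"))                # ('local', 4)
    print(enough_threads_for_receivers("local[1]", 1))  # False
```

In other words, a single receiver-based stream run with local or local[1] receives data but never processes it, while local[2] leaves one thread for processing.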
Both Storm and Spark provide monitoring through a web interface (the Storm UI and the Spark web UI). Spark's monitoring options are described here:
http://spark.apache.org/docs/latest/monitoring.html
YARN is not a requirement for distributed mode, only one option: both Spark and Storm can manage a cluster on their own, so you do not have to set the master to yarn to run distributed.
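To make the distributed case concrete, the Python sketch below only assembles a spark-submit command line; it does not run anything. build_submit_command, the host master-host, the class com.example.StreamingApp and the jar streaming-app.jar are placeholders assumed for illustration, and 7077 is the default port of a standalone master. --master, --deploy-mode, --class and --supervise are real spark-submit options; --supervise (restart the driver if it fails, often wanted for long-running streaming jobs) is only honoured in cluster deploy mode on standalone or Mesos.

```python
from typing import List, Optional


def build_submit_command(master: str, main_class: str, app_jar: str,
                         deploy_mode: str = "cluster",
                         supervise: bool = False,
                         app_args: Optional[List[str]] = None) -> List[str]:
    """Assemble a spark-submit argument list (a sketch; nothing is executed)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"deploy mode must be 'client' or 'cluster', got {deploy_mode!r}")
    if master.startswith("local") and deploy_mode == "cluster":
        raise ValueError("local masters only support client deploy mode")
    cmd = ["spark-submit",
           "--master", master,            # spark://host:port, yarn, mesos://...
           "--deploy-mode", deploy_mode,  # cluster: the driver runs inside the cluster
           "--class", main_class]
    if supervise:
        # --supervise is only honoured for cluster deploy mode on standalone or Mesos.
        if deploy_mode != "cluster" or not master.startswith(("spark://", "mesos://")):
            raise ValueError("--supervise needs cluster deploy mode on standalone or Mesos")
        cmd.append("--supervise")
    cmd.append(app_jar)         # spark-submit options must precede the application jar
    cmd.extend(app_args or [])  # anything after the jar is passed to the application
    return cmd


if __name__ == "__main__":
    standalone = build_submit_command("spark://master-host:7077",
                                      "com.example.StreamingApp",
                                      "streaming-app.jar", supervise=True)
    # spark-submit --master spark://master-host:7077 --deploy-mode cluster
    #   --class com.example.StreamingApp --supervise streaming-app.jar
    print(" ".join(standalone))
    on_yarn = build_submit_command("yarn", "com.example.StreamingApp", "streaming-app.jar")
    print(" ".join(on_yarn))
```

Passing the resulting list to subprocess.run(cmd, check=True) would launch the job. For --master yarn, spark-submit also needs HADOOP_CONF_DIR or YARN_CONF_DIR pointing at the cluster's client configuration, and Python applications cannot use cluster deploy mode on a standalone master.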
Created 02-02-2017 11:58 AM
Thanks @Tibor Kiss - I am looking for more information about distributed mode. What are the cluster managers called in Storm and Spark Streaming?
Created 02-02-2017 12:15 PM
In Storm's nomenclature, the cluster manager is called Nimbus:
http://storm.apache.org/releases/1.0.1/Setting-up-a-Storm-cluster.html
In Spark's standalone deployment, the cluster manager is called the 'master':
Created 02-03-2017 04:36 AM
That's great @Tibor Kiss - I am trying to run a Spark Streaming job. How do I tell it to run in standalone cluster mode?