Multi-threading in Hadoop
Labels: Apache Hadoop, Apache Spark
Created ‎11-16-2016 01:01 PM
Hi there,
I understand multi-threading in general, but I'm not sure how it is used in Hadoop.
As far as I know, YARN is responsible for managing the resources of Spark/MapReduce jobs, so I can't see where multi-threading would come in. I'm also not sure whether it is used anywhere else in the Hadoop ecosystem.
I would appreciate it if anybody could provide some information on this.
Many thanks,
Created ‎11-16-2016 10:55 PM
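The multithreading programming model and the MapReduce programming model are based on fundamentally different principles, and they are meant to solve different kinds of data storage and processing problems. Multithreading parallelizes the processing: several threads run concurrently inside one process and share its memory. Hadoop gets its power from parallelizing the data: the input is split into pieces, and the same computation runs independently on each piece across the cluster.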
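To make the "parallelization of data" idea concrete, here is a minimal single-machine sketch of a MapReduce-style word count. It is only an illustration, not Hadoop code: the function names (`split_into_partitions`, `map_partition`, `reduce_counts`, `word_count`) are made up for this example, and a thread pool merely stands in for tasks that Hadoop would run in separate containers on different nodes. The point is that each partition is processed without touching shared state, and the results are only combined at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(lines, num_partitions):
    """Divide the input into roughly equal chunks, similar in spirit to input splits."""
    return [lines[i::num_partitions] for i in range(num_partitions)]


def map_partition(partition):
    """Map step: count words in one partition, independently of all the others."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partial_counts):
    """Reduce step: merge the per-partition results into one total."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def word_count(lines, num_partitions=3):
    partitions = split_into_partitions(lines, num_partitions)
    # Assumption: threads stand in for cluster tasks; nothing is shared between them.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(map_partition, partitions))
    return reduce_counts(partial_counts)


if __name__ == "__main__":
    sample = [
        "hadoop moves compute to data",
        "spark and hadoop split data",
        "data data data",
    ]
    print(word_count(sample).most_common(2))
```

Because the map step never shares state, adding more partitions (or, on a real cluster, more nodes) scales the job without any locking or thread coordination in your own code.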
If you assume that the Hadoop ecosystem is only MapReduce and Spark batch, then your understanding is correct. However, the ecosystem also includes real-time streaming tools such as Apache Storm, which uses multi-threading. Even there, modern tools take care of the multi-threading for you through their architecture and design: the focus is on scaling by design and configuration rather than by laborious thread programming.
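As a rough illustration of "multi-threading by design", Storm lets you declare how many executors (threads) and tasks (instances of your spout or bolt) each component gets, and the framework spreads the tasks over the threads. The sketch below is a toy model of that bookkeeping, not the Storm API: the `Component` class and the `assign_tasks` and `total_threads` helpers are hypothetical names invented here, and the even split is an assumption about how tasks are distributed.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    executors: int  # threads requested for this component (the "parallelism hint")
    tasks: int      # instances of the spout/bolt code to run


def assign_tasks(component):
    """Spread a component's tasks as evenly as possible over its executor threads."""
    if component.executors < 1 or component.tasks < component.executors:
        raise ValueError("need at least one task per executor")
    base, extra = divmod(component.tasks, component.executors)
    # The first `extra` executors each take one additional task.
    return [base + (1 if i < extra else 0) for i in range(component.executors)]


def total_threads(components):
    """Executor threads the whole topology needs, summed over all components."""
    return sum(c.executors for c in components)


if __name__ == "__main__":
    topology = [
        Component("spout", executors=2, tasks=2),
        Component("bolt-a", executors=2, tasks=4),
        Component("bolt-b", executors=6, tasks=6),
    ]
    for c in topology:
        print(c.name, assign_tasks(c))
    print("executor threads:", total_threads(topology))
```

The developer only declares the numbers; creating the threads and routing tuples between them is left to the framework, which is the point being made above.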
References:
http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
https://www.safaribooksonline.com/blog/2014/01/06/multi-threading-storm
+++
If it helped, pls vote/accept best answer
