Multi-threading in Hadoop


Hi there,

I understand multi-threading in general, but I am not sure how it is used in Hadoop.

As far as I know, YARN is responsible for managing the resources of Spark and MapReduce jobs, and I can't see where multi-threading fits in there. I am also not sure whether it is used anywhere else in the Hadoop ecosystem.

I would appreciate it if anybody could provide some information on this.

Many thanks,

1 ACCEPTED SOLUTION


@Jeeva Jeeva

The multithreaded programming model and the MapReduce programming model are based on fundamentally different principles, and each is meant to solve a different kind of data storage and processing problem. Multithreading parallelizes the processing: several threads inside one process work on shared memory. Hadoop draws its power from parallelizing the data: the data is split into pieces, and independent tasks process those pieces across the cluster.
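To make that contrast concrete, here is a minimal single-machine sketch in Python. It is only an illustration, not Hadoop code: the function names, the sample `LINES` data, and the partitioning scheme are all made up for this example. The first function counts words with threads that update one shared counter under a lock; the second splits the data into partitions, counts each partition independently, and merges the partial results, which is roughly the shape of a MapReduce job.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical sample input, standing in for lines of a large file.
LINES = ["a b a", "b c", "a c c"]


def count_with_threads(lines, workers=3):
    """Multithreading: threads in one process update shared state."""
    shared = Counter()
    lock = threading.Lock()

    def work(line):
        local = Counter(line.split())
        # Shared memory means the update has to be synchronized.
        with lock:
            shared.update(local)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and surfaces any worker exception.
        list(pool.map(work, lines))
    return shared


def map_partition(partition):
    """Map step: count the words of one partition, with no shared state."""
    return Counter(word for line in partition for word in line.split())


def count_with_partitions(lines, num_partitions=2):
    """Data parallelism: split the data, process each piece, then merge."""
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    # Here the partitions are processed one after another in this process.
    # On a cluster, each one would be handled by a separate task instead.
    partials = [map_partition(p) for p in partitions]
    # Reduce step: merge the partial counts into the final result.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    print(count_with_threads(LINES))
    print(count_with_partitions(LINES))
```

Both functions return the same counts. The difference is where the coordination lives: the threaded version needs a lock around shared memory, while the partitioned version shares nothing until the final merge, which is what lets that style spread across many machines.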

If you assume that the Hadoop ecosystem consists only of MapReduce and Spark batch jobs, then your understanding is correct. However, the ecosystem also includes real-time streaming tools such as Apache Storm, which uses multi-threading. Even so, modern tools handle the need for multi-threading through their architecture and design: their focus is on scaling by design rather than through laborious programming effort.

References:

http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/

https://www.safaribooksonline.com/blog/2014/01/06/multi-threading-storm

+++

If this helped, please vote for or accept the best answer.
