Multi-threading in Hadoop

JeevaKrish — Wed, 16 Nov 2016 21:01:29 GMT

Hi there,

I have a general idea of multi-threading, but I'm not sure how it is used in Hadoop.

As far as I know, YARN is responsible for managing and controlling the resources of Spark/MapReduce jobs, so I can't see where multi-threading comes in. I'm also not sure whether it is used anywhere else in the Hadoop ecosystem.

I would appreciate it if anybody could provide some information on this.

Many thanks,

Re: Multi-threading in Hadoop

cstanca — Thu, 17 Nov 2016 06:55:37 GMT

@Jeeva Jeeva

The multithreading programming model and the MapReduce programming model are based on fundamentally different principles, and they are meant to solve different kinds of data storage and processing problems. Multithreading is based on parallelizing the processing, whereas Hadoop derives its power from parallelizing the data.
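
To make the distinction concrete, here is a minimal Python sketch (not Hadoop code; all function names are hypothetical) that counts words in two ways. The first splits the work across threads that share one counter and must synchronize on a lock. The second partitions the input, runs an independent map over each split with no shared state, and merges the partial results, which is the shape MapReduce relies on to spread work across machines. The round-robin split is a simplification; HDFS actually divides files into contiguous blocks.

```python
import threading
from collections import Counter
from functools import reduce


# Style 1: parallelizing the processing.
# Several threads share one in-memory counter and must coordinate with a lock.
def threaded_word_count(lines, n_threads=3):
    counts = Counter()
    lock = threading.Lock()

    def worker(my_lines):
        for line in my_lines:
            for word in line.split():
                with lock:  # shared mutable state needs synchronization
                    counts[word] += 1

    # Hand each thread a round-robin slice of the lines.
    threads = [threading.Thread(target=worker, args=(lines[i::n_threads],))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


# Style 2: parallelizing the data (MapReduce-like).
# Each map call sees only its own split and shares nothing with the others.
def map_split(split):
    return Counter(word for line in split for word in line.split())


def reduce_partials(partials):
    # Merge the per-split counts into one result.
    return reduce(lambda a, b: a + b, partials, Counter())


def mapreduce_word_count(lines, n_splits=3):
    splits = [lines[i::n_splits] for i in range(n_splits)]
    # The map calls are independent, so a framework could run each one on a
    # different node; here they simply run one after another.
    partials = [map_split(s) for s in splits]
    return reduce_partials(partials)


if __name__ == "__main__":
    data = ["a b a", "b c", "a c c", "d"]
    print(threaded_word_count(data) == mapreduce_word_count(data))  # True
```

In the second version the map calls never touch each other's data, which is why a framework can schedule them on different nodes without the programmer writing any locking code.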

If you assume that the Hadoop ecosystem consists only of MapReduce and Spark batch processing, then your understanding is correct. However, the ecosystem also includes real-time streaming tools such as Apache Storm, which does use multi-threading: a Storm topology runs in worker processes, and each worker runs executor threads that execute the topology's tasks. Even so, modern tools like these handle the need for multi-threading through their architecture and design. Their focus is on achieving scalability by design rather than through laborious programming effort.
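
Storm's parallelism, as described in the first reference below, is layered: worker processes run executors, each executor is a thread, and each executor runs one or more tasks (instances of a spout or bolt). The following is a hypothetical, single-process Python simulation of that layering, not Storm's API. The user class `UppercaseBolt` only implements `execute` for one tuple, while the framework-like helper `run_topology_component` creates the executor threads, spreads tasks over them, and routes tuples. Task-to-executor assignment and tuple routing are both round-robin here for simplicity; Storm's real scheduler and stream groupings differ in detail.

```python
import threading


class UppercaseBolt:
    """Hypothetical bolt: user code handles one tuple at a time and never touches threads."""

    def execute(self, tup):
        return tup.upper()


def run_topology_component(bolt_cls, tuples, num_executors=2, num_tasks=4):
    """Simulate one Storm-like component: num_tasks bolt instances spread over
    num_executors threads. Hypothetical helper, not part of Storm."""
    # As in Storm, a component cannot have more executors (threads) than tasks.
    if num_tasks < num_executors:
        raise ValueError("num_tasks must be >= num_executors")

    tasks = [bolt_cls() for _ in range(num_tasks)]

    # Route tuples to tasks round-robin (a deterministic stand-in for shuffle grouping).
    # Each inbox entry keeps the tuple's original position so output order is stable.
    inboxes = [[] for _ in range(num_tasks)]
    for idx, tup in enumerate(tuples):
        inboxes[idx % num_tasks].append((idx, tup))

    results = [None] * len(tuples)

    def executor(task_ids):
        # One executor is one thread; it runs its assigned tasks one after another.
        for task_id in task_ids:
            for idx, tup in inboxes[task_id]:
                # Each idx belongs to exactly one task, so threads never write the same slot.
                results[idx] = tasks[task_id].execute(tup)

    # Spread task ids over executor threads round-robin.
    task_ids = list(range(num_tasks))
    threads = [
        threading.Thread(target=executor, args=(task_ids[e::num_executors],))
        for e in range(num_executors)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_topology_component(UppercaseBolt, ["a", "b", "c", "d", "e"]))
    # ['A', 'B', 'C', 'D', 'E']
```

The point of the design is that the threading lives entirely in the framework side: changing `num_executors` changes how many threads run, without touching the bolt's code.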

References:

http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/

https://www.safaribooksonline.com/blog/2014/01/06/multi-threading-storm