How are reads and writes done in parallel in HDFS?

Explorer
 
3 REPLIES

Master Guru
@TAMILMARAN c

HDFS follows a write-once, read-many model: only one client can write to a given file at a time, while any number of clients can read it concurrently. When the NameNode grants a client permission to write a file, that client holds a lease on the file until the write completes and the file is closed. If another client tries to write to the same file in the meantime (for example, to append to it), the request is rejected rather than queued, and that client has to retry after the lease has been released or has expired.
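The single-writer, many-reader rule can be pictured with a small toy model. This is only a sketch, not the HDFS client API: `ToyNameNode`, `LeaseError`, the client names, and the path are all made up for illustration, and it assumes the NameNode tracks at most one write lease per file and never makes readers take one.

```python
import threading


class LeaseError(Exception):
    """Raised when a second writer asks for a file that is already leased."""


class ToyNameNode:
    """Toy model of single-writer leases; NOT the real HDFS API."""

    def __init__(self):
        self._lock = threading.Lock()  # protects the lease table itself
        self._leases = {}              # path -> client currently writing

    def open_for_write(self, path, client):
        with self._lock:
            holder = self._leases.get(path)
            if holder is not None and holder != client:
                # Assumption: a competing writer is rejected, not queued.
                raise LeaseError(f"{path} is being written by {holder}")
            self._leases[path] = client

    def close(self, path, client):
        with self._lock:
            # Only the lease holder can release the lease.
            if self._leases.get(path) == client:
                del self._leases[path]

    def open_for_read(self, path):
        # Readers never take a lease, so any number can proceed at once.
        return True


nn = ToyNameNode()
nn.open_for_write("/data/a.txt", "client-1")

try:
    nn.open_for_write("/data/a.txt", "client-2")
    second_writer = "accepted"
except LeaseError:
    second_writer = "rejected"

# Several readers are let through while client-1 still holds the lease.
readers_ok = all(nn.open_for_read("/data/a.txt") for _ in range(3))

# After client-1 closes the file, client-2 can obtain the lease.
nn.close("/data/a.txt", "client-1")
nn.open_for_write("/data/a.txt", "client-2")
```

In this model the second writer is turned away while the first still holds the lease, reads are never blocked, and the second writer gets in once the file has been closed.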

These links explain HDFS read and write operations well; please refer to them:

https://data-flair.training/blogs/hadoop-hdfs-data-read-and-write-operations/

http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Replication+Pipelining

https://data-flair.training/blogs/hdfs-data-write-operation/

https://data-flair.training/blogs/hdfs-data-read-operation/

Expert Contributor

Hi @TAMILMARAN c

Could you please share more information, such as the source and type of the data?

Do you want to process the data and then store it in HDFS?

There are a lot of options, such as MapReduce jobs, Spark, and others.

-Shubham


@TAMILMARAN c When you say read and write in parallel, do you mean reading data that is still in the process of being written to HDFS?