Created 03-27-2018 10:09 AM
HDFS follows a write-once-read-many model, which means only one client can write to a file at a time; multiple clients cannot write to the same HDFS file concurrently. When a client is given permission by the NameNode to write a file, the NameNode grants that client a lease on the file, and the file stays locked for writing until the write operation is completed and the lease is released. If another client requests to write to the same file, it is not permitted to do so and has to wait until that lease is released or expires. So writes to a given file are effectively serialized, with only one writer at a time, while any number of clients can read the file in parallel.
These are very good links regarding HDFS read and write operations; please refer to them:
https://data-flair.training/blogs/hadoop-hdfs-data-read-and-write-operations/
http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Replication+Pipelining
https://data-flair.training/blogs/hdfs-data-write-operation/
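To make the single-writer behaviour above concrete, here is a minimal Java sketch (my own illustration, not from the links) using the standard Hadoop FileSystem API. The file path is hypothetical, and the exact exception raised for the second writer can vary by Hadoop version; it is meant to be run against a real HDFS cluster, since the local file system does not enforce leases.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleWriterDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path file = new Path("/tmp/single-writer-demo.txt"); // hypothetical path

        // Client 1: the NameNode grants this client the lease on the file.
        FileSystem client1 = FileSystem.newInstance(conf);
        FSDataOutputStream out1 = client1.create(file, true);
        out1.writeBytes("client 1 holds the lease\n");
        out1.hflush(); // data becomes visible to readers, lease is still held

        // Client 2: a separate client instance trying to append to the same
        // file while the lease is held is rejected by the NameNode
        // (typically an AlreadyBeingCreatedException wrapped in a
        // RemoteException).
        FileSystem client2 = FileSystem.newInstance(conf);
        try {
            FSDataOutputStream out2 = client2.append(file);
            out2.close();
        } catch (Exception e) {
            System.out.println("Second writer rejected: " + e);
        }

        // Once client 1 closes its stream, the lease is released and other
        // clients may write (e.g. append) to the file.
        out1.close();
        client1.close();
        client2.close();
    }
}
```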
Created 03-27-2018 10:11 AM
Could you please share more information, such as the source of the data and the type of data?
Do you want to process the data and then store it into HDFS?
There are a lot of options, like an MR job, Spark, and others.
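For example, if the goal is to process the data before storing it, a Spark job can read the raw input, transform it, and write the result back to HDFS. A minimal sketch in Java, where the input/output paths and the filter condition are just assumptions for illustration:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ProcessAndStore {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ProcessAndStore")
                .getOrCreate();

        // Read raw input from HDFS (hypothetical path), apply a trivial
        // transformation, and write the result back to HDFS as Parquet.
        Dataset<Row> raw = spark.read().json("hdfs:///data/raw/events");
        Dataset<Row> cleaned = raw.filter("value IS NOT NULL");
        cleaned.write().mode("overwrite").parquet("hdfs:///data/processed/events");

        spark.stop();
    }
}
```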
-Shubham
Created 03-27-2018 09:18 PM
@TAMILMARAN When you say read and write in parallel, do you mean reading data that is still in the process of being written to HDFS?