If I have a file on HDFS and I update it, I know that HDFS internally makes sure the replicated data is updated as well. But I want to know how HDFS does it (i.e. the workflow).
I have a 500 MB CSV file, the DFS block size is 128 MB, and the replication factor is 2. The file will be split into 4 blocks, and each block replicated to 2 DataNodes. Now if I update some values in the file, what does HDFS do to update all the respective replicas?
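For reference, the block arithmetic in the question checks out; here is a quick calculation (sizes in MB, matching the numbers above):

```python
import math

file_size_mb = 500   # CSV file size from the question
block_size_mb = 128  # DFS block size
replication = 2      # replication factor

# HDFS splits the file into fixed-size blocks; the last block may be partial.
num_blocks = math.ceil(file_size_mb / block_size_mb)
total_replicas = num_blocks * replication

print(num_blocks)      # 4 blocks: 128 + 128 + 128 + 116 MB
print(total_replicas)  # 8 block replicas stored across the DataNodes
```

So with replication factor 2, each of the 4 blocks lives on 2 DataNodes, not on every node in the cluster.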
HDFS is an append-only filesystem. You cannot alter data already written to a file; you can only append to it or replace it entirely (i.e. delete or truncate it, then rewrite the whole thing). Random writes at arbitrary offsets, as most Linux filesystems allow, are not possible in HDFS.
I'm not sure exactly what you mean by "updation" (not a standard noun, BTW), but if you're talking about Hue's Edit File feature, which lets you edit small files, it simply replaces the whole file by swapping the original with a modified copy.
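So there is no replica-update workflow for in-place edits: the only way to "update" a file is to rewrite it, at which point the NameNode treats it as a new file, re-splits it into blocks, and re-replicates each block. With the `hdfs dfs` shell, that typically looks like this (the paths here are just illustrative):

```shell
# Copy the file out of HDFS to the local filesystem
hdfs dfs -get /data/input.csv /tmp/input.csv

# Edit the local copy with whatever tool you like
sed -i 's/old_value/new_value/' /tmp/input.csv

# Overwrite the original in HDFS with the edited copy;
# HDFS writes fresh blocks and replicates each one anew
hdfs dfs -put -f /tmp/input.csv /data/input.csv
```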