I'm trying to understand how Hadoop 3 stores data on HDFS using erasure coding.
For erasure coding, six built-in policies are currently supported:
RS-3-2-1024k, RS-6-3-1024k, RS-10-4-1024k, RS-LEGACY-6-3-1024k, XOR-2-1-1024k and REPLICATION.
REPLICATION is the general mechanism that was also used in Hadoop 2 (the data is replicated 3x).
How does Reed-Solomon RS-3-2-1024k (3 data blocks, 2 parity blocks and a 1024k cell size) or RS-6-3-1024k (6 data blocks, 3 parity blocks and a 1024k cell size) store the data?
Suppose we have 3 DataNodes, 2 NameNodes and 1 edge node. We have to store a 1 GB file (abc.txt) and the block size is 128 MB. How do RS-3-2-1024k and RS-6-3-1024k work in this case?
What is the meaning of 6 data blocks and 1024k?
Are there any specific prerequisites for the number of DataNodes required by each policy?
Thanks in advance for helping me understand this Hadoop 3 concept.
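For what it's worth, the policy names in the question can be read mechanically as codec, data units, parity units and cell size. Below is a minimal Python sketch of that reading. `parse_policy` is a hypothetical helper, not part of Hadoop, and the "minimum DataNodes = data units + parity units" rule is my understanding of the HDFS erasure coding guidance (one internal block of a group per DataNode), so treat it as an assumption to verify against your Hadoop version.

```python
import re

def parse_policy(name):
    """Hypothetical helper: split a name such as 'RS-6-3-1024k' into its parts."""
    m = re.fullmatch(r"([A-Z]+(?:-LEGACY)?)-(\d+)-(\d+)-(\d+)k", name)
    if m is None:
        raise ValueError(f"unrecognized policy name: {name}")
    data, parity, cell_kb = int(m.group(2)), int(m.group(3)), int(m.group(4))
    return {
        "codec": m.group(1),
        "data_units": data,          # data blocks per block group
        "parity_units": parity,      # parity blocks per block group
        "cell_size_kb": cell_kb,     # striping cell size in KB
        # Assumption: one internal block per DataNode, so data + parity nodes are needed.
        "min_datanodes": data + parity,
        # Extra storage relative to the user data (3x replication would be 2.0).
        "storage_overhead": parity / data,
    }

for policy in ["RS-3-2-1024k", "RS-6-3-1024k", "RS-10-4-1024k", "XOR-2-1-1024k"]:
    print(policy, parse_policy(policy))
```

Under that assumption, RS-3-2-1024k needs at least 5 DataNodes and RS-6-3-1024k at least 9, so the 3-DataNode cluster described above would be too small for either of them.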
Have a look at the following doc link: https://blog.cloudera.com/blog/2015/09/introduction-to-hdfs-erasure-coding-in-apache-hadoop/ (specifically the section under "Design and Implementation").
This should help explain it further.
I have gone through this link; from it I mainly understand the performance of 3-way replication vs. EC.
I still don't understand how the data is stored in HDFS.
If I have to store a 1 GB file in HDFS, the file is logically divided into 1024 MB / 128 MB = 8 blocks. How does RS-6-3-1024k store these 8 blocks? What is the meaning of 6 data blocks in RS, and how do the 3 parity blocks work?
Does EC further divide the 8 blocks into sub-blocks?
Could anyone help me understand the logic?
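To make the arithmetic concrete, here is a rough Python sketch of how a striped layout would divide the 1 GB file under RS-6-3-1024k. It is not HDFS code. It assumes the striped layout described in the linked article: the file is cut into 1 MB cells, the cells are dealt round-robin across 6 data blocks of a block group, 3 parity cells are computed per stripe, and a block group holds at most 6 x 128 MB of user data. `block_group_layout` is a hypothetical helper, and the rule that each parity block is as long as the first data block is also an assumption.

```python
MB = 1024 * 1024

def block_group_layout(file_size, data_units=6, parity_units=3,
                       cell_size=1 * MB, block_size=128 * MB):
    """Hypothetical helper: byte sizes of the internal blocks in each block group."""
    group_capacity = data_units * block_size  # user data one block group can hold
    groups = []
    remaining = file_size
    while remaining > 0:
        in_group = min(remaining, group_capacity)
        full_cells, tail = divmod(in_group, cell_size)
        data = []
        for i in range(data_units):
            # Cells are dealt round-robin: cell c lands on data block c % data_units.
            cells = full_cells // data_units + (1 if i < full_cells % data_units else 0)
            size = cells * cell_size
            if tail and i == full_cells % data_units:
                size += tail  # the final, partial cell of the group
            data.append(size)
        # Assumption: each parity block is as long as the first (longest) data block.
        parity = [data[0]] * parity_units
        groups.append({"data": data, "parity": parity})
        remaining -= in_group
    return groups

for n, group in enumerate(block_group_layout(1024 * MB)):
    print(n, [s // MB for s in group["data"]], [s // MB for s in group["parity"]])
```

Under these assumptions the 1 GB file is not stored as 8 independent 128 MB blocks. It becomes 2 block groups: the first holds 768 MB of data as six 128 MB data blocks plus three 128 MB parity blocks, and the second holds the remaining 256 MB as data blocks of 43, 43, 43, 43, 42 and 42 MB plus three 43 MB parity blocks. So "6 data blocks" is the stripe width, and the 1024k cells are the sub-units asked about above.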