How erasure encoding policy works

Expert Contributor

Hi All,

I'm trying to understand how Hadoop 3 stores data on HDFS using erasure coding.

As per the erasure coding documentation, six built-in policies are currently supported:

RS-3-2-1024k, RS-6-3-1024k, RS-10-4-1024k, RS-LEGACY-6-3-1024k, XOR-2-1-1024k and REPLICATION.

REPLICATION is the general scheme that was also used in Hadoop 2 (replicate the data 3x).

How does Reed-Solomon RS-3-2-1024k (3 data blocks, 2 parity blocks and a 1024k cell size) or RS-6-3-1024k (6 data blocks, 3 parity blocks and a 1024k cell size) store the data?

Suppose we have 3 DataNodes, 2 NameNodes and 1 edge node. We have to store a 1 GB file (abc.txt) and the block size is 128 MB. How do RS-3-2-1024k and RS-6-3-1024k work in this case?

What is the meaning of "6 data blocks" and "1024k"?

Are there any specific prerequisites for the number of DataNodes required, depending on the policy?
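To make the policy naming and the DataNode question concrete, here is a minimal sketch in Python. The helper names `parse_ec_policy` and `min_datanodes` are hypothetical and not part of any Hadoop API. It assumes the naming pattern codec-dataUnits-parityUnits-cellSize seen in the list above, and that a policy needs at least dataUnits + parityUnits DataNodes so that every block of a block group can be placed on a different node.

```python
def parse_ec_policy(name):
    """Split a policy name such as 'RS-6-3-1024k' into its parts.

    Assumes the pattern <codec>-<data units>-<parity units>-<cell size>k.
    Splitting from the right keeps codecs like 'RS-LEGACY' in one piece.
    """
    codec, data_units, parity_units, cell = name.rsplit("-", 3)
    return {
        "codec": codec,
        "data_units": int(data_units),
        "parity_units": int(parity_units),
        "cell_size_bytes": int(cell.rstrip("k")) * 1024,
    }


def min_datanodes(name):
    """Assumed minimum: one DataNode per data or parity block in a group."""
    policy = parse_ec_policy(name)
    return policy["data_units"] + policy["parity_units"]


if __name__ == "__main__":
    available_datanodes = 3  # the cluster described in this post
    for name in ["RS-3-2-1024k", "RS-6-3-1024k", "RS-10-4-1024k",
                 "RS-LEGACY-6-3-1024k", "XOR-2-1-1024k"]:
        needed = min_datanodes(name)
        verdict = "fits" if needed <= available_datanodes else "too few DataNodes"
        print(f"{name}: needs {needed} DataNodes -> {verdict}")
```

Under these assumptions, a cluster with 3 DataNodes would only be wide enough for XOR-2-1-1024k. RS-3-2-1024k would need 5 DataNodes and RS-6-3-1024k would need 9.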

Thanks in advance for helping me understand this Hadoop 3 concept.

Regards,

Vinay K


2 REPLIES

Re: How erasure encoding policy works

Explorer

Hello,

Have a look at the following doc link [https://blog.cloudera.com/blog/2015/09/introduction-to-hdfs-erasure-coding-in-apache-hadoop/] (specifically the section under "Design and Implementation").

This should help explain it further.

Re: How erasure encoding policy works

Expert Contributor

Hi @Pulkit Bhardwaj

I have gone through this link. From it, I mainly understood the performance of 3-way replication vs. EC.

I still don't understand how the data is stored in HDFS.

If I have to store a 1 GB file in HDFS, logically the file is divided into 1024 MB / 128 MB = 8 blocks. So how does RS-6-3-1024k store these 8 blocks? What is the meaning of "6 data blocks" in RS, and how do the 3 parity blocks work?

Does EC further divide the 8 blocks into sub-blocks?

Could anyone help me to understand the logic?
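For anyone following the arithmetic, here is a rough Python sketch of the striped layout as I understand it from the "Design and Implementation" section of the linked post. It is a simplified model, not Hadoop code, and `striped_layout` is a hypothetical name. The assumptions are: the file is cut into 1024k cells; cells are dealt round-robin across the data blocks of a block group; each block in a group can grow up to the 128 MB block size; and each parity block is as long as the first (largest) data block of its group.

```python
MB = 1024 * 1024


def striped_layout(file_size, data_units, parity_units,
                   cell_size=1 * MB, block_size=128 * MB):
    """Return a list of (data_block_sizes, parity_block_sizes) per block group."""
    group_capacity = data_units * block_size  # file bytes one group can hold
    groups = []
    remaining = file_size
    while remaining > 0:
        group_bytes = min(remaining, group_capacity)
        full_cells, tail = divmod(group_bytes, cell_size)
        data = []
        for i in range(data_units):
            # Cells are dealt round-robin, so the first blocks get one extra
            # cell when the cell count is not a multiple of data_units.
            cells = full_cells // data_units
            if i < full_cells % data_units:
                cells += 1
            size = cells * cell_size
            # A trailing partial cell lands on the next block in rotation.
            if tail and i == full_cells % data_units:
                size += tail
            data.append(size)
        # Assumption: parity blocks match the first data block's length.
        parity = [data[0]] * parity_units
        groups.append((data, parity))
        remaining -= group_bytes
    return groups


if __name__ == "__main__":
    for n, (data, parity) in enumerate(striped_layout(1024 * MB, 6, 3), start=1):
        print(f"block group {n}:",
              "data MB =", [size // MB for size in data],
              "parity MB =", [size // MB for size in parity])
```

Under this model the 1 GB file is not stored as 8 independent 128 MB blocks. With RS-6-3-1024k it becomes 2 block groups: the first holds 768 MB of the file as 6 data blocks of 128 MB plus 3 parity blocks of 128 MB, and the second holds the remaining 256 MB as data blocks of 43, 43, 43, 43, 42 and 42 MB plus 3 parity blocks of 43 MB. That is about 1537 MB on disk in total, compared with 3072 MB for 3x replication.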