How the erasure coding policy works

Expert Contributor

Hi All,

I'm trying to understand how Hadoop 3 stores data in HDFS using erasure coding.

For erasure coding, six built-in policies are currently supported:

RS-3-2-1024k, RS-6-3-1024k, RS-10-4-1024k, RS-LEGACY-6-3-1024k, XOR-2-1-1024k and REPLICATION.

REPLICATION is the regular scheme that Hadoop 2 also used (the data is replicated 3x).
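A minimal sketch of how the policy names decode, assuming the pattern CODEC-<data units>-<parity units>-<cell size>. The hypothetical helper `parse_policy` splits a name, and `storage_factor` reports raw bytes written per byte of user data, (k + m) / k, alongside the number of lost internal blocks a group can tolerate (m), next to 3x replication. These helpers are illustrative only, not a Hadoop API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ECPolicy:
    codec: str          # e.g. "RS", "RS-LEGACY", "XOR"
    data_units: int     # k: data blocks per block group
    parity_units: int   # m: parity blocks per block group
    cell_size: int      # striping unit in bytes


def parse_policy(name: str) -> ECPolicy:
    """Hypothetical helper: decode a name like 'RS-6-3-1024k'."""
    parts = name.split("-")
    codec = "-".join(parts[:-3])          # keeps multi-part codecs like RS-LEGACY
    data, parity, cell = parts[-3:]
    if not cell.endswith("k"):
        raise ValueError(f"unexpected cell size: {cell}")
    return ECPolicy(codec, int(data), int(parity), int(cell[:-1]) * 1024)


def storage_factor(policy: ECPolicy) -> float:
    """Raw bytes written per byte of user data: (k + m) / k."""
    return (policy.data_units + policy.parity_units) / policy.data_units


for name in ["RS-3-2-1024k", "RS-6-3-1024k", "RS-10-4-1024k",
             "RS-LEGACY-6-3-1024k", "XOR-2-1-1024k"]:
    p = parse_policy(name)
    print(f"{name}: {p.data_units} data + {p.parity_units} parity, "
          f"cell {p.cell_size // 1024} KiB, "
          f"{storage_factor(p):.2f}x raw storage, "
          f"tolerates {p.parity_units} lost blocks")
print("REPLICATION: 3.00x raw storage, tolerates 2 lost replicas")
```

Under this reading, RS-6-3-1024k costs 1.50x raw storage and survives 3 lost blocks per group, while 3x replication costs 3.00x and survives 2 lost replicas.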

How do the Reed-Solomon policies RS-3-2-1024k (3 data blocks, 2 parity blocks, 1024k cell size) and RS-6-3-1024k (6 data blocks, 3 parity blocks, 1024k cell size) store data?
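As I understand HDFS's striped layout, a file is cut into cells of the policy's cell size (1024k = 1 MiB), and the cells are dealt round-robin across the k data blocks of a block group; each row of k cells (a stripe) also gets m parity cells. The sketch below is an illustrative model of that offset mapping, not HDFS code: `locate` is a hypothetical helper, and it assumes the block size is a multiple of the cell size.

```python
MiB = 1024 * 1024


def locate(offset: int, data_units: int, cell_size: int, block_size: int):
    """Map a logical file offset to (block group, data block index, offset in that block).

    Illustrative model of round-robin cell striping; assumes block_size is a
    multiple of cell_size and ignores parity blocks (they hold no file bytes).
    """
    group_span = data_units * block_size                 # file bytes one block group holds
    group, off_in_group = divmod(offset, group_span)
    cell_index, off_in_cell = divmod(off_in_group, cell_size)
    stripe, data_block = divmod(cell_index, data_units)  # row and column of the cell
    return group, data_block, stripe * cell_size + off_in_cell


# RS-6-3-1024k with a 128 MiB block size
for off in (0, 1 * MiB, 5 * MiB, 6 * MiB, 768 * MiB):
    print(off // MiB, "MiB ->", locate(off, 6, MiB, 128 * MiB))
```

In this model the first 6 MiB of the file land as one 1 MiB cell on each of the 6 data blocks, the 7th MiB wraps back to data block 0 at offset 1 MiB, and the byte at 768 MiB starts a second block group.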

Suppose we have 3 DataNodes, 2 NameNodes, and 1 edge node, and we need to store a 1 GB file (abc.txt) with a block size of 128 MB. How would RS-3-2-1024k and RS-6-3-1024k work?

What do "6 data blocks" and "1024k" mean?

Are there any prerequisites for the number of DataNodes required by each policy?
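One way to reason about DataNode count, assuming the goal is that every internal block of a block group (k data + m parity) lives on a different DataNode, so that losing one node costs at most one block per group: the policy's width k + m is then the minimum number of DataNodes. The sketch below checks that for the 3-DataNode cluster in the question. `min_datanodes` and `fits` are hypothetical helpers, and how HDFS actually behaves on an undersized cluster should be confirmed against the Hadoop documentation.

```python
POLICIES = {
    "RS-3-2-1024k": (3, 2),
    "RS-6-3-1024k": (6, 3),
    "RS-10-4-1024k": (10, 4),
    "RS-LEGACY-6-3-1024k": (6, 3),
    "XOR-2-1-1024k": (2, 1),
}


def min_datanodes(data_units: int, parity_units: int) -> int:
    """One internal block per DataNode needs k + m distinct nodes."""
    return data_units + parity_units


def fits(cluster_datanodes: int, data_units: int, parity_units: int) -> bool:
    """True if the cluster can hold each internal block on its own DataNode."""
    return cluster_datanodes >= min_datanodes(data_units, parity_units)


cluster = 3  # DataNodes in the example cluster
for name, (k, m) in POLICIES.items():
    need = min_datanodes(k, m)
    verdict = "ok" if fits(cluster, k, m) else f"needs {need - cluster} more"
    print(f"{name}: width {need}, {verdict}")
```

Under this assumption, only XOR-2-1-1024k (width 3) fits on 3 DataNodes; RS-3-2-1024k would need at least 5 and RS-6-3-1024k at least 9.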

Thanks in advance for helping me understand this Hadoop 3 concept.

Regards,

Vinay K



Explorer

Hello,

Have a look at the following link: [https://blog.cloudera.com/blog/2015/09/introduction-to-hdfs-erasure-coding-in-apache-hadoop/] (specifically the section under "Design and Implementation").

This should help explain it further.

Expert Contributor

Hi @Pulkit Bhardwaj

I have gone through this link. From it, I mainly understood the performance of 3-way replication vs. EC.

I still don't understand how the data is stored in HDFS.

If I store a 1 GB file in HDFS, the file is logically divided into 1024 MB / 128 MB = 8 blocks. How does RS-6-3-1024k store these 8 blocks? What does "6 data blocks" mean in RS, and how do the 3 parity blocks work?
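A rough size calculation, assuming HDFS's striped layout as I understand it: with RS-6-3-1024k the file is not first cut into eight contiguous 128 MB blocks. Instead, one block group spans k × 128 MiB = 768 MiB of file data, so a 1 GiB file needs two block groups (768 MiB + 256 MiB). Within a group, 1 MiB cells go round-robin over the 6 data blocks, and each parity block is assumed to be as long as the longest data block. The helper names below are hypothetical.

```python
MiB = 1024 * 1024


def split_into_groups(file_bytes: int, data_units: int, block_size: int) -> list[int]:
    """File bytes held by each block group (each group spans k * block_size)."""
    span = data_units * block_size
    groups = []
    remaining = file_bytes
    while remaining > 0:
        groups.append(min(span, remaining))
        remaining -= groups[-1]
    return groups


def internal_block_sizes(group_bytes: int, data_units: int, parity_units: int, cell: int):
    """Sizes of the k data blocks and m parity blocks in one block group."""
    full_stripes, rest = divmod(group_bytes, data_units * cell)
    data = [full_stripes * cell] * data_units
    for i in range(data_units):        # the partial last stripe fills cells left to right
        leftover = rest - i * cell
        if leftover > 0:
            data[i] += min(cell, leftover)
    parity = [data[0]] * parity_units  # assumed: parity matches the longest data block
    return data, parity


k, m = 6, 3  # RS-6-3-1024k
raw_total = 0
for g, size in enumerate(split_into_groups(1024 * MiB, k, 128 * MiB)):
    data, parity = internal_block_sizes(size, k, m, MiB)
    raw_total += sum(data) + sum(parity)
    print(f"group {g}: {size // MiB} MiB of file data")
    print(f"  data blocks (MiB):   {[d // MiB for d in data]}")
    print(f"  parity blocks (MiB): {[p // MiB for p in parity]}")
print(f"raw storage: {raw_total // MiB} MiB (3x replication: {3 * 1024} MiB)")
```

Under these assumptions the 1 GiB file becomes 18 internal blocks (9 per group). Group 0 has six 128 MiB data blocks and three 128 MiB parity blocks; group 1 has data blocks of 43, 43, 43, 43, 42 and 42 MiB plus three 43 MiB parity blocks. That totals 1537 MiB of raw disk versus 3072 MiB with 3x replication.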

Does EC further divide these 8 blocks into sub-blocks?
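As I understand HDFS striping, the "sub-blocks" are cells of the policy's cell size (1 MiB for the *-1024k policies), and parity is computed cell by cell across each stripe. Reed-Solomon, used by the RS policies, does this with finite-field arithmetic so that any m lost cells of a stripe can be rebuilt. The sketch below swaps in the simpler XOR scheme of XOR-2-1-1024k (one parity cell = XOR of two data cells) to show the same recover-from-survivors idea for a single loss; the cell contents and 8-byte size are toy values.

```python
def xor_cells(*cells: bytes) -> bytes:
    """Byte-wise XOR of equal-length cells."""
    if len({len(c) for c in cells}) != 1:
        raise ValueError("cells must have equal length")
    out = bytearray(len(cells[0]))
    for cell in cells:
        for i, b in enumerate(cell):
            out[i] ^= b
    return bytes(out)


# One stripe under an XOR-2-1 layout: two data cells plus one parity cell (toy 8-byte cells).
d0 = b"abc.txt "
d1 = b"payload!"
parity = xor_cells(d0, d1)

# Lose the DataNode holding d1: XOR of the surviving cells gives it back.
recovered = xor_cells(d0, parity)
print(recovered == d1)  # True
```

Reed-Solomon generalizes this: RS-6-3 computes 3 parity cells per stripe of 6 data cells and can rebuild any 3 missing cells of the 9, which is why it tolerates 3 lost internal blocks per group.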

Could anyone help me understand the logic?
