
Regarding Erasure Coding Architecture


We are a group of people trying to understand the architecture of erasure coding in Hadoop 3.0. We have been facing difficulties understanding a few terms and concepts regarding it.

1. What do the terms Block, Block Group, Stripe, Cell, and Chunk mean in the context of erasure coding? (These terms have taken on different meanings and have been used interchangeably across various documentation and blogs.) How are they incorporated in reading and writing EC data?
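To make this question concrete, here is our current understanding sketched in Java, assuming the RS-6-3-1024k policy (6 data blocks, 3 parity blocks, 1 MiB cells): a cell is the striping unit, a stripe is one cell from each data block plus the parity cells, and a block group is the set of internal blocks backing one logical block. Please correct us if this mapping is wrong.

```java
// Sketch of our understanding of the HDFS striped layout (an assumption,
// not a reference implementation), for the RS-6-3-1024k policy.
public class StripedLayout {
    static final int DATA_UNITS = 6;        // data blocks per block group
    static final int PARITY_UNITS = 3;      // parity blocks per block group
    static final long CELL_SIZE = 1L << 20; // 1 MiB striping cell

    // Index (0..5) of the internal data block holding a logical byte offset:
    // cells are laid out round-robin across the data blocks of the group.
    static int dataBlockIndex(long logicalOffset) {
        long cellIndex = logicalOffset / CELL_SIZE;
        return (int) (cellIndex % DATA_UNITS);
    }

    // Stripe number containing the offset (each stripe spans 6 data cells).
    static long stripeIndex(long logicalOffset) {
        return (logicalOffset / CELL_SIZE) / DATA_UNITS;
    }

    public static void main(String[] args) {
        long off = 7L * CELL_SIZE + 123;          // 7 MiB + 123 bytes
        System.out.println(dataBlockIndex(off));  // cell 7 -> data block 1
        System.out.println(stripeIndex(off));     // second stripe (index 1)
    }
}
```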

2. How has the concept of the block from previous versions been carried over to EC?

3. The higher-level APIs, ErasureCoders and ErasureCodec, still haven't been plugged into Hadoop. Also, I haven't found any new JIRA regarding this. Are there any updates or pointers on the incorporation of these APIs into Hadoop?

4. How is the DataNode that performs reconstruction work chosen? Also, how are the buffer sizes for reconstruction determined? Thanks in advance for your time and consideration.
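To make question 4 concrete, here is a toy reconstruction using the simplest scheme Hadoop ships, XOR-2-1-1024k: the node doing the repair reads the surviving cells of a stripe into buffers and recomputes the missing one, since with XOR any one lost cell equals the XOR of the others. The buffer sizing here (one cell) is our guess; how Hadoop actually sizes its reconstruction buffers is exactly what we are asking.

```java
// Toy cell reconstruction under the XOR-2-1 scheme (illustrative sketch,
// not Hadoop's actual reconstruction code path).
public class XorReconstruct {
    // XOR two equal-length buffers element-wise.
    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] d0 = {1, 2, 3, 4};            // data cell 0
        byte[] d1 = {5, 6, 7, 8};            // data cell 1
        byte[] parity = xor(d0, d1);         // parity cell, written at encode time
        byte[] rebuilt = xor(parity, d1);    // d0 lost: parity ^ d1 recovers d0
        System.out.println(java.util.Arrays.equals(rebuilt, d0)); // true
    }
}
```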