How to match the number of pipelines with the number of blocks in a file

Rising Star

Hi dear community!

I'd very much appreciate it if someone could explain how the number of pipelines relates to the number of blocks in a file.

I mean the pipelines that are created for writing an HDFS block (https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf).

Is the number of pipelines equal to the number of blocks in the file? In other words, is one pipeline created per file block, or is there one pipeline per file (covering multiple blocks)?

Thanks!

Master Guru
A pipeline is created to write the replicas of each under-construction
block. When a block ends, its pipeline is completed and closed, and a
new pipeline is opened for the next block (if there is one). There's no
notion of a "number of pipelines", given that it's a sequential
operation. Why do you seek such a number?
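
To make the one-pipeline-per-block sequence concrete, here is a toy sketch in Python. It is not HDFS client code: the function name `simulate_write` and the 300 MB / 128 MB sizes are assumptions chosen for illustration, and the streaming of packets is only indicated by a comment.

```python
def simulate_write(file_size: int, block_size: int) -> list[tuple[int, int]]:
    """Toy model: return one (block_index, bytes_in_block) entry per pipeline.

    Only one pipeline exists at any moment; the next one is opened
    after the previous block's pipeline has been closed.
    """
    pipelines = []
    offset = 0
    block_index = 0
    while offset < file_size:
        # A new pipeline is set up for the block under construction.
        bytes_in_block = min(block_size, file_size - offset)
        # ... data would be streamed through the pipeline here ...
        # The block is full (or the file ends): the pipeline is closed.
        pipelines.append((block_index, bytes_in_block))
        offset += bytes_in_block
        block_index += 1
    return pipelines


MB = 1024 * 1024
# Hypothetical sizes: a 300 MB file written with a 128 MB block size.
history = simulate_write(file_size=300 * MB, block_size=128 * MB)
for index, size in history:
    print(f"block {index}: pipeline opened, {size // MB} MB written, pipeline closed")
```

Over the lifetime of the write, the count of pipelines in this model equals the count of blocks, but they never exist at the same time.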

Rising Star

Thanks for your reply!

Let me clarify my question:

Does each file block have its own unique set of DataNodes?

Or are all blocks written to a single set of DataNodes?

Thanks!

Master Guru
Every block's replica location allocation gets a different set of DNs.
They may all have one DataNode in common if the writer client is
running on a host that also runs a DataNode, but the other replicas
will be randomly selected (within or outside of racks, depending on
topology).
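
The per-block selection described here can be sketched roughly as follows. This is a simplified illustration, not the actual NameNode placement policy: it ignores rack topology entirely, and the names `choose_targets`, `dn1` to `dn6`, and the replication factor of 3 are assumptions made for the example.

```python
import random


def choose_targets(datanodes, replication, writer_host=None, rng=random):
    """Toy model: pick the DataNodes for one block's pipeline.

    The writer's own host comes first if it runs a DataNode; the
    remaining replicas are picked at random (rack awareness is ignored).
    """
    targets = []
    if writer_host in datanodes:
        targets.append(writer_host)
    remaining = [dn for dn in datanodes if dn not in targets]
    needed = min(replication - len(targets), len(remaining))
    targets += rng.sample(remaining, needed)
    return targets


# Hypothetical cluster of six DataNodes; the writer runs on dn1.
cluster = ["dn1", "dn2", "dn3", "dn4", "dn5", "dn6"]
rng = random.Random(0)
for block in range(4):
    print(block, choose_targets(cluster, 3, writer_host="dn1", rng=rng))
```

In this model every block keeps `dn1` as its first target, while the other two targets are drawn independently for each block, so the full set usually differs from block to block.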

Rising Star
Thanks! Everything is clear now!