
primary data files and replicated data files



In HDFS, is it possible to identify primary data files from replicated data files?

 

For example, suppose I have three DataNodes running on three machines with the default replication factor of three. I then copy over a 3 GB file and it gets split among the three nodes at 1 GB each. But every node also contains the replicated data of the other two nodes. So each DataNode will hold 3 GB worth of data: 1 GB of its primary data and 2 GB (1 + 1) of replicated data.

 

In this scenario, is it possible to identify which files constitute primary data for a node and which files represent replicated data?

 

I hope the question is not confusing.

 

Appreciate the insights.

1 REPLY

Re: primary data files and replicated data files

As far as I know, HDFS makes no distinction between primary replicas
and secondary replicas. Each block simply has a certain number of replicas.
The NameNode maps block IDs to their locations, and no location is
more important than another.
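
To make the block-to-locations idea concrete, here is a minimal Python sketch of a toy block map. It is not HDFS code. The 128 MB block size, the DataNode names (dn1, dn2, dn3), and the round-robin placement are assumptions made for illustration, and real HDFS placement is rack-aware and more involved. The point is only that each block ID maps to a set of locations, with nothing marking any one of them as primary.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size for this toy model
REPLICATION = 3      # replication factor from the question


def place_blocks(file_size_mb, datanodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return a toy block map: {block_id: set of DataNode names}.

    Hypothetical round-robin placement, not the real HDFS policy.
    """
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    copies = min(replication, len(datanodes))
    block_map = {}
    for block_id in range(num_blocks):
        # A set has no order, so no replica is "first" or "primary".
        block_map[block_id] = {
            datanodes[(block_id + i) % len(datanodes)] for i in range(copies)
        }
    return block_map


def blocks_per_node(block_map):
    """Count how many block replicas each DataNode holds."""
    counts = {}
    for replicas in block_map.values():
        for node in replicas:
            counts[node] = counts.get(node, 0) + 1
    return counts


# The scenario from the question: a 3 GB file on three DataNodes.
block_map = place_blocks(3 * 1024, ["dn1", "dn2", "dn3"])
counts = blocks_per_node(block_map)

print("blocks in file:", len(block_map))
print("replicas on dn1:", counts["dn1"])
print("locations of block 0:", sorted(block_map[0]))
```

Under these assumptions, every node ends up holding one replica of every block, which is 3 GB per node as in the question, but no replica is marked as the original. On a real cluster, `hdfs fsck <path> -files -blocks -locations` lists the DataNodes holding each block of a file.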