Support Questions
Find answers, ask questions, and share your expertise

How can we verify that data loaded into HDFS is replicated across different DataNodes?

Contributor

Dear All,

 

How can we confirm that data is replicated across different DataNodes?

 

Case 01:

I loaded data into the HDFS path /root/data/sample.log using the Hue GUI.

Question 01: In case 01, will the data be replicated across all DataNodes? If yes, where and how?

Case 02:

I loaded data into an HDFS path from the local file system (LFS):

HDFS path: /root/data/test.txt

LFS path: /home/cdh/test.txt

Question 02: In case 02, will the HDFS data be replicated across all DataNodes? If yes, where and how?
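For reference, a load like case 02 is typically done with `hdfs dfs -put` (the paths below are the ones from the question):

```shell
# Copy a file from the local file system (LFS) into HDFS.
# Replication happens automatically as the blocks are written:
# the client writes the first replica to one DataNode, and the
# NameNode places the remaining replicas on other DataNodes.
hdfs dfs -put /home/cdh/test.txt /root/data/test.txt
```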

 

Case 03:

I executed some Pig commands over the HDFS path /root/data/pig.log and saved the output within the HDFS path /root/data/pigoutput/*.

Question 03: In case 03, will the HDFS data be replicated across all DataNodes? If yes, where and how?

 

Note: These questions were asked by an admin who recently joined my team, but I am not able to answer them.

 

 

Could someone please suggest an answer or help me out?

 

 

Thanks

HadoopHelp

 

2 REPLIES

Expert Contributor

Hi @HadoopHelp 

 

The answer to all of your questions is the same:

If you load any file into HDFS, you can use the command below to check whether the file is replicated across DataNodes.

The default replication factor is 3, so you will usually see 3 copies in the command output:

 

hdfs fsck /myfile.txt -files -blocks -locations

E.g., hosts is the filename in my case:
hdfs fsck /tmp/hosts -files -blocks -locations

/tmp/hosts 1157 bytes, 1 block(s):  OK
0. BP-762887186-10.147.167.59-1521037753807:blk_1073748028_7830 len=1157 repl=4 [DatanodeInfoWithStorage[10.1.6.40:1019,DS-6cf46ebf-57fa-4d26-a0f8-f7b99f28424a,DISK], DatanodeInfoWithStorage[10.1.6.44:1019,DS-838d4d62-2069-4b73-b142-76ae1025ae6c,DISK], DatanodeInfoWithStorage[10.1.6.50:1019,DS-da75b9c5-5520-43f4-8e90-60d5982c714d,DISK], DatanodeInfoWithStorage[10.1.6.46:1019,DS-954af47c-1ba7-4057-aacd-1eae700d58cf,DISK]]

 

In the example above you can see that the file's block is present on 4 DataNodes (repl=4; this particular file has a replication factor of 4 rather than the default 3).
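Besides fsck, a few standard HDFS shell commands can also confirm or adjust replication (the path below is the example file from the fsck output above):

```shell
# Print just the replication factor of a file (%r format specifier):
hdfs dfs -stat %r /tmp/hosts

# List the file; the second column of the output is its replication factor:
hdfs dfs -ls /tmp/hosts

# Change the replication factor of an existing file to 3 and
# wait (-w) until re-replication completes:
hdfs dfs -setrep -w 3 /tmp/hosts
```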

Contributor

Hi @sagarshimpi .

 

Thanks

Thanks, but where can we find the path you mentioned below?

 

0. BP-762887186-10.147.167.59-1521037753807:blk_1073748028_7830 len=1157 repl=4 [DatanodeInfoWithStorage[10.1.6.40:1019,DS-6cf46ebf-57fa-4d26-a0f8-f7b99f28424a,DISK], DatanodeInfoWithStorage[10.1.6.44:1019,DS-838d4d62-2069-4b73-b142-76ae1025ae6c,DISK], DatanodeInfoWithStorage[10.1.6.50:1019,DS-da75b9c5-5520-43f4-8e90-60d5982c714d,DISK], DatanodeInfoWithStorage[10.1.6.46:1019,DS-954af47c-1ba7-4057-aacd-1eae700d58cf,DISK]]

I am able to get this from the command, and I can also see it in the NameNode web UI (namenode:50070).

 

But where is this file path on disk?

 

For example, if my data is deleted from 10.1.6.40:1019, how can I retrieve it from the other nodes?
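For context on where replicas physically live: each DataNode stores block files on its local disks under the directories configured by dfs.datanode.data.dir in hdfs-site.xml. A rough sketch of locating the replica of the block above on one DataNode (the directory /data/dfs/dn is a hypothetical example; check your own hdfs-site.xml for the real value):

```shell
# Run on a DataNode that holds a replica (e.g. 10.1.6.44).
# Block files live under <dfs.datanode.data.dir>/current/<block-pool-id>/current/finalized/...
# /data/dfs/dn is an assumed example path; your cluster may differ.
find /data/dfs/dn -name 'blk_1073748028*'

# If one replica is lost, HDFS automatically re-replicates the block
# from the surviving replicas; reading the file through HDFS still
# works as normal, e.g.:
hdfs dfs -cat /tmp/hosts
```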

 

 

 

Thanks

HadoopHelp
