Created on 12-15-2019 11:49 PM - last edited on 12-16-2019 03:02 PM by lwang
Dear All,
How can we tell whether data is replicated across different datanodes? Here are three cases:
Case 01: I loaded data into the HDFS path /root/data/sample.log using the Hue GUI.
Question 01: In case 01, will the data be replicated over all datanodes? If yes, where and how?
Case 02: I loaded data into an HDFS path from the local filesystem (LFS), as sketched below:
HDFS path: /root/data/test.txt
LFS path: /home/cdh/test.txt
Question 02: In case 02, will the HDFS data be replicated over all datanodes? If yes, where and how?
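For reference, a load like this is usually done with the put command; a minimal sketch using the paths above (it assumes the HDFS directory /root/data already exists):
hdfs dfs -put /home/cdh/test.txt /root/data/test.txt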
Case 03: I ran some Pig commands over the data at the HDFS path /root/data/pig.log and saved the output under the HDFS path /root/data/pigoutput/* (a sketch of the job follows).
Question 03: In case 03, will the HDFS data be replicated over all datanodes? If yes, where and how?
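The case 03 job was along these lines (a sketch only; the actual script and schema are assumptions, and only the input and output paths come from above):
pig -e "a = LOAD '/root/data/pig.log'; STORE a INTO '/root/data/pigoutput';"
Note that Pig requires the output directory /root/data/pigoutput not to exist before the job runs.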
Note: these questions were asked by an admin colleague who recently joined my team, and I was not able to answer them. Can someone please help with an answer or suggestion?
Thanks
HadoopHelp
Created 12-16-2019 12:43 AM
Hi @HadoopHelp
For all three questions the answer is the same.
If you load any file into HDFS, you can check whether (and where) the file is replicated across datanodes with the command below. The default replication factor is 3, so you will typically see 3 copies in the output:
hdfs fsck /myfile.txt -files -blocks -locations
E.g., hosts is the filename in my case:
hdfs fsck /tmp/hosts -files -blocks -locations
/tmp/hosts 1157 bytes, 1 block(s): OK
0. BP-762887186-10.147.167.59-1521037753807:blk_1073748028_7830 len=1157 repl=4 [DatanodeInfoWithStorage[10.1.6.40:1019,DS-6cf46ebf-57fa-4d26-a0f8-f7b99f28424a,DISK], DatanodeInfoWithStorage[10.1.6.44:1019,DS-838d4d62-2069-4b73-b142-76ae1025ae6c,DISK], DatanodeInfoWithStorage[10.1.6.50:1019,DS-da75b9c5-5520-43f4-8e90-60d5982c714d,DISK], DatanodeInfoWithStorage[10.1.6.46:1019,DS-954af47c-1ba7-4057-aacd-1eae700d58cf,DISK]]
In the example above you can see that the block is present on 4 datanodes (repl=4; for this file the replication factor is set to 4 rather than the default 3).
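As a side note, two quicker ways to see a file's replication factor (using the same example path as above; keep in mind that -stat and -ls show the replication factor set on the file, while fsck shows the actual block locations):
hdfs dfs -stat %r /tmp/hosts (prints just the replication factor)
hdfs dfs -ls /tmp/hosts (the second column of the listing is the replication factor)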
Created 12-16-2019 04:30 AM
Hi @sagarshimpi,
Thanks. But where can we find the path you mentioned below:
0. BP-762887186-10.147.167.59-1521037753807:blk_1073748028_7830 len=1157 repl=4 [DatanodeInfoWithStorage[10.1.6.40:1019,DS-6cf46ebf-57fa-4d26-a0f8-f7b99f28424a,DISK], DatanodeInfoWithStorage[10.1.6.44:1019,DS-838d4d62-2069-4b73-b142-76ae1025ae6c,DISK], DatanodeInfoWithStorage[10.1.6.50:1019,DS-da75b9c5-5520-43f4-8e90-60d5982c714d,DISK], DatanodeInfoWithStorage[10.1.6.46:1019,DS-954af47c-1ba7-4057-aacd-1eae700d58cf,DISK]]
I can get this output from the command, and I can also see it on the NameNode UI at namenode:50070.
But where is this file path on disk?
For example, if my data is deleted from 10.1.6.40:1019, how can I recover it from the other nodes?
Thanks
HadoopHelp