Created on 11-15-2019 12:45 AM - last edited on 11-17-2019 10:47 PM by VidyaSargur
hi all
I want to ask an important question.
let's say we have the following:
HDP cluster with:
3 master machines (active/standby namenode, active/standby resource manager)
3 datanode machines
- each datanode machine has 4 disks for HDFS (not including the OS disk)
3 Kafka machines
- each Kafka machine has one 10 TB disk (not including the OS disk)
now we want to install the whole cluster from scratch, including HDP and Ambari,
but keep the data on the datanode machines and the Kafka topic data, as follows:
we unmount the disks on the datanode machines and the Kafka machines
example
on a datanode machine (note - /etc/fstab is already configured)
umount /grid/data1
umount /grid/data2
... (and so on for the remaining disks)
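for example, something like this could unmount all the data disks in one go (a minimal sketch, assuming the mount points follow the /grid/dataN naming above and that the datanode/Kafka processes on the host are already stopped):

# stop the DataNode / Kafka broker on this host first, then:
for d in /grid/data1 /grid/data2 /grid/data3 /grid/data4
do
    umount "$d" || echo "failed to unmount $d - check for open files with: lsof +D $d"
done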
so for the second scratch installation we install the whole cluster (by blueprint), but without the datanode HDFS disks and the Kafka topic disks (scratch installation means a fresh new Linux OS)
after the installation we mount all the disks on the datanode machines and the Kafka machines (where all the topics are stored)
example
on a datanode machine (note - /etc/fstab is already configured)
mount /grid/data1
mount /grid/data2
... (and so on for the remaining disks)
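since /etc/fstab is already configured, remounting and verifying could look like this (a minimal sketch; mount -a simply mounts everything listed in fstab that is not mounted yet):

# remount everything defined in /etc/fstab that is not currently mounted
mount -a
# verify the data disks are back
df -h /grid/data1 /grid/data2 /grid/data3 /grid/data4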
to complete the picture, we then need to restart HDFS, YARN, and Kafka
so - could this scenario work?
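restarting the services after the remount could also be scripted against the Ambari REST API - a minimal sketch, assuming Ambari runs on ambari-host:8080 and the cluster is named mycluster (both placeholders):

# stop then start HDFS via the Ambari REST API (state INSTALLED = stopped, STARTED = running)
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop HDFS"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://ambari-host:8080/api/v1/clusters/mycluster/services/HDFS
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start HDFS"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://ambari-host:8080/api/v1/clusters/mycluster/services/HDFS
# repeat for YARN and KAFKA by changing the service name in the URL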
Created 11-15-2019 02:06 AM
What I got from your scenario is that on the second scratch installation your master nodes [i.e. active/standby namenodes] are freshly installed, and you are only adding back the datanodes which have pre-existing data [from the other cluster], right?
-- In this case it is not possible to bring the cluster up with the old data from the restored disks,
since the namenode will not have any information about the blocks lying in block storage on the datanode disks.
If you have a support contract with Cloudera, you can approach them for a DR scenario where they can help you add the existing data from the datanodes back into the cluster [not sure if it can be recovered/added back 100%].
The same applies to Kafka.
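as a side note, if you do attempt this, you can check whether the namenode actually recognizes the blocks on the re-added datanodes - a minimal sketch, run as the hdfs user on a namenode host:

hdfs dfsadmin -report            # shows live datanodes and their reported capacity
hdfs fsck / -blocks -locations   # reports missing/corrupt blocks if metadata and block storage are out of sync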
Created 11-15-2019 02:40 AM
ok
can you summarize all the options to recover the namenode (including out-of-the-box options)?
Created 11-15-2019 03:13 AM
1. if you can back up the metadata from the original cluster (where the datanodes existed at first) and copy that metadata to the new cluster, then that's the best option.
2. if you are not able to go with point 1, then you can probably try the "hadoop namenode -recover" option.
the links below might be useful
https://blog.cloudera.com/understanding-hdfs-recovery-processes-part-1/
https://clouderatemp.wpengine.com/blog/2015/03/understanding-hdfs-recovery-processes-part-2/
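for option 2, a minimal sketch (recovery mode is interactive and should only be run with the NameNode process stopped; the details are covered in the blog posts above):

# on the namenode host, with the NameNode process stopped
su - hdfs
hdfs namenode -recover
# answer the interactive prompts; recovery mode reads as much of the
# metadata as possible and skips over corrupt edit log entries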
Created on 11-15-2019 04:37 AM - edited 11-15-2019 04:46 AM
about option one
I guess you don't mean to back up the metadata by copying it with scp or rsync
maybe you mean there is a dedicated backup tool for this, like barman for postgresql
so do you know of a tool for this option?
on each namenode we have the following folders
/hadoop/hdfs/namenode/current (where the fsimage exists)
/hadoop/hdfs/journal/hdfsha/current/
do you mean to back up only these folders, let's say every day?
Created on 11-15-2019 05:24 AM - edited 11-15-2019 05:26 AM
since we have both current folders:
/hadoop/hdfs/namenode/current (where the fsimage exists)
/hadoop/hdfs/journal/hdfsha/current/
do you mean to back up both of them?
second, from a time perspective, how often should we back up -
for example once a week, or more frequently?
Created 11-15-2019 06:10 AM
@mike_bronson7 you just need to back up /hadoop/hdfs/namenode/current from the active namenode.
Also, if you take the backup one week before the activity, and your first cluster goes on serving client requests in the meantime, then you will lose the data that was written after the backup.
So the best approach is to run saveNamespace and take the backup right when you are going to do the activity, with clients frozen so they are not accessing the cluster.
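a minimal sketch of that backup step (run as the hdfs user on the active namenode; the /backup path and tarball name are just examples):

# freeze writes, flush the in-memory namespace to a new fsimage, then copy it out
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
tar czf /backup/namenode-meta-$(date +%F).tar.gz /hadoop/hdfs/namenode/current
hdfs dfsadmin -safemode leave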