I want to back up my cluster to S3 and then wipe the cluster, and maybe spin it up again later.
What is the best approach for taking the backup to S3?
1- Copy the entire HDFS content to S3 (into a single bucket, or create multiple buckets?)
2- Do I need to take an HBase snapshot in addition to copying /hbase to S3, or will either one work?
3- Encryption at rest is enabled. Are there any special considerations when moving the backup to S3?
4- How will the restore from S3 work? Just copy the data back to HDFS?
I hope you have solved your problem by now, but here are some thoughts.
Is your data entirely HBase? I think that makes things more difficult, and it is outside my expertise. I think you need to look into procedures for backing up an HBase database. It is almost irrelevant that you are using S3: the problem you face would be the same no matter what the backup medium is.
Normally, for most files and Hive tables, I would lift and shift: read from HDFS and copy to S3. If you have "at rest" encryption, I would expect the reading process to decrypt the encrypted HDFS blocks, and you could use server-side encryption on the S3 bucket instead. (Test this first so you are comfortable with it.) You would keep the data files but lose any HDFS block information. Restoring those files would mean writing them into your cluster again as if they were brand new.
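As a rough sketch of that lift and shift, the copy in each direction could be done with `hadoop distcp`. The Python below only builds and prints the two command lines; it does not run them. The HDFS directory, the bucket name, and the choice of `AES256` for S3 server-side encryption are all assumptions for illustration, and the `fs.s3a.server-side-encryption-algorithm` property name may differ on newer Hadoop releases, so check it against your version before relying on it.

```python
import shlex


def distcp_command(src, dst, sse_algorithm=None):
    """Build a `hadoop distcp` argument list that copies src to dst."""
    cmd = ["hadoop", "distcp"]
    if sse_algorithm:
        # Generic -D options go before the distcp-specific options.
        # Assumption: the cluster's s3a connector honours this property name.
        cmd.append(f"-Dfs.s3a.server-side-encryption-algorithm={sse_algorithm}")
    # -update copies only files that are missing or differ at the destination.
    cmd += ["-update", src, dst]
    return cmd


# Hypothetical locations, not taken from the original question.
HDFS_DIR = "hdfs:///data/warehouse"
S3_DIR = "s3a://my-backup-bucket/data/warehouse"

backup = distcp_command(HDFS_DIR, S3_DIR, sse_algorithm="AES256")
restore = distcp_command(S3_DIR, HDFS_DIR)

print(shlex.join(backup))
print(shlex.join(restore))
```

The restore is just the same copy with source and destination swapped, which matches the point above: the files come back as brand-new HDFS files.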
If this data is not being updated, though, you might consider keeping it in S3 and reading it in place with fs.s3a (that is, through s3a:// paths) instead of restoring it to HDFS.
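To check that the data is reachable in place, you could list it through an s3a:// path. A minimal sketch, again only building the command; the bucket and prefix are hypothetical, and it assumes the s3a connector and credentials are already configured on the cluster:

```python
def s3a_ls_command(bucket, key_prefix):
    """Build a `hadoop fs -ls` argument list for an s3a:// location."""
    # Strip any leading slash so the URI does not contain a double slash.
    uri = f"s3a://{bucket}/{key_prefix.lstrip('/')}"
    return ["hadoop", "fs", "-ls", uri]


# Hypothetical bucket and prefix.
print(" ".join(s3a_ls_command("my-backup-bucket", "/data/warehouse")))
```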
I hope that helps, but I am sorry I don't know how to back up HBase.