We intend to build a data lake in the cloud. Can someone help me understand how data is backed up in the cloud for Hive, HBase, and Hadoop (the entire stack)?
In other words, how do we make sure that node failures or corruption never lead to data loss? I'm looking for best practices for building a data lake in the cloud. For example, for feeds or any file drops we can create an S3 bucket in AWS (see the sketch below). Do similar principles also apply to Hadoop, HBase, and Hive? I'm guessing not, since they are distributed and the data is already replicated.
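For context, this is roughly what I have in mind for the feed-drop bucket; a minimal sketch, where the bucket name and region are placeholders and versioning is my assumption for protecting drops against accidental overwrite or deletion:

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Landing bucket for raw feed/file drops ("my-datalake-feed-drop" is a
# hypothetical name).
s3.create_bucket(Bucket="my-datalake-feed-drop")

# Enable versioning so overwritten or deleted objects can be recovered.
s3.put_bucket_versioning(
    Bucket="my-datalake-feed-drop",
    VersioningConfiguration={"Status": "Enabled"},
)
```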
(+) How do we deploy Hadoop instances across multiple regions (i.e., so we can survive an entire data center going down)? A rough sketch of one piece I'm considering follows.
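For the multi-region part, one building block I'm considering is S3 cross-region replication on the backup bucket, so at least the data copied out of HDFS survives a full regional outage. This is a sketch under my own assumptions: the bucket names, the IAM role ARN, and the replica bucket are all placeholders, and both buckets would need versioning enabled:

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Hypothetical: replicate the backup bucket to a second region.
# "my-datalake-backup", the role ARN, and the replica bucket ARN are
# placeholders; both source and destination buckets must be versioned.
s3.put_bucket_replication(
    Bucket="my-datalake-backup",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "dr-copy",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # replicate everything
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::my-datalake-backup-replica"
                },
            }
        ],
    },
)
```

This only covers data copied to S3, though; I'd still like to understand how people handle the HDFS/HBase/Hive layer itself across regions.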