Support Questions

Find answers, ask questions, and share your expertise

How to back up the data for Hortonworks instances (Hadoop, Hive, ZooKeeper, HBase, the entire stack)

We intend to build a data lake in the cloud. Can someone help me understand how the data is backed up in the cloud for Hive, HBase, and Hadoop (the entire stack)?

In other words, how do we make sure nodes never end up corrupted? I'm looking for best practices for building a data lake in the cloud. For example, for a feed or any file drop we can create an S3 bucket in AWS. Do similar principles also apply to Hadoop, HBase, and Hive (I guess not), since they are distributed and the data is replicated?

(+) How do we deploy Hadoop instances across multiple regions (i.e., so we survive an entire data center going down)?


Super Collaborator

This is a very broad topic, and it might make sense to use a vendor-supported tool like EMR or Qubole. Cloudbreak and Hortonworks itself don't offer well-defined backup tools.

For example, Hadoop DistCp, mysqldump/pg_dump (for the Hive metastore and other service databases), and Hive/HBase export only get you so far.
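To make that concrete, here is a rough sketch of what a manual backup run with those tools might look like. This is illustrative only: the bucket name `s3a://my-backup-bucket`, the HDFS paths, the table names, and the MySQL credentials are all placeholders you would replace with your own, and each command assumes the relevant service (HDFS, HBase, MySQL metastore) is reachable from where you run it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# 1) Copy HDFS data to S3 with DistCp.
#    -update only copies files that changed; s3a:// requires the
#    AWS credentials to be configured in core-site.xml or the environment.
hadoop distcp -update \
  hdfs:///data/warehouse \
  s3a://my-backup-bucket/hdfs/warehouse

# 2) Back up an HBase table via a snapshot, then export the snapshot to S3.
#    (Snapshots are cheap; ExportSnapshot does the actual data copy.)
echo "snapshot 'my_table', 'my_table-$(date +%Y%m%d)'" | hbase shell -n
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot "my_table-$(date +%Y%m%d)" \
  -copy-to s3a://my-backup-bucket/hbase-snapshots

# 3) Dump the Hive metastore database (MySQL-backed in this example).
#    Without the metastore dump, the warehouse files in step 1 are
#    just anonymous directories.
mysqldump --single-transaction -u hive -p hive_metastore \
  > hive_metastore_$(date +%Y%m%d).sql
```

Even then, this only gives you point-in-time copies; it doesn't address consistency across services (e.g., HBase WAL entries written mid-copy) or the multi-region failover part of the question, which is why a managed offering is often the pragmatic answer.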