
How to recover HBase using the HDFS data directory

Expert Contributor

My old HDFS data directory location: /apps/hbase/data

My new HDFS data directory location: /apps/hbase/data2

HBase table name: CUTOFF2

create 'CUTOFF2', {NAME => '1'}

I am doing the following steps to recover the data, but it is not working. Please tell me where I am wrong:

hadoop fs -ls /apps/hbase/data/data/default/CUTOFF2/4c8d68c329cdb6d73d4094fd64e5e37d/1/d321dfcd3b1245d2b5cc2ec1aab3a9f2
hadoop fs -ls /apps/hbase/data2/data/default/CUTOFF2/8f1aff44991e1a08c6a6bbf9c2546cf6/1

put 'CUTOFF2', 'samplerow', '1:1', 'sampledata'
count 'CUTOFF2'

su - hbase

hadoop fs -cp /apps/hbase/data/data/default/CUTOFF2/4c8d68c329cdb6d73d4094fd64e5e37d/1/d321dfcd3b1245d2b5cc2ec1aab3a9f2 /apps/hbase/data2/data/default/CUTOFF2/8f1aff44991e1a08c6a6bbf9c2546cf6/1

major_compact 'CUTOFF2'

Please correct my steps so recovery works.

ACCEPTED SOLUTION

Rising Star

Hi @Raja Ray, here are the steps to recover HFiles into another HDFS directory:

1. Shut down the HBase that points at the old HDFS path.

2. Change 'hbase.rootdir' to the new path and restart HBase.

3. Create table 'CUTOFF2', so that the new table structure is created in the new HDFS path; of course, it is empty at this point.

4. Use distcp to copy the HFile(s) from the old path to the new path, since the HFile(s) can be very large.

5. Run 'hbase hbck' on the new HBase; it should report something wrong with 'CUTOFF2'.

6. Run 'hbase hbck -repair' on the problematic table, and it will finalize the recovery.

7. Done. A command-level sketch of these steps follows.
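For reference, a minimal sketch of the commands behind these steps, assuming the paths and region/file hashes from the question (yours will differ) and that you stop and restart HBase through your cluster manager:

# Steps 1-2: stop HBase, then in hbase-site.xml point hbase.rootdir at the
# new location (e.g. hdfs://<namenode>:8020/apps/hbase/data2) and restart.

# Step 3: recreate the empty table from the hbase shell
create 'CUTOFF2', {NAME => '1'}

# Step 4: copy the old HFile into the new region's column family directory
su - hbase
hadoop distcp /apps/hbase/data/data/default/CUTOFF2/4c8d68c329cdb6d73d4094fd64e5e37d/1/d321dfcd3b1245d2b5cc2ec1aab3a9f2 /apps/hbase/data2/data/default/CUTOFF2/8f1aff44991e1a08c6a6bbf9c2546cf6/1

# Steps 5-6: check for inconsistencies, then repair the table metadata
hbase hbck
hbase hbck -repair CUTOFF2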



Expert Contributor

Hi @Victor Xu,

Thanks. I understand your point.

I have a couple of questions to understand the scenario more clearly:

1. If I put data into a temporary HBase cluster during the main HBase cluster's downtime, how will I merge the data from the temporary cluster into the main cluster once the main cluster is up and running?

2. When I am restoring data from the HDFS HFile location to a new location, how will I recover the memstore data?

3. If I shut down and restart the HBase service, is the memstore data flushed to HDFS HFiles at that time?

Thanks,

Raja

Rising Star

Hi @Raja Ray,

To answer your questions:

1. If I put data into a temporary HBase cluster during the main HBase cluster's downtime, how will I merge the data from the temporary cluster into the main cluster once the main cluster is up and running?

  • If there are only Put operations during the main cluster's downtime, you can use the CopyTable tool, or the Export and BulkLoad tools, to migrate the data from the temporary cluster back to the main cluster after it is up (example commands below).
  • But if there are both Put and Delete operations during the downtime, the best way to migrate the data is to set up HBase replication from the temporary cluster to the main cluster. This will read all the WALs (write-ahead logs) and replay both the Puts and the Deletes on the main cluster after it is up.
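As a rough illustration of these two options, the commands might look like the following, where the ZooKeeper quorum 'zk1,zk2,zk3' and the '/hbase-unsecure' znode are placeholders for your main cluster's actual values:

# Put-only case: run CopyTable on the temporary cluster once the main cluster is up
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=zk1,zk2,zk3:2181:/hbase-unsecure CUTOFF2

# Put+Delete case: set up replication from the temporary cluster (hbase shell);
# replication must also be enabled in hbase-site.xml on both clusters
add_peer '1', 'zk1,zk2,zk3:2181:/hbase-unsecure'
alter 'CUTOFF2', {NAME => '1', REPLICATION_SCOPE => '1'}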

2. When I am restoring data from the HDFS HFile location to a new location, how will I recover the memstore data?

  • The memstore is the in-memory area of a RegionServer that holds incoming data. It starts growing again as new write operations arrive.
  • If you mean the block cache for the HFiles, that is reloaded into memory as new read operations arrive.

3. If I shut down and restart the HBase service, is the memstore data flushed to HDFS HFiles at that time?

  • Yes, the memstore is forced to flush to HFiles before a RegionServer shuts down.
  • Make sure the HDFS path '/apps/hbase/data/WALs/' is empty after HBase has been shut down; that confirms all memstore data has been flushed into HFiles (see the check below).
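A quick way to verify that flush, using the WAL path from this thread (it sits under your hbase.rootdir, so adjust if yours differs):

# After HBase has fully stopped, this should show no remaining WAL files
hadoop fs -ls -R /apps/hbase/data/WALs/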

Thanks,

Victor

Expert Contributor

Thanks a lot @Victor Xu.

All points are clear.