How to recover HBase using the HDFS data directory

Rising Star

My old HDFS data directory location: /apps/hbase/data

My new HDFS data directory location: /apps/hbase/data2

HBase table name: CUTOFF2

create 'CUTOFF2', {NAME => '1'}

I am following these steps to recover the data, but it is not working. Please tell me where I am going wrong:

hadoop fs -ls /apps/hbase/data/data/default/CUTOFF2/4c8d68c329cdb6d73d4094fd64e5e37d/1/d321dfcd3b1245d2b5cc2ec1aab3a9f2

hadoop fs -ls /apps/hbase/data2/data/default/CUTOFF2/8f1aff44991e1a08c6a6bbf9c2546cf6/1

put 'CUTOFF2', 'samplerow', '1:1', 'sampledata'

count 'CUTOFF2'

su - hbase

hadoop fs -cp /apps/hbase/data/data/default/CUTOFF2/4c8d68c329cdb6d73d4094fd64e5e37d/1/d321dfcd3b1245d2b5cc2ec1aab3a9f2 /apps/hbase/data2/data/default/CUTOFF2/8f1aff44991e1a08c6a6bbf9c2546cf6/1

major_compact 'CUTOFF2'

Please correct my steps so recovery works.



Explorer

In hbase-site.xml, you need to change the "hbase.rootdir" property to your new location.
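For reference, the property in hbase-site.xml would look something like this. This is a sketch using the new path from this thread; 'namenode-host:8020' is a hypothetical placeholder for your cluster's actual NameNode address:

<property>
  <name>hbase.rootdir</name>
  <!-- 'namenode-host:8020' is a placeholder; match your fs.defaultFS -->
  <value>hdfs://namenode-host:8020/apps/hbase/data2</value>
</property>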

Contributor

Hi @Raja Ray, here are the steps to recover HFiles into another HDFS directory:

1. Shut down HBase while it is still pointing at the old HDFS path.

2. Change 'hbase.rootdir' to the new path and restart HBase.

3. Create table 'CUTOFF2', so that the new table structure is created under the new HDFS path; at this point it is, of course, empty.

4. Use distcp to copy the HFile(s) from the old path to the new path, since the HFiles can be very large.

5. Run 'hbase hbck' on the new cluster; it should report a problem with 'CUTOFF2'.

6. Run 'hbase hbck -repair' on the problematic table to finalize the recovery.

7. Done. (A command-level sketch of these steps follows below.)
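Put together as commands, the sequence would look roughly like this. This is a sketch, not verbatim from the thread: it reuses the region and HFile names from the question above and assumes you run it as the hbase user after HBase has been restarted on the new root directory:

# 1-2. stop HBase, point hbase.rootdir at /apps/hbase/data2 in hbase-site.xml, start HBase

# 3. recreate the empty table structure under the new root (in the hbase shell):
#    create 'CUTOFF2', {NAME => '1'}

# 4. copy the HFile from the old root into the new table's column-family directory
hadoop distcp /apps/hbase/data/data/default/CUTOFF2/4c8d68c329cdb6d73d4094fd64e5e37d/1/d321dfcd3b1245d2b5cc2ec1aab3a9f2 /apps/hbase/data2/data/default/CUTOFF2/8f1aff44991e1a08c6a6bbf9c2546cf6/1

# 5. check consistency -- 'CUTOFF2' should be flagged
hbase hbck

# 6. repair the table to finalize the recovery
hbase hbck -repair CUTOFF2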

Rising Star

Thanks Victor. I will follow your steps and will let you know.

Rising Star

Hi @Victor Xu,

I followed your steps and it is working fine.

But I needed to restart HBase.

Can you please suggest any other way where I don't need to restart the HBase service?

Thanks,

Raja

Contributor

Hi @Raja Ray,

1. Which version of HBase are you using?

2. When performing my steps, is there any specific error log that you can share with me?

3. Could you elaborate on your use case?

Thanks,

Victor


Contributor

Ok, I understand. But even if you just want to change the HDFS root directory of a running HBase cluster, you'll need a restart to make it work.

Do you mean you had already changed the root path to '/apps/hbase/data2' before starting your current HBase cluster?

Contributor

In other words, there is no 'hot switch' for the 'hbase.rootdir' parameter. If you want to change it, you have to restart HBase for the change to take effect.
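On a plain (non-managed) installation, that restart is just the standard scripts; a minimal sketch assuming $HBASE_HOME points at your HBase install (on an Ambari-managed cluster you would restart the HBase service from the Ambari UI instead):

# stop the cluster, then start it again so the new hbase.rootdir is picked up
$HBASE_HOME/bin/stop-hbase.sh
$HBASE_HOME/bin/start-hbase.sh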

Contributor

Hi @Raja Ray,

I checked, but HBase rolling upgrade won't help here either, because the HMaster and the RegionServers both use 'hbase.rootdir' at runtime, and changing it on only some of them would cause data inconsistencies. So my suggestion would be to create a smaller temporary HBase cluster to handle all the production requests while you do a quick restart on the main HBase cluster. Modifying 'hbase.rootdir' really needs downtime.

Hope that will help.

Thanks,

Victor

Rising Star

Hi @Victor Xu,

Thanks. I understand your point.

I have a couple of questions to understand the scenario more clearly:

1. If I put data in the temporary HBase cluster during the main cluster's downtime, how will I merge the data from the temporary cluster back into the main cluster once it is up and running?

2. When I am restoring data from the old HDFS HFile location to the new location, how will I recover the memstore data?

3. If I shut down and restart the HBase service, is memstore data flushed to HDFS HFiles at that time?

Thanks,

Raja

Contributor

Hi @Raja Ray,

To answer your questions:

1. If I put data in the temporary HBase cluster during the main cluster's downtime, how will I merge the data back into the main cluster once it is up and running?

  • If there are only Put operations during the main cluster's downtime, you can use the CopyTable tool, or the Export & Bulkload tools, to migrate the data from the temporary cluster back to the main cluster after it's up (see the CopyTable sketch below).
  • But if there are both Put and Delete operations during the downtime, the best way to migrate the data is to set up HBase replication from the temporary cluster to the main cluster. Replication reads all the WALs (write-ahead logs) and replays both the Puts and the Deletes on the main cluster after it's up.
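For the Put-only case, a CopyTable run would look something like the following. This is a sketch: the ZooKeeper quorum 'zk1,zk2,zk3' and the '/hbase' znode of the main cluster are hypothetical placeholders for your actual cluster address:

# run on the temporary cluster; copies the CUTOFF2 rows to the peer (main) cluster
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=zk1,zk2,zk3:2181:/hbase CUTOFF2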

2. When I am restoring data from the old HDFS HFile location to the new location, how will I recover the memstore data?

  • The memstore is the place in a RegionServer where incoming data is kept in memory. It starts growing again as new write operations come in.
  • If you mean the block cache of the HFile, that is reloaded into memory as new read operations come in.

3. If I shut down and restart the HBase service, is memstore data flushed to HDFS HFiles at that time?

  • Yes, the memstore is forcibly flushed to HFiles before a RegionServer shuts down.
  • Make sure the HDFS path '/apps/hbase/data/WALs/' is empty after HBase has been shut down; that confirms all memstore data has been flushed into HFiles (see the check below).
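That check is a single listing; a sketch using the old root directory from this thread:

# after shutdown: if this directory is empty, all memstore data reached the HFiles
hadoop fs -ls /apps/hbase/data/WALs/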

Thanks,

Victor

Rising Star

Thanks a lot @Victor Xu.

All points are clear.