HBase Data Migration from Source (RHEL 7 Cloudera Managed) to Open-Source Self-Managed 5-Node Cluster

Hello team,
My main work depends on HBase data.

I have a source (RHEL 7 CDH cluster) and a destination (RHEL 9 self-managed open-source cluster).

RHEL 9 Open-Source Cluster Configuration:

  • Apache services configured and running:
    • ZooKeeper
    • Hadoop
    • YARN (default configuration)
    • HBase
  • Migration Approach:
    • Export/Import approach:
      • Did the export on the source side (roughly the commands shown after this list).
      • Tried running distcp from the source, which didn't work because of the authentication mismatch and the absence of MapReduce, which was not configured on the source side.
      • So I copied the export with WinSCP to the destination and tried the import, which didn't work because MapReduce was not configured there (it was not in the requirements).
      • So I skipped this approach.
    • Snapshot approach:
      • I took a snapshot on the source side and tried a snapshot export (roughly the command shown after this list). It gave an error saying that the destination uses SIMPLE authentication while the source is Kerberos-authenticated, so it didn't work.
      • Manual approach:
        • Took snapshots on the source.
        • Hadoop get: Used the get command to copy the snapshot files, /hbase/data/DataCapture (actual data dir) and /hbase/data/hbase (metadata dir).
        • WinSCP: Then used WinSCP to transfer these directories from the source to the destination.
        • Meta conf: But before that, I replaced the RHEL 7 metadata info with the RHEL 9 HBase info:
          • Region servers
          • Server start codes
        • Stopped and restarted HBase, but it didn't work out; it throws the following error when I run 'list':
          • 2024-09-17 08:32:32,622 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x102e6e64b450116
            2024-09-17 08:32:41,113 INFO client.RpcRetryingCallerImpl: Call exception, tries=12, retries=16, started=68591 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
                    at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3188)
                    at org.apache.hadoop.hbase.master.MasterRpcServices.getTableDescriptors(MasterRpcServices.java:1073)
                    at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
                    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
                    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
                    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
                    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
            , details=, see https://s.apache.org/timeout
  • What is the best way to proceed?
    • Should I format HBase, or change the root dir to a new one to start fresh? Which approach should I proceed with?
    • Should I configure MapReduce to do the import on the destination side?
    • As the destination side uses SIMPLE authentication, do I need to Kerberize it, or can I change the configuration of the source RHEL 7 Cloudera-managed cluster to SIMPLE authentication?
      • Note: My client uses LDAP, AD, and Kerberos for application purposes.
      • The KDC is Active Directory itself (AD Kerberos).
  • Or is there any other way?
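
For reference, the export and the snapshot export I attempted were roughly along the lines below (table names, snapshot names, paths, and hosts are placeholders here, not my exact values):

  # Export/Import attempt -- both commands launch a MapReduce job on the cluster where they run
  $ hbase org.apache.hadoop.hbase.mapreduce.Export '<TABLE_NAME>' /tmp/hbase-export/<TABLE_NAME>
  $ hbase org.apache.hadoop.hbase.mapreduce.Import '<TABLE_NAME>' /tmp/hbase-export/<TABLE_NAME>

  # Snapshot export attempt from the Kerberized source towards the SIMPLE-auth destination;
  # this is where the authentication mismatch error appeared
  $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot '<SNAPSHOT_NAME>' -copy-to hdfs://<DESTINATION_NAMENODE>:8020/hbase -mappers 4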

Please give me detailed information. I really appreciate it.

Thanks,
Srikanth

2 REPLIES

Looking forward to the detailed information. Thanks.

Expert Contributor

Hi @therealsrikanth 

Manually copying the snapshots is the only way forward, as you have security and compatibility issues between the clusters.

Kindly try to follow the steps below:

Create Snapshot Operation

  1. Take the snapshot in CDH. For example, after logging in to the HBase shell:
    $ hbase shell
    hbase> snapshot '<TABLE_NAME>', '<SNAPSHOT_NAME>'
  2. Major compact the table:
    hbase> major_compact '<TABLE_NAME>'
  3. Copy the files from the locations below to the local filesystem (a quick verification sketch follows the commands):
    $ hdfs dfs -get /hbase/data/.hbase-snapshot/ /tmp/dir
    $ hdfs dfs -get /hbase/data/archive/ /tmp/dir2
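
    As a quick sanity check before pulling the files, it may help to confirm the snapshot metadata and the archived HFiles are actually present on the source (the paths follow the ones above; adjust them if your hbase.rootdir layout differs):
    $ hdfs dfs -ls /hbase/data/.hbase-snapshot
    $ hdfs dfs -ls -R /hbase/data/archive | head -20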

Restore operation

  1. Transfer the files to the destination environment (see the transfer sketch after these steps).
  2. Use the -copyFromLocal option to copy the contents to HDFS:

    $ cd /tmp

    $ hdfs dfs -copyFromLocal dir /hbase/data/.hbase-snapshot
    $ hdfs dfs -copyFromLocal dir2 /hbase/archive/data/default

    Note: "default" is a namespace name on which newly created tables are placed if you don't specify a custom namespace. 

    Make sure the directories are created in HDFS. The paths should look like this after copying:

     

    /hbase/archive/data/<Namespace>/<TABLE_NAME>/<hfile1>

    /hbase/archive/data/<Namespace>/<TABLE_NAME>/<hfile2>

    ...
  3. Check the permissions on the /hbase/archive directory; it should be owned by the hbase user (see the sketch after these steps).
  4. Log in to the HBase shell and check the snapshots:
    $ hbase shell
    hbase:001:0> list_snapshots
  5. When the snapshot is visible, you can use the clone_snapshot command to create a new table from the snapshot:
    hbase> clone_snapshot '<SNAPSHOT_NAME>', '<TABLE_NAME_NEW>'
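
    The transfer and permission steps above could look roughly like this end to end; the hostnames, remote user, and /tmp staging paths are placeholders, and the chown assumes the HBase service runs as the 'hbase' user:

    # Step 1: copy the staged directories from the source edge node to the destination (SCP is one option; WinSCP or rsync also work)
    $ scp -r /tmp/dir /tmp/dir2 <user>@<destination-host>:/tmp/

    # Step 3: fix HDFS ownership so the HBase service user can read the copied files
    # (run as the HDFS superuser; /hbase/archive as noted above, and the copied snapshot directory usually needs the same)
    $ sudo -u hdfs hdfs dfs -chown -R hbase:hbase /hbase/archive /hbase/data/.hbase-snapshot

    # Optional sanity check after clone_snapshot
    $ hbase shell
    hbase> count '<TABLE_NAME_NEW>'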

    Was your question answered? Please take some time to click on "Accept as Solution" -- If you find a reply useful, say thanks by clicking on the thumbs up button below this post.