HBase Data Migration from Source (RHEL 7 Cloudera Managed) to Open-Source Self-Managed 5-Node Cluster

Hello team,
My main work depends on HBase data.

I have a source (RHEL 7 CDH cluster) and a destination (RHEL 9 self-managed open-source cluster).

RHEL 9 Open-Source Cluster Configuration:

  • Apache services configured and running:
    • ZooKeeper
    • Hadoop
    • YARN (default configuration)
    • HBase
  • Migration Approach:
    • Export/Import approach:
      • Did the export on the source side (roughly the commands shown after this list).
      • Tried running distcp from the source, which didn't work because of the authentication mismatch and the absence of MapReduce, which was not configured on the source side.
      • So I copied the export with WinSCP to the destination and tried the import, which didn't work because MapReduce was not configured there (it was not in the requirements).
      • So I skipped this approach.
    • Snapshot approach:
      • I took a snapshot on the source side and tried a snapshot export (roughly the command shown after this list). It gave an error saying that the destination uses SIMPLE authentication while the source is Kerberos-authenticated, so it didn't work.
      • Manual approach:
        • Took snapshots on the source.
        • Hadoop get: Used the get command to copy the snapshot files, /hbase/data/DataCapture (actual data dir) and /hbase/data/hbase (metadata dir).
        • WinSCP: Then used WinSCP to transfer these directories from the source to the destination.
        • Meta conf: But before that, I replaced the RHEL 7 metadata info with the RHEL 9 HBase info:
          • Region servers
          • Server start codes
        • Stopped and restarted HBase, but it didn't work out; it throws the following error when I run 'list':
          • 2024-09-17 08:32:32,622 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x102e6e64b450116
            2024-09-17 08:32:41,113 INFO client.RpcRetryingCallerImpl: Call exception, tries=12, retries=16, started=68591 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
                    at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3188)
                    at org.apache.hadoop.hbase.master.MasterRpcServices.getTableDescriptors(MasterRpcServices.java:1073)
                    at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
                    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
                    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
                    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
                    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
            , details=, see https://s.apache.org/timeout
  • What is the best way to proceed?
    • Should I format HBase, or change the root dir to a new one to start fresh? Which approach should I proceed with?
    • Should I configure MapReduce to do the import on the destination side?
    • As the destination side uses SIMPLE authentication, do I need to Kerberize it, or can I change the configuration of the source RHEL 7 Cloudera-managed cluster to SIMPLE authentication?
      • Note: My client uses LDAP, AD, and Kerberos for application purposes.
      • The KDC is Active Directory itself (AD Kerberos).
  • Or is there any other way?
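
For reference, the export and the snapshot export I attempted were roughly along the lines below (table names, snapshot names, paths, and hosts are placeholders here, not my exact values):

  # Export/Import attempt -- both commands launch a MapReduce job on the cluster where they run
  $ hbase org.apache.hadoop.hbase.mapreduce.Export '<TABLE_NAME>' /tmp/hbase-export/<TABLE_NAME>
  $ hbase org.apache.hadoop.hbase.mapreduce.Import '<TABLE_NAME>' /tmp/hbase-export/<TABLE_NAME>

  # Snapshot export attempt from the Kerberized source towards the SIMPLE-auth destination;
  # this is where the authentication mismatch error appeared
  $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot '<SNAPSHOT_NAME>' -copy-to hdfs://<DESTINATION_NAMENODE>:8020/hbase -mappers 4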

Please give me detailed information. I really appreciate it.

Thanks,
Srikanth

2 REPLIES

Looking forward to the detailed information. Thanks.

Expert Contributor

Hi @therealsrikanth 

Manually copying the snapshots is the only way forward, as you have security and compatibility issues between the clusters.

Kindly try to follow the steps below:

Create Snapshot Operation

  1. Take the snapshot in CDH. For example, after logging in to the HBase shell:
    $ hbase shell
    hbase> snapshot '<TABLE_NAME>', '<SNAPSHOT_NAME>'
  2. Major compact the table:
    hbase> major_compact '<TABLE_NAME>'
  3. Copy the files from the locations below to the local filesystem (a quick verification sketch follows the commands):
    $ hdfs dfs -get /hbase/data/.hbase-snapshot/ /tmp/dir
    $ hdfs dfs -get /hbase/data/archive/ /tmp/dir2
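
    As a quick sanity check before pulling the files, it may help to confirm the snapshot metadata and the archived HFiles are actually present on the source (the paths follow the ones above; adjust them if your hbase.rootdir layout differs):
    $ hdfs dfs -ls /hbase/data/.hbase-snapshot
    $ hdfs dfs -ls -R /hbase/data/archive | head -20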

Restore operation

  1. Transfer the files to the destination environment (see the transfer sketch after these steps).
  2. Use the -copyFromLocal option to copy the contents to HDFS:

    $ cd /tmp

    $ hdfs dfs -copyFromLocal dir /hbase/data/.hbase-snapshot
    $ hdfs dfs -copyFromLocal dir2 /hbase/archive/data/default

    Note: "default" is a namespace name on which newly created tables are placed if you don't specify a custom namespace. 

    Make sure the directories are created in HDFS. The paths should look like this after copying:

     

    /hbase/archive/data/<Namespace>/<TABLE_NAME>/<hfile1>

    /hbase/archive/data/<Namespace>/<TABLE_NAME>/<hfile2>

    ...
  3. Check the permissions on the /hbase/archive directory; it should be owned by the hbase user (see the sketch after these steps).
  4. Log in to the HBase shell and check the snapshots:
    $ hbase shell
    hbase:001:0> list_snapshots
  5. When the snapshot is visible, you can use the clone_snapshot command to create a new table from the snapshot:
    hbase> clone_snapshot '<SNAPSHOT_NAME>', '<TABLE_NAME_NEW>'
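
    The transfer and permission steps above could look roughly like this end to end; the hostnames, remote user, and /tmp staging paths are placeholders, and the chown assumes the HBase service runs as the 'hbase' user:

    # Step 1: copy the staged directories from the source edge node to the destination (SCP is one option; WinSCP or rsync also work)
    $ scp -r /tmp/dir /tmp/dir2 <user>@<destination-host>:/tmp/

    # Step 3: fix HDFS ownership so the HBase service user can read the copied files
    # (run as the HDFS superuser; /hbase/archive as noted above, and the copied snapshot directory usually needs the same)
    $ sudo -u hdfs hdfs dfs -chown -R hbase:hbase /hbase/archive /hbase/data/.hbase-snapshot

    # Optional sanity check after clone_snapshot
    $ hbase shell
    hbase> count '<TABLE_NAME_NEW>'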

    Was your question answered? Please take some time to click on "Accept as Solution" -- If you find a reply useful, say thanks by clicking on the thumbs up button below this post.