Created on 03-24-2021 11:21 AM - edited on 03-24-2021 09:44 PM by subratadas
This article describes the steps to access HBase data in a remote CDP cluster from another CDP cluster using Spark in a Kerberized environment.
Assume we have two clusters, 'Cluster_A' and 'Cluster_B'. 'Cluster_A' holds the HBase data and 'Cluster_B' runs Spark. The goal is to read the HBase data stored in 'Cluster_A' using Spark on 'Cluster_B'.
Prerequisites:
Both clusters, 'Cluster_A' and 'Cluster_B', must be Kerberized with keytabs in the same REALM.
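A Kerberos principal carries its realm after the `@` (for example `primary/instance@REALM`), so one quick sanity check for this prerequisite is to compare the realm part of a principal from each cluster. The sketch below assumes principals that spell out their realm explicitly; the helper names and the example principals are hypothetical.

```python
def realm_of(principal: str) -> str:
    """Return the realm component of a Kerberos principal (text after the last '@')."""
    # Principals look like primary[/instance]@REALM. Without an explicit '@REALM',
    # the default realm from krb5.conf would apply, which cannot be inferred here.
    name, sep, realm = principal.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"principal has no explicit realm: {principal!r}")
    return realm


def same_realm(principal_a: str, principal_b: str) -> bool:
    """True when both principals belong to the same realm (realm names are case-sensitive)."""
    return realm_of(principal_a) == realm_of(principal_b)


if __name__ == "__main__":
    # Hypothetical principals: an HBase service on Cluster_A and a Spark user on Cluster_B.
    a = "hbase/node1.cluster-a.example.com@EXAMPLE.COM"
    b = "spark_user@EXAMPLE.COM"
    print(same_realm(a, b))
```

Comparing the realms exactly (not case-insensitively) mirrors Kerberos itself, where `EXAMPLE.COM` and `example.com` are different realms.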
Follow these steps:
Log in to the 'Cluster_A' edge node.
Obtain the Kerberos ticket using kinit:
kinit -kt <key_tab_file> <principal_name>
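If this step is scripted, a small wrapper can run the same `kinit -kt` command and then confirm that the credential cache holds a valid ticket with `klist -s`, which in MIT Kerberos prints nothing and exits with a non-zero status when no valid ticket is found. This is a sketch assuming the MIT Kerberos client tools are on the `PATH`; the function names, keytab path, and principal are placeholders.

```python
import subprocess
from typing import List


def kinit_command(keytab: str, principal: str) -> List[str]:
    """Build the kinit invocation used above: kinit -kt <key_tab_file> <principal_name>."""
    return ["kinit", "-kt", keytab, principal]


def obtain_ticket(keytab: str, principal: str) -> bool:
    """Run kinit with the keytab, then return True if klist -s reports a valid ticket."""
    # check=True raises CalledProcessError if kinit fails (unreadable keytab, wrong principal, ...).
    subprocess.run(kinit_command(keytab, principal), check=True)
    # klist -s is silent; exit status 0 means the credential cache holds a valid ticket.
    return subprocess.run(["klist", "-s"]).returncode == 0


if __name__ == "__main__":
    # Placeholder values: substitute the real keytab file and principal for Cluster_A.
    print(kinit_command("/path/to/user.keytab", "user@EXAMPLE.COM"))
```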
Log in to the HBase shell, create the 'person' table with column family 'p', insert a sample row, and verify it with scan:
ranga]# hbase shell
hbase(main):001:0> list
TABLE
0 row(s)
hbase(main):013:0> create 'person', 'p'
Created table person
Took 8.2307 seconds
=> Hbase::Table - person
hbase(main):014:0> put 'person',1,'p:id','1'
Took 0.0173 seconds
hbase(main):015:0> put 'person',1,'p:name','Ranga Reddy'
Took 0.0045 seconds
hbase(main):016:0> put 'person',1,'p:email','ranga@gmail.com'
Took 0.0043 seconds
hbase(main):017:0> put 'person',1,'p:age','25'
Took 0.0049 seconds
hbase(main):018:0> scan 'person'
ROW COLUMN+CELL
1 column=p:age, timestamp=1616425683759, value=25
1 column=p:email, timestamp=1616425681754, value=ranga@gmail.com
1 column=p:id, timestamp=1616425681717, value=1
1 column=p:name, timestamp=1616425681736, value=Ranga Reddy
hbase(main):018:0> exit
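The four `put` commands above write four cells under the same row key `1`, and `scan` returns them ordered by column (`p:age`, `p:email`, `p:id`, `p:name`) rather than in the order they were written, because HBase keeps cells sorted by row key, then column family, then qualifier. The sketch below models that ordering in plain Python and groups the cells of each row into one record, which can be a convenient way to picture the data before reading it from Spark. It illustrates the data model only and is not an HBase client; the `Cell` type and helper names are hypothetical, and timestamps are left out.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Cell(NamedTuple):
    """One HBase cell: row key, column family, qualifier and value (timestamp omitted)."""
    row: str
    family: str
    qualifier: str
    value: str


def scan_order(cells: List[Cell]) -> List[Cell]:
    """Sort cells the way a scan returns them: by row key, then family, then qualifier."""
    # HBase compares these components as raw bytes; for ASCII strings,
    # Python's string ordering gives the same result.
    return sorted(cells, key=lambda c: (c.row, c.family, c.qualifier))


def rows_as_records(cells: List[Cell]) -> Dict[str, Dict[str, str]]:
    """Group cells into one dict per row key, keyed by 'family:qualifier'."""
    records: Dict[str, Dict[str, str]] = defaultdict(dict)
    for c in scan_order(cells):
        records[c.row][f"{c.family}:{c.qualifier}"] = c.value
    return dict(records)


if __name__ == "__main__":
    # The same four puts as in the shell session above, in the order they were issued.
    puts = [
        Cell("1", "p", "id", "1"),
        Cell("1", "p", "name", "Ranga Reddy"),
        Cell("1", "p", "email", "ranga@gmail.com"),
        Cell("1", "p", "age", "25"),
    ]
    for c in scan_order(puts):
        print(c.row, f"column={c.family}:{c.qualifier}, value={c.value}")
    print(rows_as_records(puts))
```

Running it prints the four cells in the same column order as the `scan` output, followed by a single record for row key `1`.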
Copy the hbase-site.xml from 'Cluster_A' to 'Cluster_B':