This article describes how to access HBase data on a remote CDP cluster from another CDP cluster using Spark in a Kerberized environment.

Assume we have two clusters, 'Cluster_A' and 'Cluster_B'. 'Cluster_A' holds the HBase data and 'Cluster_B' runs Spark. We want to access the HBase data in 'Cluster_A' using Spark from 'Cluster_B'.


Prerequisite: the keytabs used on both clusters, 'Cluster_A' and 'Cluster_B', must belong to the same Kerberos REALM.
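As a quick sanity check of the realm requirement, the sketch below compares the realm part of two principal names. The principals shown are hypothetical placeholders, not values from this setup; substitute the principals stored in your own keytabs. It only compares strings and does not contact the KDC.

```scala
object RealmCheck {
  // The realm is the part of a Kerberos principal after the last '@'.
  def realmOf(principal: String): Option[String] = {
    val idx = principal.lastIndexOf('@')
    if (idx >= 0 && idx < principal.length - 1) Some(principal.substring(idx + 1))
    else None
  }

  // True only when both principals carry a realm and the realms are identical.
  def sameRealm(a: String, b: String): Boolean =
    (realmOf(a), realmOf(b)) match {
      case (Some(ra), Some(rb)) => ra == rb
      case _                    => false
    }

  // Hypothetical principals, used purely for illustration.
  val principalA = "hbase_user@EXAMPLE.COM"
  val principalB = "spark_user@EXAMPLE.COM"

  def main(args: Array[String]): Unit =
    println(s"Same realm: ${sameRealm(principalA, principalB)}")
}
```

The object can be pasted into spark-shell or a Scala REPL and called as RealmCheck.sameRealm("<principal_a>", "<principal_b>").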

Follow these steps: 

  1. Log in to the 'Cluster_A' edge node.
  2. Obtain the Kerberos ticket using kinit:
    kinit -kt <key_tab_file> <principal_name>
  3. Log in to the HBase shell and create the 'person' table:
    ranga]# hbase shell
    hbase(main):001:0> list
    0 row(s)
    hbase(main):013:0> create 'person', 'p'
    Created table person
    Took 8.2307 seconds
    => Hbase::Table - person
    hbase(main):014:0> put 'person',1,'p:id','1'
    Took 0.0173 seconds
    hbase(main):015:0> put 'person',1,'p:name','Ranga Reddy'
    Took 0.0045 seconds
    hbase(main):016:0> put 'person',1,'p:email',''
    Took 0.0043 seconds
    hbase(main):017:0> put 'person',1,'p:age','25'
    Took 0.0049 seconds
    hbase(main):018:0> scan 'person'
    ROW                                                              COLUMN+CELL
     1                                                               column=p:age, timestamp=1616425683759, value=25
     1                                                               column=p:email, timestamp=1616425681754,
     1                                                               column=p:id, timestamp=1616425681717, value=1
     1                                                               column=p:name, timestamp=1616425681736, value=Ranga Reddy
    hbase(main):018:0> exit
  4. Copy the hbase-site.xml from 'Cluster_A' to 'Cluster_B':
    scp /etc/hbase/conf/hbase-site.xml root@cluster_b_ipaddress:/tmp
  5. Log in to the 'Cluster_B' edge node.
  6. Obtain the Kerberos ticket using kinit:
    kinit -kt <key_tab_file> <principal_name>
  7. Place the hbase-site.xml copied in step 4 into a temporary directory, for example /tmp/hbase/conf:
    mkdir -p /tmp/hbase/conf
    cp /tmp/hbase-site.xml /tmp/hbase/conf
  8. Launch the Spark shell, passing the HBase Spark connector JARs and adding the HBase configuration directory (/tmp/hbase/conf) to the driver and executor classpaths:
    spark-shell \
    --master yarn \
    --conf spark.driver.extraClassPath=/tmp/hbase/conf \
    --conf spark.executor.extraClassPath=/tmp/hbase/conf \
    --jars /opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-1.0.0*.jar,\
  9. Run the following code after launching the spark-shell:
    val df = spark.read.format("org.apache.hadoop.hbase.spark").option("hbase.columns.mapping", "id STRING :key, name STRING p:name, email STRING p:email, age STRING p:age").option("hbase.table", "person").option("hbase.spark.use.hbasecontext", false).load()
    df.show(false)
    The output of the above command:
    |age|name|email |id |
    |25 |1 || |
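The hbase.columns.mapping string in step 9 is easy to mistype, so one option is to build it from a small list of column definitions. The sketch below is only an illustration: the HBaseMapping object, the Col case class, and the render helper are hypothetical names and not part of the HBase Spark connector. It assumes the 'person' table and the 'p' column family created in step 3.

```scala
object HBaseMapping {
  // One DataFrame column and the HBase cell (or row key) it maps to.
  final case class Col(name: String, sqlType: String, target: String)

  // Render the columns as comma-separated "name TYPE target" entries,
  // the form passed to the hbase.columns.mapping option in step 9.
  def render(cols: Seq[Col]): String =
    cols.map(c => s"${c.name} ${c.sqlType} ${c.target}").mkString(", ")

  // Mapping for the 'person' table; ":key" denotes the HBase row key.
  val person: Seq[Col] = Seq(
    Col("id", "STRING", ":key"),
    Col("name", "STRING", "p:name"),
    Col("email", "STRING", "p:email"),
    Col("age", "STRING", "p:age")
  )
}
```

In spark-shell, the rendered string could then be passed as .option("hbase.columns.mapping", HBaseMapping.render(HBaseMapping.person)).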

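If the read fails or returns nothing, it may help to confirm that the hbase-site.xml placed in /tmp/hbase/conf really is the one from 'Cluster_A'. The sketch below parses Hadoop-style configuration XML with the JDK's built-in DOM parser and returns the properties as a map. The HBaseSiteCheck object and the sample XML with its host name are hypothetical; the assumption is that hbase.zookeeper.quorum in the copied file should list the ZooKeeper hosts of 'Cluster_A'.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

object HBaseSiteCheck {
  // Read all <property><name>/<value> pairs from Hadoop-style configuration XML.
  def properties(xml: String): Map[String, String] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength).flatMap { i =>
      val p = props.item(i).asInstanceOf[org.w3c.dom.Element]
      val names = p.getElementsByTagName("name")
      val values = p.getElementsByTagName("value")
      // Skip malformed entries that lack a name or a value element.
      if (names.getLength > 0 && values.getLength > 0)
        Some(names.item(0).getTextContent.trim -> values.item(0).getTextContent.trim)
      else None
    }.toMap
  }

  // Hypothetical sample content, used purely for illustration.
  val sample: String =
    """<configuration>
      |  <property>
      |    <name>hbase.zookeeper.quorum</name>
      |    <value>zk1.cluster-a.example.com</value>
      |  </property>
      |</configuration>""".stripMargin
}
```

Against the real file, the same helper could be called from spark-shell as HBaseSiteCheck.properties(scala.io.Source.fromFile("/tmp/hbase/conf/hbase-site.xml").mkString).get("hbase.zookeeper.quorum").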
Thanks for reading this article. I hope you have enjoyed it.

‎03-24-2021 09:44 PM
