Created on 03-24-2021 11:21 AM - edited on 03-24-2021 09:44 PM by subratadas
This article describes the steps to access HBase data on a remote CDP cluster from another CDP cluster using Spark in a Kerberized environment.
Assume we have two clusters, 'Cluster_A' and 'Cluster_B'. 'Cluster_A' holds the HBase data, and 'Cluster_B' runs Spark. We want to access the HBase data in 'Cluster_A' using Spark from 'Cluster_B'.
Prerequisites:
Both clusters, 'Cluster_A' and 'Cluster_B', must belong to the same Kerberos REALM (you can verify the default_realm setting in /etc/krb5.conf on each cluster), and keytabs must be available on both.
Follow these steps:
- Log in to the 'Cluster_A' edge node
- Obtain the Kerberos ticket using kinit:
kinit -kt <key_tab_file> <principal_name>
- Log in to the HBase shell and create the 'person' table:
ranga]# hbase shell

hbase(main):001:0> list
TABLE
0 row(s)

hbase(main):013:0> create 'person', 'p'
Created table person
Took 8.2307 seconds
=> Hbase::Table - person

hbase(main):014:0> put 'person',1,'p:id','1'
Took 0.0173 seconds
hbase(main):015:0> put 'person',1,'p:name','Ranga Reddy'
Took 0.0045 seconds
hbase(main):016:0> put 'person',1,'p:email','ranga@gmail.com'
Took 0.0043 seconds
hbase(main):017:0> put 'person',1,'p:age','25'
Took 0.0049 seconds

hbase(main):018:0> scan 'person'
ROW    COLUMN+CELL
 1     column=p:age, timestamp=1616425683759, value=25
 1     column=p:email, timestamp=1616425681754, value=ranga@gmail.com
 1     column=p:id, timestamp=1616425681717, value=1
 1     column=p:name, timestamp=1616425681736, value=Ranga Reddy

hbase(main):019:0> exit
- Copy the hbase-site.xml from 'Cluster_A' to 'Cluster_B':
scp /etc/hbase/conf/hbase-site.xml root@cluster_b_ipaddress:/tmp
- Log in to the 'Cluster_B' edge node
- Obtain the Kerberos ticket using kinit:
kinit -kt <key_tab_file> <principal_name>
- Place the hbase-site.xml copied in Step 4 into a temporary directory, for example /tmp/hbase/conf:
mkdir -p /tmp/hbase/conf
cp /tmp/hbase-site.xml /tmp/hbase/conf
- Launch the Spark shell, providing the HBase Spark connector jars and the HBase configuration directory (/tmp/hbase/conf):
spark-shell \
  --master yarn \
  --conf spark.driver.extraClassPath=/tmp/hbase/conf \
  --conf spark.executor.extraClassPath=/tmp/hbase/conf \
  --jars /opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-1.0.0*.jar,\
/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-protocol-shaded-*.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-mapreduce.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-miscellaneous.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-protobuf.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-shaded.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-netty.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-zookeeper.jar,\
/opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-annotations.jar
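Optionally, before reading any data, you can sanity-check that the shell picked up the remote cluster's configuration. This is a minimal sketch, assuming the classpath settings from the command above; HBaseConfiguration.create() loads hbase-site.xml from the classpath:

// Optional sanity check inside spark-shell: HBaseConfiguration.create()
// reads hbase-site.xml from the classpath (/tmp/hbase/conf in this setup).
val hbaseConf = org.apache.hadoop.hbase.HBaseConfiguration.create()
// Should print Cluster_A's ZooKeeper quorum, not the local cluster's.
println(hbaseConf.get("hbase.zookeeper.quorum"))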
- Run the following code after launching the spark-shell:
val df = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.columns.mapping", "id STRING :key, name STRING p:name, email STRING p:email, age STRING p:age")
  .option("hbase.table", "person")
  .option("hbase.spark.use.hbasecontext", false)
  .load()

df.show(truncate=false)

The output of the above command:
scala> df.show(truncate=false)
+---+-----------+---------------+---+
|age|name       |email          |id |
+---+-----------+---------------+---+
|25 |Ranga Reddy|ranga@gmail.com|1  |
+---+-----------+---------------+---+
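Once loaded, the HBase-backed DataFrame can be queried like any other Spark DataFrame. The snippet below is a minimal sketch assuming the df defined above; the view name 'person_view' is arbitrary, and the HBase cell values were stored as strings:

// Inspect the schema derived from hbase.columns.mapping.
df.printSchema()

// Project a few columns and filter rows (age was stored as a string).
df.select("id", "name", "email")
  .filter(df("age") === "25")
  .show(truncate = false)

// Expose the table to Spark SQL through a temporary view.
df.createOrReplaceTempView("person_view")
spark.sql("SELECT name, email FROM person_view WHERE age = '25'").show(truncate = false)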
Thanks for reading this article. I hope you have enjoyed it.