
Cannot connect Spark to HBase



New Contributor

I am trying to write data into HBase from Spark.

My environment: HDP 2.3.7-4 with Kerberos authentication.

My code is shown below:

  import org.apache.hadoop.fs.Path
  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.HBaseAdmin
  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBase Test")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)

    val hConf = HBaseConfiguration.create

    hConf.set("zookeeper.znode.parent", "/hbase-secure")
    hConf.set("hbase.zookeeper.quorum", "useomlxd00009.unix.us.aglobal.example.com:2181")
    hConf.set("hbase.master", "useomlxd00009.unix.us.aglobal.example.com:2181")
    hConf.set("hbase.zookeeper.property.clientPort", "2181")
    hConf.set("hadoop.security.authentication", "kerberos")

    hConf.addResource(new Path("/usr/hdp/current/hbase-master/conf/hbase-site.xml"))
    hConf.addResource(new Path("/usr/hdp/current/hbase-master/conf/core-site.xml"))
    HBaseAdmin.checkHBaseAvailable(hConf)
  }

I submit the job with the following command (dependencies are bundled into the jar via my Maven configuration):

spark-submit --class className --principal user@abc.com --keytab /path/tokeytab jarFileName.jar

After submission, the application just hangs and never moves forward. Part of the log output is shown below:

INFO ZooKeeper: Client environment:os.arch=amd64
16/08/29 20:26:48 INFO ZooKeeper: Client environment:os.version=2.6.32-573.8.1.el6.x86_64
16/08/29 20:26:48 INFO ZooKeeper: Client environment:user.name=paperreportingsvc
16/08/29 20:26:48 INFO ZooKeeper: Client environment:user.home=/home/paperreportingsvc
16/08/29 20:26:48 INFO ZooKeeper: Client environment:user.dir=/home/paperreportingsvc
16/08/29 20:26:48 INFO ZooKeeper: Initiating client connection, connectString=xxxxxxxxxxx:2181 sessionTimeout=90000 watcher=hconnection-0x3ec2ecea0x0, quorum=xxxxxxxxxxxxxx:2181, baseZNode=/hbase-secure
16/08/29 20:26:48 INFO ClientCnxn: Opening socket connection to server xxxxxxxxxxxxxxxxxxxx:2181. Will not attempt to authenticate using SASL (unknown error)
16/08/29 20:26:48 INFO ClientCnxn: Socket connection established to xxxxxxxxxxxxxx:2181, initiating session
16/08/29 20:26:48 INFO ClientCnxn: Session establishment complete on server xxxxxxxxxxxxxxxx:2181, sessionid = 0x1568e5a3fde07a4, negotiated timeout = 40000
16/08/29 20:26:49 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/08/29 20:26:49 INFO ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1568e5a3fde07a4
16/08/29 20:26:49 INFO ZooKeeper: Session: 0x1568e5a3fde07a4 closed
16/08/29 20:26:49 INFO ClientCnxn: EventThread shut down

After the last line, the app is stuck and never goes forward. I guess the problem is the connection from Spark to ZooKeeper, but I really don't know what to change because there is no error information. I have been stuck on this problem for a week and any help is appreciated.

6 Replies

Re: Cannot connect Spark to HBase

Super Guru

@Qingyang Kong

Are you sure that your parent znode is /hbase-secure? It is usually just /hbase. And by the way, rather than creating the configuration like this, why not just put hbase-site.xml on your classpath and create the configuration from that?

The factory method on HBaseConfiguration, HBaseConfiguration.create(), will on invocation read in the content of the first hbase-site.xml found on the client's CLASSPATH.
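As a sketch of that approach, assuming hbase-site.xml (with the correct zookeeper.znode.parent) is already on the classpath and the HBase 1.1 client API is available, something like the following avoids hand-setting each property. The principal and keytab path are the placeholders from the spark-submit command above, not real values:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.security.UserGroupInformation

object HBaseSmokeTest {
  def main(args: Array[String]): Unit = {
    // create() reads the first hbase-site.xml found on the classpath, so
    // zookeeper.znode.parent, the quorum, etc. come from the cluster config.
    val hConf = HBaseConfiguration.create()

    // Log in explicitly from the keytab so the ZooKeeper/HBase handshake
    // can use SASL; principal and keytab path are placeholders.
    UserGroupInformation.setConfiguration(hConf)
    UserGroupInformation.loginUserFromKeytab("user@abc.com", "/path/tokeytab")

    // Throws MasterNotRunningException / ZooKeeperConnectionException on failure.
    HBaseAdmin.checkHBaseAvailable(hConf)
    println("HBase is reachable")
  }
}
```

Run with spark-submit as before; since nothing is hard-coded, the same jar works against any cluster whose hbase-site.xml is on the classpath.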

Re: Cannot connect Spark to HBase

New Contributor

Thank you for replying. I am sure the znode parent is "/hbase-secure": I found <name>zookeeper.znode.parent</name><value>/hbase-secure</value> in hbase-site.xml. I added hbase-site.xml to the classpath via --driver-class-path, but the problem is still the same.

Re: Cannot connect Spark to HBase

Super Guru

I still think there is an issue with your code. Can you please check the following example and change your code as shown in this link? Notice that hbase-site.xml is on the classpath.

https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/org/apache/hadoop/hbase/spark/ex...

Re: Cannot connect Spark to HBase

New Contributor

HBase on HDP 2.3.7 is version 1.1.2, and there is no HBaseContext in this version.

Re: Cannot connect Spark to HBase

New Contributor

Because the code runs fine on the sandbox, I think the problem is the connection from Spark to HBase. Do you know if there is a way to check whether Spark can connect to ZooKeeper or HBase?

Re: Cannot connect Spark to HBase

I think the log trace given above doesn't show any error. Can you check for errors related to RPC retries at the client, or something similar?
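To rule Spark out entirely, one way to sanity-check connectivity is a bare ZooKeeper client: if the parent znode is right, listing it should show HBase's child nodes (master, rs, meta-region-server, ...). This is a sketch assuming the ZooKeeper client jar is on the classpath; the quorum host is the one from the question and may need adjusting:

```scala
import java.util.concurrent.CountDownLatch
import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}
import org.apache.zookeeper.Watcher.Event.KeeperState
import scala.collection.JavaConverters._

object ZkCheck {
  def main(args: Array[String]): Unit = {
    val connected = new CountDownLatch(1)
    // Use the same value as hbase.zookeeper.quorum here.
    val zk = new ZooKeeper("useomlxd00009.unix.us.aglobal.example.com:2181", 30000,
      new Watcher {
        override def process(event: WatchedEvent): Unit =
          if (event.getState == KeeperState.SyncConnected) connected.countDown()
      })
    connected.await()
    // If the parent znode is correct, this lists HBase's nodes;
    // a NoNodeException here means the znode parent is wrong.
    zk.getChildren("/hbase-secure", false).asScala.foreach(println)
    zk.close()
  }
}
```

If this hangs the same way, the problem is between the client host and ZooKeeper (firewall, SASL/Kerberos config), not in the Spark job itself.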