Getting an error while trying to connect to HBase using Spark

Hi Community,

I'm new to Spark and have been struggling to run a Spark job that connects to an HBase table.

In the YARN GUI I can see the Spark job reach the RUNNING state, but then it fails with the error below:

18/07/01 21:08:20 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x6a8bcb640x0, quorum=localhost:2181, baseZNode=/hbase
18/07/01 21:08:20 INFO ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
18/07/01 21:08:20 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
18/07/01 21:08:20 WARN RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

ZooKeeper is up and running. Below is my code:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

object foo {
  // Create table catalog: maps the HBase table foo:bar to DataFrame columns
  def catalog = s"""{
         |"table":{"namespace":"foo", "name":"bar"},
         |"rowkey":"key",
         |"columns":{
           |"col0":{"col":"key", "type":"string"},
           |"col1":{"cf":"data", "col":"id", "type":"bigint"}
         |}
       |}""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("foo")
      .getOrCreate()
    val sc = spark.sparkContext
    val sqlContext = spark.sqlContext

    import sqlContext.implicits._

    // Load the HBase table described by the catalog as a DataFrame
    def withCatalog(cat: String): DataFrame = {
      sqlContext
        .read
        .options(Map(HBaseTableCatalog.tableCatalog -> cat))
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .load()
    }

    // Read from HBase table
    val df = withCatalog(catalog)
    df.show
    df.filter($"col0" === "1528801346000_200550232_2955")
      .select($"col0", $"col1").show
    spark.stop()
  }
}

I would really appreciate any help on this. I couldn't find a convincing answer on Stack Overflow either.
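
One detail in the log above can be checked mechanically: the ZooKeeper client reports connectString=localhost:2181, which may mean the process is looking for ZooKeeper on the node it runs on rather than on the cluster's quorum hosts. As a minimal sketch in plain Scala (no Spark or HBase dependencies), the following pulls the connect string out of such a log line and flags the case where every host is a loopback name. QuorumCheck, quorumOf and isLoopbackOnly are hypothetical names chosen for illustration; they are not part of any library.

```scala
object QuorumCheck {
  // Matches the connectString field that the ZooKeeper client logs on startup.
  private val ConnectString = """connectString=(\S+)""".r

  /** Extracts the host:port list from an "Initiating client connection" log line. */
  def quorumOf(logLine: String): Option[String] =
    ConnectString.findFirstMatchIn(logLine).map(_.group(1))

  /** True when every host in the comma-separated quorum is a loopback name or address. */
  def isLoopbackOnly(quorum: String): Boolean =
    quorum.split(",").map(_.trim.takeWhile(_ != ':')).forall { host =>
      host == "localhost" || host.startsWith("127.")
    }

  def main(args: Array[String]): Unit = {
    // Line taken from the log in the question (shortened).
    val line = "18/07/01 21:08:20 INFO ZooKeeper: Initiating client connection, " +
      "connectString=localhost:2181 sessionTimeout=90000"
    quorumOf(line).foreach { q =>
      println(s"quorum=$q loopbackOnly=${isLoopbackOnly(q)}")
    }
  }
}
```

If the check reports a loopback-only quorum on a multi-node cluster, it is worth confirming which HBase configuration the job actually sees at runtime.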

@vivek jain

Can you upload your HBase configuration file?

@karthik nedunchezhiyan I have attached hbase-site.xml. I hope it helps.
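
For anyone comparing an hbase-site.xml against the log in the question, the property to look at is hbase.zookeeper.quorum, since the log shows the client connecting to localhost:2181. The sketch below is only an illustration of reading that property with the JDK's XML parser. HBaseSiteQuorum and its property method are hypothetical names, and the sample file content and host names are placeholders, not the attached file.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

object HBaseSiteQuorum {
  /** Returns the value of the named property from hbase-site.xml content, if present. */
  def property(xml: String, name: String): Option[String] = {
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength)
      .map(i => props.item(i).asInstanceOf[Element])
      .find(p => childText(p, "name").contains(name))
      .flatMap(p => childText(p, "value"))
  }

  // Trimmed text of the first child element with the given tag, if any.
  private def childText(parent: Element, tag: String): Option[String] = {
    val nodes = parent.getElementsByTagName(tag)
    if (nodes.getLength == 0) None else Some(nodes.item(0).getTextContent.trim)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical file content; the host names are placeholders.
    val sample =
      """<configuration>
        |  <property>
        |    <name>hbase.zookeeper.quorum</name>
        |    <value>zk1.example.com,zk2.example.com</value>
        |  </property>
        |</configuration>""".stripMargin
    println(property(sample, "hbase.zookeeper.quorum"))
  }
}
```

If the file names real quorum hosts while the job's log still shows localhost, one possible reading is that the job is not picking this file up at runtime.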

Hi @Sandeep Nemuri, thanks for answering. I did as mentioned above, and my code now runs fine with deploy-mode set to client. However, I have run into another issue: the data displayed is blank, even though I can see the data from the hbase shell prompt. I have already checked the rowkey, table name, and namespace name. What do you think is missing?

val df = withCatalog(catalog)
df.show
df.select($"col0").show(10)
spark.stop()

@vivek jain Glad that it worked. Would you mind marking this thread as closed by clicking "Accept", and asking a new question that includes the code used and the console output?

@Sandeep Nemuri Thanks for the help, I will surely do so.