Created 07-02-2018 02:36 AM
Hi Community,
I'm new to Spark and have been struggling to run a Spark job that connects to an HBase table.
In the YARN UI I can see the Spark job reach the RUNNING state, but it then fails with the error below:
18/07/01 21:08:20 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x6a8bcb640x0, quorum=localhost:2181, baseZNode=/hbase
18/07/01 21:08:20 INFO ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
18/07/01 21:08:20 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
18/07/01 21:08:20 WARN RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
ZooKeeper is up and running. Below is my code:
// Create table catalog
object foo {
  def catalog = s"""{
    |"table":{"namespace":"foo", "name":"bar"},
    |"rowkey":"key",
    |"columns":{
    |"col0":{"col":"key", "type":"string"},
    |"col1":{"cf":"data", "col":"id", "type":"bigint"}
    |}
    |}""".stripMargin

  def main(args: Array[String]) {
    val spark = SparkSession.builder()
      .appName("foo")
      .getOrCreate()
    val sc = spark.sparkContext
    val sqlContext = spark.sqlContext
    import sqlContext.implicits._

    def withCatalog(cat: String): DataFrame = {
      sqlContext
        .read
        .options(Map(HBaseTableCatalog.tableCatalog -> cat))
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .load()
    }

    // Read from HBase table
    val df = withCatalog(catalog)
    df.show
    df.filter($"col0" === "1528801346000_200550232_2955")
      .select($"col0", $"col1").show
    spark.stop()
  }
}
I'd really appreciate any help on this. I couldn't find a convincing answer on Stack Overflow either.
Created 07-02-2018 04:18 PM
It seems that the app is not picking up hbase-site.xml and is connecting to localhost (connectString=localhost:2181).
Copy hbase-site.xml to /etc/spark/conf/ on the node where you are launching the job, and also pass hbase-site.xml using --files in the spark-submit command (--files /etc/spark/conf/hbase-site.xml).
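For example, the spark-submit invocation might look like the sketch below; the class name and jar are placeholders for your own job:

```shell
# Ship hbase-site.xml to the driver and executors so the app picks up
# the real ZooKeeper quorum instead of defaulting to localhost:2181.
# "foo" and my-hbase-spark-job.jar are placeholders for your job.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --files /etc/spark/conf/hbase-site.xml \
  --class foo \
  my-hbase-spark-job.jar
```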
Created 07-02-2018 04:52 AM
Can you upload your HBase configuration file?
Created 07-02-2018 07:56 AM
@karthik nedunchezhiyan I have attached hbase-site.xml, I hope it helps.
Created 07-03-2018 12:37 PM
Hi @Sandeep Nemuri, thanks for answering. I did as mentioned above, and my code now runs fine with deploy-mode client, but I've run into another issue: the data displayed is blank. However, I can see the data from the hbase shell prompt. I have already checked the row key, table name, and namespace name. What do you think is missing?
val df = withCatalog(catalog)
df.show
df.select($"col0").show(10)
spark.stop()
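One common cause of empty results with the Spark-HBase connector (and worth checking here, though I can't be sure it's your issue) is a catalog mismatch: the row-key column is expected to map to the reserved "rowkey" column family, while every other column must name a real HBase column family. A minimal sketch of such a catalog, reusing the namespace/table names from this thread, would be:

```scala
// Hedged sketch: in SHC-style catalogs, the key column's "cf" is the
// literal reserved string "rowkey"; other columns use real HBase
// families (here "data", as in the thread's example).
object CatalogSketch {
  def catalog: String =
    s"""{
       |"table":{"namespace":"foo", "name":"bar"},
       |"rowkey":"key",
       |"columns":{
       |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
       |"col1":{"cf":"data", "col":"id", "type":"bigint"}
       |}
       |}""".stripMargin

  def main(args: Array[String]): Unit = {
    // The key column must be bound to the reserved "rowkey" family.
    assert(catalog.contains("\"cf\":\"rowkey\""))
    println(catalog)
  }
}
```

If the types in the catalog don't match what was actually written to HBase (e.g. bigint vs. string-encoded values), rows can also come back empty or null, so comparing the catalog against a raw `scan` in the hbase shell is a reasonable next step.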
Created 07-03-2018 12:45 PM
@vivek jain Glad that it worked. Would you mind marking this thread as closed by clicking "Accept", and asking a new question with the code used and the console output?
Created 07-03-2018 02:58 PM
@Sandeep Nemuri thanks for the help, surely will do so.