
Error when reading HBase within a Spark RDD foreach action

I am using Spark Streaming to consume a Kafka message queue. Within the foreachRDD operation, I read HBase inside an RDD foreach action, based on the messages taken from Kafka. But I am getting errors that look like some kind of deadlock. Can anybody help me figure out what the issue is?
The code details are below.

KafkaUtils.createStream(streamingContext, kafkazk, kafkaGroup, kafkaTopic.split(",").map(_.trim -> 1).toMap)
  .foreachRDD(_.processRDD(new CQRequestContext(null, new DBRequestContext(prefix, dbName, tableName, null, null, null))))
streamingContext.start()
streamingContext.awaitTermination()
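For context, since the snippet above calls _.processRDD directly on the RDD: processRDD lives in our CQStreamingFunction class and is made available on the RDD through an implicit wrapper, roughly like this sketch (the exact signature is approximate):

import org.apache.spark.rdd.RDD

// Sketch only: createStream with the default String decoders yields an
// RDD[(String, String)] per batch, and this wrapper adds processRDD to it.
implicit class CQStreamingFunction(rdd: RDD[(String, String)]) {
  def processRDD(cQRequestContext: CQRequestContext): Unit = {
    // implementation shown below
  }
}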

I do an HBase read within each RDD foreach action, and I create the HBase connection in the foreachPartition action in order to keep the number of connections down.

def processRDD(cQRequestContext: CQRequestContext) = {
  rdd.foreachPartition(iterator => {
    // One connection per partition, shared by all records in the iterator.
    val connection = ConnectionFactory.createConnection(HbaseUtil.getHbaseConfiguration("FdsCQ"))
    val hTable = connection.getTable(TableName.valueOf("cqdev:sampleTestTable"))
    try {
      iterator.foreach(requestTuple => {
        // processAIPRequest(requestTuple._2, cQRequestContext, hTable)
        val p = new Put(Bytes.toBytes("dabao12345"))
        p.addColumn(Bytes.toBytes(SellerConstants.CQ_ABUSE_RESULT.COLUMN_FAMILY), Bytes.toBytes(SellerConstants.CQ_ABUSE_RESULT.RESPONSE_CONTENT), Bytes.toBytes("content"))
        p.addColumn(Bytes.toBytes(SellerConstants.CQ_ABUSE_RESULT.COLUMN_FAMILY), Bytes.toBytes(SellerConstants.CQ_ABUSE_RESULT.PACKAGE_NOS), Bytes.toBytes("response"))
        p.addColumn(Bytes.toBytes(SellerConstants.CQ_ABUSE_RESULT.COLUMN_FAMILY), Bytes.toBytes(SellerConstants.CQ_ABUSE_RESULT.ABUSE_PACKAGES_NO), Bytes.toBytes("response"))
        p.addColumn(Bytes.toBytes(SellerConstants.CQ_ABUSE_RESULT.COLUMN_FAMILY), Bytes.toBytes(SellerConstants.CQ_ABUSE_RESULT.RESPONSE_CODE), Bytes.toBytes("responseCode"))
        p.addColumn(Bytes.toBytes(SellerConstants.CQ_ABUSE_RESULT.COLUMN_FAMILY), Bytes.toBytes(SellerConstants.CQ_ABUSE_RESULT.RESPONSE_STATUS), Bytes.toBytes("response"))
        hTable.put(p)
      })
    } finally {
      hTable.close()
      connection.close()
    }
  })
}
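For reference, HbaseUtil.getHbaseConfiguration is our own helper, not an HBase API. It does roughly the following (a sketch; the quorum hosts and znode parent here are placeholders, the property keys are the standard HBase ones):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration

object HbaseUtil {
  // Sketch of our helper; real host names and znode parent are placeholders.
  def getHbaseConfiguration(clusterKey: String): Configuration = {
    val conf = HBaseConfiguration.create() // also picks up hbase-site.xml if it is on the classpath
    conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("zookeeper.znode.parent", "/hbase")
    conf
  }
}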

And finally, I get the error below. After debugging, I found this is not a matter of running out of HBase connections: during testing I put only one message into Kafka, and the very first attempt to create the HBase connection produces these errors.

java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:503)
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:222)
org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:481)
org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:86)
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:849)
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:670)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:526)
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
com.XXXXXXXX.func.CQStreamingFunction$$anonfun$processRDD$1.apply(CQStreamingFunction.scala:50)
com.XXXXXXXX.func.CQStreamingFunction$$anonfun$processRDD$1.apply(CQStreamingFunction.scala:49)
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:898)
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:898)
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
org.apache.spark.scheduler.Task.run(Task.scala:88)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)

Furthermore, the above error only happens when I run my Spark Streaming job in yarn-cluster mode on a YARN cluster. When I run in local mode on my local machine and connect to the remote HBase, everything works fine.
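For anyone debugging the same symptom, a minimal check on the executor side is to print the ZooKeeper settings the configuration actually resolves, since yarn-cluster mode may pick up a different hbase-site.xml than local mode (a debugging sketch, not part of the failing job):

rdd.foreachPartition(_ => {
  // Sketch: log which ZooKeeper quorum the executor actually sees.
  val conf = HbaseUtil.getHbaseConfiguration("FdsCQ")
  println(s"hbase.zookeeper.quorum = ${conf.get("hbase.zookeeper.quorum")}")
  println(s"hbase.zookeeper.property.clientPort = ${conf.get("hbase.zookeeper.property.clientPort")}")
  println(s"zookeeper.znode.parent = ${conf.get("zookeeper.znode.parent")}")
})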

Environment details: CDH 5.5.2 with Spark 1.5.0
