Created 12-05-2022 07:04 AM
Hi Team,
Our Spark job hangs (gets stuck) due to the error below. Can anyone please help here?
22/12/05 22:29:55 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x5e29988e0x0, quorum=localhost:2181, baseZNode=/hbase
22/12/05 22:29:55 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
22/12/05 22:29:55 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1126)
....
22/12/05 22:30:12 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
22/12/05 22:30:12 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
22/12/05 22:30:12 WARN zookeeper.ZKUtil: hconnection-0x5e29988e0x0, quorum=localhost:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
Created 12-05-2022 11:11 AM
Hello @hanumanth ,
If the ZooKeeper services look up and running, compare the Spark job failure timestamp against the ZooKeeper logs from the leader server. If there is no visible issue on the ZooKeeper side, check whether the HBase client configurations were applied properly in the Spark job configuration.
Also, confirm that the HBase service is up and functional as well.
If the above does not help, you may want to raise a support ticket for the Spark component.
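For the configuration check, one quick way is to print the quorum that the HBase client configuration actually resolves to on the node running the job. This is only a sketch (Scala, e.g. pasted into spark-shell) and assumes the HBase client jars are on the classpath; it reads the standard hbase.zookeeper.quorum and zookeeper.znode.parent properties. If it prints localhost, hbase-site.xml is not being picked up.

import org.apache.hadoop.hbase.HBaseConfiguration

// HBaseConfiguration.create() loads hbase-site.xml if it is on the classpath;
// otherwise the built-in defaults (localhost quorum, /hbase znode) are used.
val hbaseConf = HBaseConfiguration.create()
println("hbase.zookeeper.quorum = " + hbaseConf.get("hbase.zookeeper.quorum", "<not set, defaults to localhost>"))
println("zookeeper.znode.parent = " + hbaseConf.get("zookeeper.znode.parent", "<not set, defaults to /hbase>"))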
Created 12-06-2022 12:26 AM
Hi All,
As our CDH version 5.16.1 is out of support, I am unable to contact support. Please help here.
Created 12-06-2022 12:49 AM
Hi @hanumanth , as you can see, the Spark job is trying to reach ZooKeeper on localhost:
22/12/05 22:30:12 WARN zookeeper.ZKUtil: hconnection-0x5e29988e0x0, quorum=localhost:2181
We would expect a ZooKeeper quorum of 3 or more ZK servers under quorum=. This indicates that the node on which the Spark job is running doesn't have an hbase-site.xml to direct the job to the correct hbase.zookeeper.quorum.
So make sure you have an HBase Gateway role deployed on the node from which you are running the Spark job, and also try passing the HBase client configuration to spark-submit, e.g. "--files /etc/spark/conf/yarn-conf/hbase-site.xml". A fallback sketch is shown below.
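If deploying a Gateway role is not immediately possible, another option is to point the HBase client at the ZooKeeper ensemble explicitly from inside the job. The snippet below is only an illustrative sketch (Scala): the hostnames zk1/zk2/zk3.example.com are placeholders for your actual ZooKeeper servers, and the port and znode values assume the defaults.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

val hbaseConf = HBaseConfiguration.create()
// Placeholder hostnames -- replace with the real ZooKeeper ensemble of your cluster.
hbaseConf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com")
hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")
hbaseConf.set("zookeeper.znode.parent", "/hbase")

// Create the connection against the configured quorum and close it when done.
val connection = ConnectionFactory.createConnection(hbaseConf)
connection.close()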
Created 12-07-2022 07:13 AM
@hanumanth Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks
Regards,
Diana Torres,