Created 12-05-2022 07:04 AM
Hi Team,
Our Spark job hangs (gets stuck) due to the error below. Can anyone please help here?
22/12/05 22:29:55 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x5e29988e0x0, quorum=localhost:2181, baseZNode=/hbase
22/12/05 22:29:55 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
22/12/05 22:29:55 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1126)
....
22/12/05 22:30:12 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
22/12/05 22:30:12 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
22/12/05 22:30:12 WARN zookeeper.ZKUtil: hconnection-0x5e29988e0x0, quorum=localhost:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
Created 12-05-2022 11:11 AM
Hello @hanumanth ,
If the ZooKeeper services look up and running, compare the Spark job failure timestamp against the ZooKeeper logs from the leader server. If there is no visible issue on the ZooKeeper side, check whether the HBase client configurations were applied properly in the Spark job configuration.
Also, confirm that the HBase service is up and functional as well.
If the above does not help, you may want to raise a support ticket for the Spark component.
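For the configuration check, one quick way is to print the quorum that the HBase client configuration actually resolves to on the node running the job. This is only a sketch (Scala, e.g. pasted into spark-shell) and assumes the HBase client jars are on the classpath; it reads the standard hbase.zookeeper.quorum and zookeeper.znode.parent properties. If it prints localhost, hbase-site.xml is not being picked up.

import org.apache.hadoop.hbase.HBaseConfiguration

// HBaseConfiguration.create() loads hbase-site.xml if it is on the classpath;
// otherwise the built-in defaults (localhost quorum, /hbase znode) are used.
val hbaseConf = HBaseConfiguration.create()
println("hbase.zookeeper.quorum = " + hbaseConf.get("hbase.zookeeper.quorum", "<not set, defaults to localhost>"))
println("zookeeper.znode.parent = " + hbaseConf.get("zookeeper.znode.parent", "<not set, defaults to /hbase>"))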
Created 12-06-2022 12:26 AM
Hi All,
As our CDH version 5.16.1 is out of support, I am unable to contact support. Please help here.
Created 12-06-2022 12:49 AM
Hi @hanumanth , as you can see, the Spark job is trying to reach ZooKeeper on localhost:
22/12/05 22:30:12 WARN zookeeper.ZKUtil: hconnection-0x5e29988e0x0, quorum=localhost:2181
We would expect a ZooKeeper quorum of 3 or more ZK servers under quorum=. This indicates that the node on which the Spark job is running doesn't have an hbase-site.xml to direct the job to the correct hbase.zookeeper.quorum.
So make sure you have an HBase Gateway role deployed on the node from which you are running the Spark job, and also try passing the HBase client configuration to spark-submit, e.g. "--files /etc/spark/conf/yarn-conf/hbase-site.xml". A fallback sketch is shown below.
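If deploying a Gateway role is not immediately possible, another option is to point the HBase client at the ZooKeeper ensemble explicitly from inside the job. The snippet below is only an illustrative sketch (Scala): the hostnames zk1/zk2/zk3.example.com are placeholders for your actual ZooKeeper servers, and the port and znode values assume the defaults.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

val hbaseConf = HBaseConfiguration.create()
// Placeholder hostnames -- replace with the real ZooKeeper ensemble of your cluster.
hbaseConf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com")
hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")
hbaseConf.set("zookeeper.znode.parent", "/hbase")

// Create the connection against the configured quorum and close it when done.
val connection = ConnectionFactory.createConnection(hbaseConf)
connection.close()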
Created 12-07-2022 07:13 AM
@hanumanth Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks
Regards,
Diana Torres,