HBase Java client connection timeout

New Contributor

I have a Spring Boot application that uses the HBase Java client, version 2.1.0-cdh6.1.1, to connect to a Kerberos-secured HBase cluster. When it runs in a Tomcat instance on our server, a simple query can take 5 minutes to complete, and the log file shows a timeout. I can't figure out what is wrong.

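Connection configuration:

import java.io.File;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.security.UserGroupInformation;
import org.springframework.util.ResourceUtils;

// "config" is an org.apache.hadoop.conf.Configuration created earlier (not shown).
// ZooKeeper quorum and client port
config.set("hbase.zookeeper.quorum", quorum);
config.set("hbase.zookeeper.property.clientPort", "2181"); // standard HBase key for the ZooKeeper client port

// Kerberos authentication and RPC protection
config.set("hadoop.security.authentication", "kerberos");
config.set("hbase.security.authentication", "kerberos");
config.set("hbase.cluster.distributed", "true");
config.set("hbase.rpc.protection", "privacy");
config.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@mycompany.com");
config.set("hbase.master.kerberos.principal", "hbase/_HOST@mycompany.com");

// Client retry behavior
config.set("hbase.client.retries.number", "3");
config.set("hbase.client.pause", "500");
config.set("zookeeper.recovery.retry", "1");

// Point the JVM at the bundled krb5.conf, then log in from the keytab
System.setProperty("java.security.krb5.conf", ResourceUtils.getFile("classpath:krb5.conf").getPath());
File file = ResourceUtils.getFile(keytabFileLocation);

UserGroupInformation.setConfiguration(config);
UserGroupInformation.loginUserFromKeytab(userIdEmail, file.getPath());
Connection conn = ConnectionFactory.createConnection(config);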

There is a one-minute gap in the log file, from 17:23:42,637 to 17:24:43,126, followed by the timeout at the end.

How can I fix this issue?

Thank you

Super Collaborator

Hello @bigdatanewbie

Thanks for using Cloudera Community. Based on your post, a Spring Boot application fails to connect to HBase in a Kerberized cluster.

Looking at the logs, we observe that the RegionServer "fepp-cdhdn-d2.mycompany.com/172.29.233.141" is unable to complete the RPC request within the 60-second timeout. With 3 retries, the persistent failure causes the overall application failure. The fact that the application identifies the RegionServer hosting the regions of table "hbasepoc:alan_test" indicates that the client can fetch the location of the "hbase:meta" region from ZooKeeper and connect to the RegionServer hosting "hbase:meta" to pull the required metadata.
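
To make the 60-second figure concrete, the two timestamps quoted in the question can simply be subtracted. The sketch below is only an illustration: the class and method names are made up for it, and it assumes the log prints time of day as HH:mm:ss,SSS. The resulting gap is just over 60 seconds, which lines up with the default hbase.rpc.timeout of 60000 ms.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

class LogGap {
    // Time-of-day format with a comma before the milliseconds (assumed log layout).
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Returns the milliseconds between two log timestamps on the same day. */
    static long gapMillis(String from, String to) {
        LocalTime start = LocalTime.parse(from, LOG_TIME);
        LocalTime end = LocalTime.parse(to, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // Timestamps taken from the question.
        long gap = gapMillis("17:23:42,637", "17:24:43,126");
        System.out.println(gap + " ms"); // prints "60489 ms"
    }
}
```

A gap that sits this close to a configured timeout usually means the client was waiting for a reply that never arrived, rather than processing a slow one.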

Let's verify that the table "hbasepoc:alan_test" is healthy by running HBCK on the table and by using HBase Shell to perform the same operation that the Spring Boot application performs. If the HBCK report for the table (obtained via "hbase hbck -details hbasepoc:alan_test") shows no inconsistency and the same operation completes successfully in HBase Shell, it would be helpful to review the connectivity between the host running the Spring Boot application and the HBase cluster, along with the RegionServer logs.

Additionally, we can try increasing the timeout or the number of retries to confirm whether the issue is a delayed response or some other underlying problem.
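
One way to run such a test is to raise the client-side timeouts for a single diagnostic run. The sketch below only collects candidate overrides in a plain map. The property names are standard HBase client settings, but the values are illustrative assumptions rather than recommendations, and the class name is hypothetical. Each entry would be applied with config.set(key, value) before calling ConnectionFactory.createConnection(config).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TimeoutOverrides {
    /** Client-side overrides for a one-off diagnostic run; values are illustrative only. */
    static Map<String, String> diagnosticOverrides() {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("hbase.rpc.timeout", "120000");              // per-RPC timeout, in ms
        overrides.put("hbase.client.operation.timeout", "600000"); // budget for a whole operation, in ms
        overrides.put("hbase.client.retries.number", "5");         // retries before the client gives up
        overrides.put("hbase.client.pause", "500");                // base pause between retries, in ms
        return overrides;
    }

    public static void main(String[] args) {
        // Print the overrides in the form they would take in the connection setup.
        for (Map.Entry<String, String> e : diagnosticOverrides().entrySet()) {
            System.out.println("config.set(\"" + e.getKey() + "\", \"" + e.getValue() + "\");");
        }
    }
}
```

If the request still fails after the longer wait, the delay is probably not a slow response, and network connectivity between the client and the RegionServers deserves a closer look.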

- Smarak

New Contributor

I finally found out what was causing my problem. My company's firewall blocks traffic from the Tomcat server to port 16020 on all region servers, since they are on different subnets. The Palo Alto firewall classifies this kind of connection as "unknown_tcp". Can that be changed?
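
For anyone who wants to test this kind of blockage from the application host, a plain TCP connect to the RegionServer port is a quick first check. The sketch below uses only the JDK. The class name, the 3-second timeout, and the default host (taken from the reply above) are assumptions for illustration; pass your own RegionServer hostnames as arguments.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    /** Returns true if a TCP connection to host:port is established within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hostnames.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] hosts = args.length > 0 ? args : new String[] {"fepp-cdhdn-d2.mycompany.com"};
        for (String host : hosts) {
            boolean open = isReachable(host, 16020, 3000);
            System.out.println(host + ":16020 -> " + (open ? "connected" : "blocked or unreachable"));
        }
    }
}
```

A successful connect only shows that the TCP handshake gets through. A firewall that classifies traffic by application may still drop later packets, so a "connected" result does not rule the firewall out, while a failure from the Tomcat host combined with success from a host in the cluster's subnet points strongly at it.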

Super Collaborator

Hello @bigdatanewbie

Thanks for the response and for sharing the reason the RPC connection was timing out. Unfortunately, I am not familiar with "unknown_tcp" connections. According to the Palo Alto site, a connection can be classified as "unknown" if it doesn't carry enough header information or doesn't match any known application behavior. Link [1] is a Palo Alto KB article that discusses this, with steps to review and mitigate it (I am sure your team has reviewed this KB).

- Smarak

[1] https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000Clc6CAC

New Contributor

In the Cloudera support documentation, port 16020 on the HBase region server is listed as "IPC", so I understand that the connection is not a well-known network application protocol such as "HTTPS" or "SSH". Our network team is reviewing the KB and will decide what to do. I am surprised that this issue has not surfaced before, since lots of companies use Palo Alto products.

Super Collaborator

Hello @bigdatanewbie

Thanks for the comment. As you stated, port 16020 is the IPC port for HBase. When a client connects to HBase, the first connection is made to the RegionServer holding the "hbase:meta" region. After fetching the metadata from that RegionServer, the client connects to the RegionServers required for the read/write operations performed by the end user. That communication happens on port 16020 as well. As such, please check whether all traffic between the client host and the RegionServer hosts on port 16020 was classified as "unknown_tcp".

As you mentioned, it's surprising that this issue hasn't surfaced before, as Palo Alto Networks products are widely used. I suspect that in other deployments the firewall is configured to allow any traffic on port 16020, so the type of traffic isn't inspected.

As the issue with your client connection to HBase is resolved, kindly confirm whether you have any further questions concerning this post. If not, kindly mark the post as resolved.

Thanks for using Cloudera Community.

- Smarak

New Contributor

I had a similar problem with CDP7 nodes interconnected by PA devices. You should not let PA handle this "unknown_tcp" traffic to HBase on its own. You need to define an Application Override policy between HBase and its clients to manage the traffic on the HBase RS ports. This is especially true if traffic changes significantly during off-peak hours, when PA views the low traffic as suspicious and causes connection timeouts between HBase and its clients.

Community Manager

@Tyrone As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.


Regards,

Diana Torres,
Community Moderator

