I have a Spring Boot application that uses the HBase Java client, version 2.1.0-cdh6.1.1, to connect to an HBase cluster secured with Kerberos. When it runs in a Tomcat instance on our server, a simple query can take 5 minutes to complete, and timeouts appear in the log file. I can't find out what's wrong.
config.set("hbase.zookeeper.quorum", quorum);
config.set("hbase.zookeeper.clientPort", "2181");
config.set("hadoop.security.authentication", "kerberos");
config.set("hbase.security.authentication", "kerberos");
config.set("hbase.cluster.distributed", "true");
config.set("hbase.rpc.protection", "privacy");
config.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@mycompany.com");
config.set("hbase.master.kerberos.principal", "hbase/_HOST@mycompany.com");
config.set("hbase.client.retries.number", "3");
config.set("hbase.client.pause", "500");
config.set("zookeeper.recovery.retry", "1");
System.setProperty("java.security.krb5.conf", ResourceUtils.getFile("classpath:krb5.conf").getPath());
File file = ResourceUtils.getFile(keytabFileLocation);
UserGroupInformation.setConfiguration(config);
UserGroupInformation.loginUserFromKeytab(userIdEmail, file.getPath());
Connection conn = ConnectionFactory.createConnection(config);
There is a one-minute gap from 17:23:42,637 to 17:24:43,126 in the log file, followed by the timeout at the end.
How can I fix this issue?
Thanks for using Cloudera Community. Based on the post, a Spring Boot application fails to connect to HBase in a Kerberized cluster.
Looking at the logs, we observe that the RegionServer "fepp-cdhdn-d2.mycompany.com/172.29.233.141" isn't able to complete the RPC request within the 60-second timeout. With 3 retries, the failure persists and causes the overall application failure. The fact that the application identifies the RegionServer hosting the regions of table "hbasepoc:alan_test" indicates that the client is able to fetch the location of the metadata table (hbase:meta) region from ZooKeeper and connect to the RegionServer hosting the "hbase:meta" region to pull the required metadata.
Let's verify that the table "hbasepoc:alan_test" is healthy by running HBCK on the table and by using HBase Shell to perform the same operation that the Spring Boot application performs. If the HBCK report on the table (obtained via "hbase hbck -details hbasepoc:alan_test") shows no inconsistency and the same operation completes successfully in HBase Shell, the next step would be to review the connectivity between the host where the Spring Boot application is running and the HBase cluster, along with the RegionServer logs.
Additionally, we can try increasing the timeout or the number of retries to confirm whether the issue is a delayed response or some other underlying problem.
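The connectivity review suggested above can be started from the client host with a plain TCP probe. The following is only a minimal sketch using the standard Java socket API; the class name PortProbe and the method canConnect are made up for illustration, and the ports (2181 for ZooKeeper, 16020 for the RegionServer) are the ones mentioned in this thread. Note that a successful connect only shows that the TCP handshake completes: a firewall that inspects traffic after the handshake may still drop the HBase RPC calls, so a passing probe does not rule out a network problem.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of HBase or Spring.
class PortProbe {

    // Returns true if a TCP connection to host:port is established within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, unresolved hosts, and connect timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the ZooKeeper or RegionServer host name as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        int[] ports = {2181, 16020}; // ZooKeeper client port, RegionServer IPC port
        for (int port : ports) {
            System.out.printf("%s:%d reachable=%b%n", host, port, canConnect(host, port, 5000));
        }
    }
}
```

Run it from the Tomcat server against each RegionServer host, for example with "java PortProbe.java fepp-cdhdn-d2.mycompany.com", and compare the result with the same probe run from a host inside the cluster's subnet.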
I finally found out what's causing my problem. My company's firewall blocks the traffic from the Tomcat server to port 16020 on all region servers, since they are on different subnets. The Palo Alto firewall classifies this kind of connection as "unknown_tcp". Can that be changed?
Thanks for the response and for sharing the reason the RPC connection was timing out. Unfortunately, I am not familiar with the "unknown_tcp" classification. Reviewing the Palo Alto site on this topic, a connection can be classified as "unknown" if it doesn't carry enough header information or doesn't match any known application behavior. Palo Alto has a KB on this topic that covers the steps to review and mitigate it (I am sure your team has already reviewed this KB).
In the Cloudera support document, port 16020 on the HBase region server is listed as "IPC", so I understand that the connection does not belong to an established network application such as "HTTPS" or "SSH". Our network team is reviewing the KB and will decide what to do. I am surprised that this issue has not surfaced before, since lots of companies use Palo Alto products.
Thanks for the comment. As you stated, port 16020 is the IPC port for HBase. When a client connects to HBase, the first connection is made to the RegionServer holding the "hbase:meta" region. After fetching the metadata from that RegionServer, the client connects to the RegionServers required for the read/write operations requested by the end user. That communication also happens on port 16020. As such, please check whether all traffic between the client host and the RegionServer hosts on port 16020 was being recognised as "unknown_tcp".
As you mentioned, it's surprising that this issue hasn't surfaced before, given how widely Palo Alto Networks products are used. I suspect that in those setups the firewall is configured to allow any traffic on port 16020, so the type of traffic isn't inspected.
As the issue with your client connection to HBase is resolved, kindly confirm whether you have any further questions concerning the post. If not, kindly mark the post as resolved.
Thanks for using Cloudera Community.