Created 04-23-2021 03:37 PM
I have a Spring Boot application that uses the HBase Java client, version 2.1.0-cdh6.1.1, to connect to HBase with Kerberos. When it runs in a Tomcat instance on our server, a simple query can take 5 minutes to complete, and a timeout appears in the log file. I can't find out what's wrong.
Connection configuration:
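```java
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.security.UserGroupInformation;
import org.springframework.util.ResourceUtils;

// How `config` is created is not shown in the post; HBaseConfiguration.create() is assumed.
Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", quorum);            // ZooKeeper hosts used to locate hbase:meta
config.set("hbase.zookeeper.clientPort", "2181");
config.set("hadoop.security.authentication", "kerberos");
config.set("hbase.security.authentication", "kerberos");
config.set("hbase.cluster.distributed", "true");
config.set("hbase.rpc.protection", "privacy");           // SASL with encryption on the RPC channel
config.set("hbase.regionserver.kerberos.principal", "hbase/_HOST@mycompany.com");
config.set("hbase.master.kerberos.principal", "hbase/_HOST@mycompany.com");
config.set("hbase.client.retries.number", "3");          // few retries so failures surface quickly
config.set("hbase.client.pause", "500");                 // base pause between retries, in ms
config.set("zookeeper.recovery.retry", "1");
// Point the JVM at the krb5.conf packaged on the classpath
System.setProperty("java.security.krb5.conf",
        ResourceUtils.getFile("classpath:krb5.conf").getPath());
// Log in from the keytab before opening the HBase connection
File file = ResourceUtils.getFile(keytabFileLocation);
UserGroupInformation.setConfiguration(config);
UserGroupInformation.loginUserFromKeytab(userIdEmail, file.getPath());
Connection conn = ConnectionFactory.createConnection(config);
```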
There is a one-minute gap in the log file, from 17:23:42,637 to 17:24:43,126, followed by the timeout at the end.
How can I fix this issue?
Thank you
Created 05-01-2021 11:46 PM
Hello @bigdatanewbie
Thanks for using Cloudera Community. Based on the post, a Spring Boot application fails to connect to HBase in a Kerberized cluster.
Looking at the logs, we observe that the RegionServer "fepp-cdhdn-d2.mycompany.com/172.29.233.141" isn't able to complete the RPC request within the 60-second timeout. Because the failure persists across all 3 retries, the application as a whole fails. The fact that the application identifies the RegionServer hosting the regions of table "hbasepoc:alan_test" indicates that the client is able to look up the location of the metadata (hbase:meta) table's region in ZooKeeper and connect to the RegionServer hosting the "hbase:meta" region to pull the required metadata information.
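The timing described above can be sketched as a rough worst-case estimate. This is a simplified model, not HBase's actual retry code, and it rests on assumptions not stated in the thread: every attempt hangs for the full RPC timeout (60 s here, matching both the gap in the log and the default `hbase.rpc.timeout`), the client makes `retries + 1` attempts, the pause before each retry is `hbase.client.pause` multiplied by a backoff table assumed to mirror `HConstants.RETRY_BACKOFF`, random jitter, hbase:meta lookups and connection setup are ignored, and the total is capped by `hbase.client.operation.timeout` (assumed default 1,200,000 ms). The class name `RetryBudget` is hypothetical.

```java
import java.time.Duration;

public class RetryBudget {

    // Assumed to mirror HConstants.RETRY_BACKOFF in the HBase client.
    private static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /**
     * Rough worst case for one client operation in which every attempt hangs
     * until the RPC timeout fires. Jitter, hbase:meta lookups and connection
     * setup are ignored; the result is capped by the operation timeout.
     */
    public static Duration worstCase(long rpcTimeoutMs, int retries, long pauseMs, long operationTimeoutMs) {
        int attempts = Math.max(1, retries + 1); // assumption: retries + 1 attempts in total
        long totalMs = 0;
        for (int attempt = 0; attempt < attempts; attempt++) {
            totalMs += rpcTimeoutMs; // this attempt waits for the full RPC timeout
            if (attempt < attempts - 1) {
                // back off before the next attempt, clamping to the last multiplier
                int idx = Math.min(attempt, RETRY_BACKOFF.length - 1);
                totalMs += pauseMs * RETRY_BACKOFF[idx];
            }
        }
        return Duration.ofMillis(Math.min(totalMs, operationTimeoutMs));
    }

    public static void main(String[] args) {
        // 60 s RPC timeout (as in the log), 3 retries and 500 ms pause (from the post),
        // 1,200,000 ms operation timeout (assumed default).
        Duration d = worstCase(60_000, 3, 500, 1_200_000);
        System.out.println(d.toMillis() + " ms (~" + d.toMinutes() + " min)");
    }
}
```

Run as-is, `main` prints `243000 ms (~4 min)` for the post's settings: four 60-second attempts plus 3 s of backoff, the same order of magnitude as the five-minute query in the question. Raising the RPC timeout or the retry count grows this figure roughly linearly, so it helps only when the server is slow to answer, not when the traffic is being dropped on the way.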
Let's verify that the table "hbasepoc:alan_test" is healthy by running HBCK on the table and by using HBase Shell to perform the same operation as the Spring Boot application. If the HBCK report on the table (obtained via "hbase hbck -details hbasepoc:alan_test") shows no inconsistency and HBase Shell access to the table with the same operation completes successfully, reviewing the connectivity between the concerned host (where the Spring Boot application is running) and the HBase setup, along with the RegionServer logs, would be helpful.
Additionally, we can try increasing the timeout or retries to confirm whether the issue lies with a delayed response or some other underlying issue.
- Smarak
Created on 05-10-2021 11:15 AM - edited 05-10-2021 11:16 AM
Finally found out what's causing my problem. My company's firewall blocks the traffic from the Tomcat server to port 16020 on all RegionServers, since they are on different subnets. The Palo Alto firewall classifies this kind of connection as "unknown_tcp". Can that be changed?
Created 05-11-2021 06:32 AM
Hello @bigdatanewbie
Thanks for the response and for sharing the reason the RPC connection timed out. Unfortunately, I am not familiar with the "unknown_tcp" connection type, but the Palo Alto site lists a few criteria under which a connection is classified as "unknown": for example, if it doesn't carry enough header information or doesn't match any known application behavior. Link [1] is a Palo Alto KB article on this topic, with steps to review and mitigate it (I am sure your team has reviewed this KB).
- Smarak
[1] https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000Clc6CAC
Created on 05-11-2021 08:31 AM - edited 05-11-2021 09:27 AM
The Cloudera support document lists port 16020 on the HBase RegionServer as "IPC", so I understand the connection is not a well-known network application such as "HTTPS" or "SSH". Our network team is reviewing the KB to decide what to do. I am surprised that this issue hasn't surfaced before, since many companies use Palo Alto products.
Created 05-21-2021 02:03 AM
Hello @bigdatanewbie
Thanks for the comment. As you stated, port 16020 is the IPC port for HBase. When a client connects to HBase, it locates the "hbase:meta" region via ZooKeeper and then connects to the RegionServer holding that region. After fetching the metadata from that RegionServer, the client connects to the RegionServers required for the read/write operations performed by the end user. That communication happens on port 16020 as well. As such, please review whether the concerned scenario applies to all traffic between the client host and the RegionServer hosts on port 16020, with the traffic being recognized as "unknown_tcp".
As you mentioned, it's surprising this issue hasn't surfaced before, as Palo Alto Networks products are widely used. I suspect the firewall setting in most environments may simply allow any traffic on port 16020, so the type of traffic isn't reviewed.
As the issue with your client connection to HBase is resolved, kindly confirm whether you have any further questions concerning the post. If not, kindly mark the post as resolved.
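The check on all port-16020 traffic can be approximated from the client host with a plain TCP probe of every RegionServer. The following is a minimal sketch using only the JDK; the class name `RegionServerPortProbe`, the 5-second timeout and the host list are hypothetical (the one hostname is the RegionServer named in the first reply). It has a clear limit: it only shows whether the TCP handshake completes. A firewall that classifies traffic by its payload, as described in this thread, may still let the handshake through and drop the HBase RPC that follows, so a failed probe points to a network, firewall or RegionServer-down problem, while a successful one does not rule out payload-based blocking.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegionServerPortProbe {

    /** Returns true if a TCP handshake with host:port completes within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // connection refused, host unreachable or unresolved, or timed out
            return false;
        }
    }

    /** Probes the same port on every host, preserving the input order. */
    public static Map<String, Boolean> probeAll(List<String> hosts, int port, int timeoutMs) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String host : hosts) {
            results.put(host, canConnect(host, port, timeoutMs));
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical host list: replace with every RegionServer in your cluster.
        List<String> regionServers = List.of("fepp-cdhdn-d2.mycompany.com");
        int hbaseIpcPort = 16020; // RegionServer IPC port discussed in this thread
        probeAll(regionServers, hbaseIpcPort, 5_000).forEach((host, ok) ->
                System.out.println(host + ":" + hbaseIpcPort + " -> " + (ok ? "reachable" : "NOT reachable")));
    }
}
```

Running it on the Tomcat host rather than on a cluster node matters here, since the block in this thread was between subnets.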
Thanks for using Cloudera Community.
- Smarak
Created 10-19-2023 07:00 PM
I had a similar problem with CDP 7 nodes interconnected through PA devices. You should not let PA manage that "unknown_tcp" traffic to HBase. Instead, define an Application Override policy between HBase and its clients, which will manage the traffic on the HBase RegionServer ports. This is especially important if traffic drops significantly during off-peak hours, when PA treats the low traffic as suspicious and causes connection timeouts between HBase and its clients.
Created 10-20-2023 03:27 PM
@Tyrone As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.
Regards,
Diana Torres,