HiveServer2 leaks TCP sockets to DataNodes in CLOSE_WAIT state

New Contributor

Hi all,

We are experiencing a progressive socket leak from the LLAP HiveServer2 daemon (HiveServer2 Interactive) to the HDFS DataNodes in a cluster running HDP 3.1.5.

The number of open file descriptors grows steadily until it reaches the limit of 6400, and the only remedy is to restart hiveserver2Interactive:

[hive@myhost hive]$ netstat -tanp | grep "CLOSE_WAIT" | awk '{print $NF}' | awk -F'/' '{print $1}' | sort | uniq -c
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
1734 -
55443 2791
5 49167

[hive@myhost hive]$ ps -ef | grep 2791
hive 2791 1 31 Apr11 ? 6-01:44:32 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_jar -Dhdp.version=3.1.5.0-152 -Djava.net.preferIPv4Stack=true -Xloggc:/var/log/hive/hiveserverinteractive-gc-%t.log -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCCause -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/grid/0/tmp/hsi_heapdump.hprof -Dhive.log.dir=/var/log/hive -Dhive.log.file=hiveserver2Interactive.log -Dzookeeper.sasl.client.username=zookeeper -Dhdp.version=3.1.5.0-152 -Xmx4096m -Dproc_hiveserver2 -Xmx10240m -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/usr/hdp/current/hive-server2/conf_llap//parquet-logging.properties -Dyarn.log.dir=/var/log/hadoop/hive -Dyarn.log.file=hadoop.log -Dyarn.home.dir=/usr/hdp/3.1.5.0-152/hadoop-yarn -Dyarn.root.logger=INFO,console -Djava.library.path=:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/hdp/3.1.5.0-152/hadoop/lib/native/Linux-amd64-64:/usr/hdp/current/hadoop-client/lib/native -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/current/hadoop-client -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/3.1.5.0-152/hive/lib/hive-service-3.1.0.3.1.5.0-152.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-server2/lib/hive-hcatalog-core.jar,file:--and other jar--
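
To track how close the process is to the 6400 descriptor limit, here is a minimal sketch that reads the descriptor count, the socket count and the soft limit straight from /proc. It assumes Linux and that it is run as the process owner (hive) or root; the function name fd_usage is hypothetical.

```shell
# Compare a process's open file descriptors with its "Max open files" soft limit.
# /proc/<pid>/fd is only readable by the process owner (e.g. hive) or root.
fd_usage() {
  local pid="$1"
  local open sockets limit
  # Each entry under /proc/<pid>/fd is one open descriptor: file, pipe or socket.
  open=$(ls "/proc/${pid}/fd" | wc -l)
  # Socket descriptors are symlinks whose target looks like "socket:[inode]".
  # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
  sockets=$(ls -l "/proc/${pid}/fd" | grep -c 'socket:' || true)
  # Row format: "Max open files  <soft>  <hard>  files"; field 4 is the soft limit.
  limit=$(awk '/^Max open files/ {print $4}' "/proc/${pid}/limits")
  echo "pid=${pid} open=${open} sockets=${sockets} limit=${limit}"
}

# Example against the HiveServer2 Interactive PID from the ps output above:
#   fd_usage 2791
```

Running it periodically shows whether sockets account for most of the descriptor growth.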

All of these connections originate from the LLAP HiveServer2 process and target specific DataNodes (columns: connection count, DataNode IP, remote port):

18302 ip1 1019
18441 ip2 1019
18708 ip3 1019
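
CLOSE_WAIT means the remote side (here the DataNode) has already closed its end of the TCP connection while the local process has not yet called close() on its socket, so a growing CLOSE_WAIT count usually points at streams that were opened and never closed. Below is a minimal sketch for producing a per-DataNode breakdown like the one above from netstat -tanp output. It assumes the standard Linux net-tools column layout (Foreign Address in column 5, State in 6, PID/Program name in 7), and the function name close_wait_by_remote is hypothetical.

```shell
# Count CLOSE_WAIT sockets owned by one PID, grouped by remote IP and port.
# Reads `netstat -tanp` output on stdin; prints "<count> <remote-ip> <remote-port>",
# largest count first.
close_wait_by_remote() {
  local pid="$1"
  awk -v pid="$pid" '
    $6 == "CLOSE_WAIT" {
      split($7, owner, "/")          # "2791/java" -> owner[1] = "2791"; "-" if not visible
      if (owner[1] != pid) next
      n = split($5, addr, ":")       # foreign address "ip:port"; the port is the last piece
      port = addr[n]
      ip = substr($5, 1, length($5) - length(port) - 1)
      count[ip " " port]++
    }
    END { for (k in count) print count[k], k }
  ' | sort -rn
}

# Example (run as hive or root so that the PID column is filled in):
#   netstat -tanp 2>/dev/null | close_wait_by_remote 2791
```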

Initially, we suspected that some UDF might be the cause, especially UDFs that interact with HDFS (e.g. spatial or crypto functions). We tested queries that force the loading of those UDF JARs, but so far we have been unable to reproduce the socket leak in our test cluster, even when using the same UDFs and query patterns.

Environment:

  • HDP version: 3.1.5
  • Hive: 3.1.0
  • JDK: 1.8.0_112

Any suggestions?

Thanks.

2 ACCEPTED SOLUTIONS

Master Collaborator

Hi @Lorenzo_F 

It could be due to the following bug:

https://issues.apache.org/jira/browse/HIVE-22981

You may need to reproduce the issue and take a heap dump to confirm it.
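
Before (or alongside) a full heap dump, a class histogram is a cheap first check. The sketch below filters jmap -histo output for a few classes that usually sit behind an open HDFS read; the watch list is an assumption, not something taken from HIVE-22981, and the function name leak_suspects is hypothetical. Note that jmap -histo also counts unreachable objects not yet collected, while jmap -histo:live counts only reachable ones but forces a full GC first.

```shell
# Filter a `jmap -histo <pid>` class histogram (read on stdin) down to a few
# classes that typically back an open HDFS read. The watch list is an
# assumption; extend it with whatever the heap dump points at.
leak_suspects() {
  awk '
    BEGIN {
      watch["org.apache.hadoop.hdfs.DFSInputStream"] = 1   # created for every HDFS file opened for reading
      watch["org.apache.avro.file.DataFileReader"] = 1     # Avro container-file reader
      watch["sun.nio.ch.SocketChannelImpl"] = 1            # all NIO sockets, not only HDFS ones
    }
    # Data rows look like: "  57:   1200   96000  org.apache.hadoop.hdfs.DFSInputStream"
    $1 ~ /^[0-9]+:$/ && ($4 in watch) { print $4, $2 }
  '
}

# Example, presumably using the JDK from the ps output and the HSI PID 2791 (run as hive):
#   /usr/jdk64/jdk1.8.0_112/bin/jmap -histo 2791 | leak_suspects
# Full dump for offline analysis (file name is a placeholder):
#   /usr/jdk64/jdk1.8.0_112/bin/jmap -dump:format=b,file=/grid/0/tmp/hsi_leak.hprof 2791
```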

New Contributor

Hi @shubham_sharma,

I reproduced the issue by creating a test Avro table: querying it generates sockets in CLOSE_WAIT state.

Thanks a lot.
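
A minimal sketch of such a reproduction, assuming a scratch database and a hypothetical table name leak_test_avro; the function repro_sql (also hypothetical) only generates the HiveQL, which can then be run through beeline while watching the CLOSE_WAIT count of the HiveServer2 Interactive process. Depending on hive.fetch.task.conversion, a plain SELECT ... LIMIT on a small table is usually answered by a fetch task inside HiveServer2 itself, which would fit the sockets piling up in HiveServer2; treat that as an assumption to verify. Since a socket only enters CLOSE_WAIT once the DataNode closes its side, new entries may appear with a short delay.

```shell
# Print a HiveQL script that creates a small Avro table and reads it N times
# (default 20). The table name leak_test_avro is a placeholder.
repro_sql() {
  local n="${1:-20}"
  local i=1
  printf '%s\n' \
    "CREATE TABLE IF NOT EXISTS leak_test_avro (id INT, name STRING) STORED AS AVRO;" \
    "INSERT INTO leak_test_avro VALUES (1, 'a'), (2, 'b');"
  # Each SELECT opens the table's Avro data file(s) for reading.
  while [ "$i" -le "$n" ]; do
    echo "SELECT * FROM leak_test_avro LIMIT 10;"
    i=$((i + 1))
  done
}

# Example (the JDBC URL is a placeholder; 2791 is the HSI PID from the ps output):
#   netstat -tanp 2>/dev/null | grep -c 'CLOSE_WAIT.* 2791/'    # before
#   repro_sql 50 > /tmp/repro_avro.sql
#   beeline -u "<hsi-jdbc-url>" -f /tmp/repro_avro.sql
#   netstat -tanp 2>/dev/null | grep -c 'CLOSE_WAIT.* 2791/'    # after
```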

Community Manager

@Lorenzo_F Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Hive experts @cravani @james_jones @ggangadharan, who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Senior Community Moderator
