Created 04-30-2025 07:56 AM
Hi all,
we are experiencing a progressive socket leak from the LLAP HiveServer2 daemon towards the HDFS DataNodes in a cluster running HDP 3.1.5.
The number of open file descriptors grows constantly until it reaches the limit of 6400, and the only remedy is to restart HiveServer2 Interactive:
[hive@myhost hive]$ netstat -tanp | grep "CLOSE_WAIT" | awk '{print $NF}' | awk -F'/' '{print $1}' | sort | uniq -c
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
1734 -
55443 2791
5 49167
[hive@myhost hive]$ ps -ef | grep 2791
hive 2791 1 31 Apr11 ? 6-01:44:32 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_jar -Dhdp.version=3.1.5.0-152 -Djava.net.preferIPv4Stack=true -Xloggc:/var/log/hive/hiveserverinteractive-gc-%t.log -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCCause -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/grid/0/tmp/hsi_heapdump.hprof -Dhive.log.dir=/var/log/hive -Dhive.log.file=hiveserver2Interactive.log -Dzookeeper.sasl.client.username=zookeeper -Dhdp.version=3.1.5.0-152 -Xmx4096m -Dproc_hiveserver2 -Xmx10240m -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/usr/hdp/current/hive-server2/conf_llap//parquet-logging.properties -Dyarn.log.dir=/var/log/hadoop/hive -Dyarn.log.file=hadoop.log -Dyarn.home.dir=/usr/hdp/3.1.5.0-152/hadoop-yarn -Dyarn.root.logger=INFO,console -Djava.library.path=:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/hdp/3.1.5.0-152/hadoop/lib/native/Linux-amd64-64:/usr/hdp/current/hadoop-client/lib/native -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/current/hadoop-client -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/3.1.5.0-152/hive/lib/hive-service-3.1.0.3.1.5.0-152.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-server2/lib/hive-hcatalog-core.jar,file:--and other jar--
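For what it's worth, we track the descriptor growth with a couple of quick checks against that PID (2791 here; adjust to the current HiveServer2 Interactive PID):
# count the open file descriptors of the HS2 Interactive process
ls /proc/2791/fd | wc -l
# show the configured open-files limit for the same process
grep "Max open files" /proc/2791/limits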
All connections originate from LLAP and target specific datanodes:
18302 ip1 1019
18441 ip2 1019
18708 ip3 1019
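For reference, the per-DataNode breakdown above was obtained with a one-liner roughly like this (a sketch; the awk field assumes standard netstat -tan output where column 5 is the foreign address):
# group CLOSE_WAIT sockets by remote host and remote port
netstat -tan | grep CLOSE_WAIT | awk '{split($5,a,":"); print a[1], a[2]}' | sort | uniq -c | sort -rn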
Initially, we suspected the issue could be caused by some UDF, especially those that interact with HDFS (e.g. spatial or crypto functions). We tested queries that force the loading of those JARs, but so far we have been unable to reproduce the socket leak in our test cluster, even when using the same UDFs and query patterns.
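The tests were along these lines (JAR path, function and class names are placeholders rather than our real UDFs, and the JDBC URL/port is just an example):
# force HS2 Interactive to load a UDF jar from HDFS and execute the function
beeline -u "jdbc:hive2://myhost:10500/default" \
  -e "ADD JAR hdfs:///user/hive/udfs/example-udf.jar;
      CREATE TEMPORARY FUNCTION example_udf AS 'com.example.ExampleUDF';
      SELECT example_udf(col1) FROM some_table LIMIT 10;"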
Environment: HDP 3.1.5, HiveServer2 Interactive (LLAP).
Any suggestions?
Thanks.
Created 05-01-2025 05:58 AM
Hi @Lorenzo_F
It could be due to the bug below:
https://issues.apache.org/jira/browse/HIVE-22981
You may need to reproduce the issue and take a heap dump to confirm it.
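If it helps, one way to take the heap dump is with the jmap that ships with the same JDK the daemon runs on (PID and output path below are just examples):
# dump live objects from the HiveServer2 Interactive JVM
/usr/jdk64/jdk1.8.0_112/bin/jmap -dump:live,format=b,file=/grid/0/tmp/hsi_manual_heapdump.hprof 2791
# quick histogram check for leaked DFS/socket-related objects without a full dump
/usr/jdk64/jdk1.8.0_112/bin/jmap -histo 2791 | grep -iE 'dfsclient|socket' | head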
Created 05-02-2025 01:51 AM
Hi @shubham_sharma ,
I've tried to reproduce the issue by creating a test Avro table; querying it, I found that it does generate CLOSE_WAIT sockets.
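The reproduction was roughly the following (table name and JDBC URL are just what I used for the test):
# create a small Avro-backed table and query it through HS2 Interactive
beeline -u "jdbc:hive2://myhost:10500/default" \
  -e "CREATE TABLE test_avro (id INT, name STRING) STORED AS AVRO;
      INSERT INTO test_avro VALUES (1, 'a'), (2, 'b');
      SELECT * FROM test_avro;"
# then check whether new CLOSE_WAIT sockets show up on the HS2 Interactive PID
netstat -tanp 2>/dev/null | grep CLOSE_WAIT | grep 2791 | wc -l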
Thanks a lot.
Created 04-30-2025 10:11 AM
@Lorenzo_F Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Hive experts @cravani @james_jones @ggangadharan who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres