
When 200+ indexers are created in the Lily HBase Indexer service, a large number of TIME_WAIT ports appear

New Contributor

(Attached screenshot: TIME_WAIT ports)

Version: CDH 5.13.3 (HBase 1.2.0)

Nodes: nine worker nodes, three management nodes

Main role assignment: nine nodes each run RegionServer, DataNode, and Solr Server; three of them also run Lily HBase Indexer

Background: Solr stores the secondary indexes for HBase, and the indexes are synchronized automatically through the Lily HBase Indexer

Question:

When we create 200+ indexers, a large number of TIME_WAIT ports appear (almost 30,000), and the RegionServer log shows: "Retrying connect to server: xx.xx.com/ipAddress:50020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)". When we delete all the indexers, the port count and HBase both return to normal. Our initial suspicion was that this HBase version's multi-WAL feature conflicts with the replication functionality. We were originally configured with 3 WALs; after changing to a single WAL and re-creating the 200+ indexers, the TIME_WAIT port count dropped to about 10,000, but the problem is still not solved. Can you give me some advice?
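(For reference, the multi-WAL setup mentioned above is the one typically enabled through hbase-site.xml properties along the following lines; the property names are taken from the upstream HBase documentation rather than from our cluster, and are shown only as a sketch of the configuration we reverted when going back to a single WAL.)

  <!-- Illustrative multi-WAL settings (upstream HBase 1.x property names).
       Removing them, or using the default WAL provider, gives a single WAL. -->
  <property>
    <name>hbase.wal.provider</name>
    <value>multiwal</value>
  </property>
  <property>
    <name>hbase.wal.regiongrouping.numgroups</name>
    <value>3</value>
  </property>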

1 ACCEPTED SOLUTION

New Contributor

Hello,

I've been experiencing a similar problem with a large number of TIME_WAIT sockets. I knew it was related to replication, so I started researching the replication options and found the following:

I had set replication.source.sleepforretries to 1 according to these instructions:

https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/fault-tolerance/content/spreading_queue_fai...

They say it is 1 for 1 second, but if you look at the HBase source code, you'll see that the value is in milliseconds and should be 1000 for 1 second. After changing replication.source.sleepforretries from 1 to 1000 with replication enabled, the number of TIME_WAIT sockets dropped to a normal value.
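For reference, a minimal sketch of the hbase-site.xml override described above (the property name and the 1000 value are from this thread; on CDH you would normally apply it through the HBase service's hbase-site.xml safety valve and then restart the RegionServers):

  <!-- Sketch: set the replication retry sleep to 1000, i.e. one second.
       The value is interpreted in milliseconds, not seconds. -->
  <property>
    <name>replication.source.sleepforretries</name>
    <value>1000</value>
  </property>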

So check it; maybe you have it set to 1. And to Hortonworks: please fix the docs.
