05-13-2018 04:48 PM
When my Spark Streaming application has been running for more than 3 days, the thread count for one worker's executors is much higher than the others, as below:
If I keep it running, the thread counts on the other workers increase as well; each increase happens only once per day. My Spark version is 1.6.0-cdh5.10.0, and I use standalone mode for resource management.
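For anyone debugging something similar: one way to confirm which executor process is leaking is to poll the kernel's thread count per PID. This is a minimal sketch (Linux-only, reads /proc; which executor PID to pass in is up to you, it is not taken from my setup):

```python
import os
import re

def thread_count(pid: int) -> int:
    """Return the number of kernel threads for a PID, parsed from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            m = re.match(r"Threads:\s+(\d+)", line)
            if m:
                return int(m.group(1))
    raise RuntimeError(f"no Threads line found for pid {pid}")

# Sanity check against our own process; point it at an executor PID in practice.
print(thread_count(os.getpid()))
```

Running this once a day against each executor PID would show exactly which process jumps, and `jstack <pid>` on that process would show what the extra threads are.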
I'm not sure whether either of the following operations causes it.
One application connects to Redis in a UDF; the execution interval is one minute. When the thread count exceeds 10,000, it throws the following exception:
The redis clients reached the max limit.
The other application connects to HDFS in a UDF; the execution interval is one minute. When the thread count exceeds the specified limit, it throws the following exception:
java.io.FileNotFoundException: /opt/spark/tmp/spark-18dcb456-0abe-45a9-8e9f-bfef392266cf/executor-1f2c6a8d-92dc-4ca2-bda0-5bd428dabdf6/blockmgr-ea03774c-921a-460e-9aea-f8b230b5c1b6/3b/temp_shuffle_5dfa41c3-eb11-44c5-b8e0-3c0248a21f4a (Too many open files)
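"Too many open files" usually means file descriptors are leaking (each shuffle spill file and each un-closed HDFS stream holds one), or the worker's `ulimit -n` is simply too low for a shuffle-heavy job. Besides raising the limit, it's worth making sure the UDF closes its streams deterministically. A sketch of the close-per-partition pattern, using a local temp file as a stand-in for an HDFS output stream (`process_partition` is a hypothetical helper, not from the original job):

```python
import tempfile

def process_partition(records):
    """Open one handle per partition and always close it, even if a record fails."""
    out = tempfile.NamedTemporaryFile(mode="w", delete=False)
    try:
        for r in records:
            out.write(f"{r}\n")
    finally:
        out.close()  # without this, every batch leaks one descriptor
    return out.name
```

The same shape applies in Scala/Java with try/finally around the `FSDataOutputStream`: one open per partition, one guaranteed close, rather than one per record with the close left to the garbage collector.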