Created 02-15-2017 11:50 AM
Hi,
We just started using Cloudera Manager Express 5.9 (the same version on NameNodes and DataNodes) for our HDFS cluster. When our internal client posts logs to HDFS, we see file descriptor usage on the DataNodes climb continually until it reaches the Warning level of 50% and ultimately crosses the Critical threshold of 70% (the default limits configured in the health tests). The only way to bring the usage back down is to restart the DataNode role on each data node, which is really disruptive to our use of HDFS.
In the past we used Cloudera Standard 4.6.2, and with the same setup we never saw file descriptor usage this high.
I checked the configured file descriptor limit in both 5.9 and 4.6.2, and it is the same 32k value.
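In case it helps anyone reproduce the check, this is roughly how I verified the limit the DataNode process actually runs with (a sketch only; the pgrep pattern is an assumption and may need adjusting for your deployment):

# Find the DataNode pid (class name pattern is an assumption)
DN_PID=$(pgrep -f 'org.apache.hadoop.hdfs.server.datanode.DataNode' | head -n 1)

# Soft and hard open-file limits as the running process sees them
grep 'open files' /proc/${DN_PID}/limits

# Limit of the current shell, for comparison
ulimit -n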
Investigation report:
I used ps -ef --cols 9999 | grep hdfs to find the hdfs pid, then /usr/sbin/lsof -p [pid] | wc -l to count how many files are open (a rough sketch of these commands follows the table). Here are the numbers:
              Datanode1   Datanode2   Datanode3
Before send   16490       16490       16486
After send    16580       16580       16576
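For reference, these are roughly the exact commands behind the numbers above (a sketch; the class-name grep and the /proc alternative are assumptions that should be adjusted to your hosts):

# Find the DataNode pid (the bracket trick keeps grep from matching its own command line)
ps -ef --cols 9999 | grep '[D]ataNode'

# Count open file descriptors for that pid (replace <pid> with the value found above)
/usr/sbin/lsof -p <pid> | wc -l

# Alternative without lsof: count entries under /proc
ls /proc/<pid>/fd | wc -l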
In all three data nodes, there are many open files like this:
java 23757 hdfs 593r REG 202,81 59 29658600 /opt/dfs/dn/current/BP-832824084-10.189.101.91-1484719231606/current/finalized/subdir0/subdir148/blk_1073779777_70965.meta
java 23757 hdfs 594r REG 202,81 519 29658617 /opt/dfs/dn/current/BP-832824084-10.189.101.91-1484719231606/current/finalized/subdir0/subdir148/blk_1073779790_71007.meta
java 23757 hdfs 595w REG 202,81 119 29658629 /opt/dfs/dn/current/BP-832824084-10.189.101.91-1484719231606/current/finalized/subdir0/subdir148/blk_1073779801_71047.meta
The situation was the same even a couple of hours later; the open file descriptor count did not decrease.
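In case it helps someone else reproduce this, a rough way to watch just the block .meta descriptors over time (a sketch; the pid placeholder and the 60-second interval are assumptions):

# Print a timestamped count of open .meta descriptors every 60 seconds
while true; do
  echo "$(date) $(/usr/sbin/lsof -p <pid> | grep -c '\.meta$')"
  sleep 60
done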
Has anyone else seen the same problem, and is there a solution? We would be really grateful for your support.
Please let me know if you have any questions.
Thanks,
Raj
Created 02-25-2017 06:12 PM
Hi,
I haven't seen the file descriptors rising since I opened this ticket. Feel free to close it.
Thanks for the suggestions though 🙂
Thanks, Raj
Created 03-18-2017 07:41 AM
This is happening again.
We now have 4 large machines on 5.9 handling 1/6th of the load, and similarly 4 data nodes on the 4.6.2 version, and we did not see file descriptors climbing on the 4.6.2 nodes, so it has something to do with the 5.9 version itself.
Can someone please confirm the next course of action in this case?
Looking forward to your response.
Thanks,
Raj
Created 03-20-2017 03:20 PM
We have the same issue.
We upgraded from 2.6.0-cdh5.7.6 to 2.6.0-cdh5.9.1.
Since then, our data nodes have been leaking open file descriptors to block .meta files.
We didn't have any issues before the upgrade.
The attached screenshot shows the change in behavior after the upgrade for one of our data nodes.
The drops occur when we restart the HDFS service.
Created 03-21-2017 05:53 AM
Downgrading from 2.6.0-cdh5.9.1 back to 2.6.0-cdh5.8.4 looks to have fixed the problem.
Our HDFS is back to being usable and stable.
Created 03-21-2017 11:31 AM
Hi nmous,
How did you downgrade from 5.9 to 5.8.4? Can you tell me if there's a link to the documentation?
Looking forward to your response.
Thanks, Raj
Created 03-22-2017 07:57 AM
We are only running HDFS, so that is the only component we needed to downgrade.
Since it was a dev environment, we shut all of HDFS down, downloaded hadoop-2.6.0-cdh5.8.4.tar.gz from http://archive.cloudera.com/cdh5/cdh/5/, and ran with that.
(We are actually running HDFS on Mesos, so the artifacts get packaged into an uberjar with the Mesos executor, but there's no real magic there. I think it just uses the contents of hadoop/common and hadoop/hdfs and some of the run scripts.)
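If it helps, the manual steps looked roughly like this (a sketch only; the install directory, reuse of the existing config dir, and starting roles with hadoop-daemon.sh are assumptions about a plain tarball setup, not how a Cloudera Manager managed cluster would be downgraded):

# Stop HDFS first, then fetch and unpack the 5.8.4 tarball
wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.8.4.tar.gz
tar -xzf hadoop-2.6.0-cdh5.8.4.tar.gz -C /opt

# Reuse the existing configuration (assumption: configs live in /etc/hadoop/conf)
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Start the roles from the 5.8.4 tree on the appropriate hosts
/opt/hadoop-2.6.0-cdh5.8.4/sbin/hadoop-daemon.sh start namenode
/opt/hadoop-2.6.0-cdh5.8.4/sbin/hadoop-daemon.sh start datanode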