
File Descriptor usage in Datanode climbing steadily

Contributor

Hi,

 

We just started using Cloudera Manager Express 5.9 (the same version on NameNodes and DataNodes) for our HDFS cluster. When our internal client posts logs to HDFS, the file descriptor usage on the DataNodes climbs continually until it reaches the Warning level of 50% and ultimately crosses the Critical threshold of 70% (the default limits configured in the health tests). The only way to bring the usage down is to restart the DataNode service on each of the DataNodes, which is really disruptive to our use of HDFS.

 

In the past we were using Cloudera Standard 4.6.2, and with the same setup we never saw file descriptor usage this high.

 

I checked the number of configured file descriptors in both 5.9 and 4.6.2, and it is the same 32k value in both.

 

Investigation report:

 

I used ps -ef --cols 9999 | grep hdfs to find the hdfs pid, then /usr/sbin/lsof -p [pid] | wc -l to count the open files. Here are the changes:

 

Open-file counts per DataNode:

            Datanode1  Datanode2  Datanode3
Before send     16490      16490      16486
After send      16580      16580      16576
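As a side note, the same count can be taken without lsof by listing the process's fd directory under /proc. A small sketch (Linux-specific; purely an alternative to the lsof pipeline above):

```shell
# Count open file descriptors for a pid via /proc (Linux-specific).
open_fd_count() {
  ls "/proc/$1/fd" | wc -l
}
```

Usage: open_fd_count [pid], with the pid found from ps as above.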


In all three data nodes, there are many open files like this:

java    23757 hdfs  593r   REG             202,81        59   29658600 /opt/dfs/dn/current/BP-832824084-10.189.101.91-1484719231606/current/finalized/subdir0/subdir148/blk_1073779777_70965.meta

java    23757 hdfs  594r   REG             202,81       519   29658617 /opt/dfs/dn/current/BP-832824084-10.189.101.91-1484719231606/current/finalized/subdir0/subdir148/blk_1073779790_71007.meta

java    23757 hdfs  595w   REG             202,81       119   29658629 /opt/dfs/dn/current/BP-832824084-10.189.101.91-1484719231606/current/finalized/subdir0/subdir148/blk_1073779801_71047.meta
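To confirm the leak is dominated by block .meta files, the lsof output above can be tallied by open mode. A small sketch (assuming lsof's default column layout as shown above: the 4th field is the FD, e.g. "594r", and the last field is the file path):

```shell
# Count open block .meta files, grouped by open mode (r = read, w = write).
# Reads lsof output on stdin.
count_meta_fds() {
  awk '$NF ~ /\.meta$/ { mode = substr($4, length($4)); n[mode]++ }
       END { for (m in n) printf "%s %d\n", m, n[m] }'
}
```

Usage: /usr/sbin/lsof -p [pid] | count_meta_fds. A count here that grows and never shrinks would confirm the descriptors are stuck on .meta files.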

 

The situation was the same even a couple of hours later; the number of open file descriptors did not decrease.

 

Has anyone else seen this problem and found a solution? We would be really grateful for your support.

 

Please let me know if you have any questions.

 

Thanks,

Raj

1 ACCEPTED SOLUTION

Champion
Are the number of datanodes the same? Is the block size the same? How many blocks are on each cluster?

The *.meta files are metadata files for the blocks. This may have been a change compared to Hadoop 1; I am not sure.

It is a bit weird for it to never go down, though. I have a cluster with millions of blocks and hundreds of TBs; I'll get spikes, but the open FDs average around 2k per node.

It does depend on how much work the DNs are under as well.

Can you increase the FD limits?
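Related to that last question: before raising the limit, it may be worth confirming what the DataNode process is currently allowed. A sketch (Linux-specific, reading /proc; pass the pid found with ps as above):

```shell
# Print a process's soft open-file limit from /proc/<pid>/limits.
fd_limit() {
  awk '/Max open files/ { print $4 }' "/proc/$1/limits"
}
```

Usage: fd_limit [pid]. If this already reports 32768, raising the limit only delays the alert while the underlying leak remains.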


7 REPLIES


Contributor

Hi,

 

I haven't seen the file descriptors rising since I opened this ticket. Feel free to close it.

 

Thanks for the suggestions though 🙂 

 

Thanks, Raj

Contributor

This is happening again.

 

We now have 4 large machines handling 1/6th of the load. A similar set of 4 data nodes on version 4.6.2 does not show file descriptors climbing, so it seems to be something in the 5.9 version itself.

 

Can someone please confirm the next course of action in this case?

 

Looking forward to your response.

 

Thanks,

Raj

New Contributor

We have the same issue.
We upgraded from 2.6.0 CDH 5.7.6 to 2.6.0 CDH 5.9.1.
Since then, our data nodes have been leaking open file descriptors to block .meta files.
We didn't have any issues before the upgrade.
The attached screenshot shows the change in behavior after the upgrade for one of our data nodes.
The drops occur when we restart the HDFS service.

 

sc.png

New Contributor

Downgrading from 2.6.0-cdh5.9.1 back to 2.6.0-cdh5.8.4 looks to have fixed the problem.

Our HDFS is back to being usable and stable.

 

 

Contributor

Hi nmous,

 

How did you downgrade from 5.9 to 5.8.4? Is there a link to documentation you can share?

 

Looking forward to your response.

 

Thanks, Raj

New Contributor

We are only running hdfs, so we only needed to change that.

Since it was a dev environment, we shut all of hdfs down, downloaded hadoop-2.6.0-cdh5.8.4.tar.gz from http://archive.cloudera.com/cdh5/cdh/5/, and ran with that.
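Spelled out, the sequence above looks roughly like this (a sketch under the same assumptions: a tarball install on a dev cluster where full HDFS downtime is acceptable; the actual commands are commented out since they need a live cluster):

```shell
CDH_VER="2.6.0-cdh5.8.4"
TARBALL="hadoop-${CDH_VER}.tar.gz"
URL="http://archive.cloudera.com/cdh5/cdh/5/${TARBALL}"
# stop-dfs.sh                                 # 1. shut all of hdfs down
# curl -LO "$URL"                             # 2. fetch the older build
# tar -xzf "$TARBALL"                         # 3. unpack it
# "./hadoop-${CDH_VER}/sbin/start-dfs.sh"     # 4. run with that
echo "$URL"
```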

 

(We are actually running with hdfs on mesos, so the artifacts get packaged up into an uberjar with the mesos executor, but there's no real magic there.  I think it just uses the stuff in hadoop/common and hadoop/hdfs and some of the run scripts.)