HDFS maximum number of files?

Expert Contributor

Hi guys,

Browsing the internet, I have seen claims that the maximum number of files that can be stored in HDFS is equal to the JVM's Integer.MAX_VALUE.

Can anyone confirm that the maximum number of files is really 2,147,483,647?

5 REPLIES


@Davide Isoardi

Any file in HDFS is stored inside a directory. As per the documentation linked below, the "dfs.namenode.fs-limits.max-directory-items" parameter defines the maximum number of items that a directory may contain.

https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

dfs.namenode.fs-limits.max-directory-items
Default value: 1048576
Description: Defines the maximum number of items that a directory may contain. A value of 0 will disable the check.
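
For illustration, here is a minimal hdfs-site.xml sketch showing how this limit could be raised; the value 2097152 is just an example, not a recommendation, and the NameNode enforces an upper bound on this setting (see the code quoted in the reply below):

    <!-- hdfs-site.xml: illustrative override of the per-directory item limit -->
    <property>
      <name>dfs.namenode.fs-limits.max-directory-items</name>
      <value>2097152</value>
      <description>Maximum number of items a single directory may contain.</description>
    </property>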


The following piece of code explains where the limit comes from:

https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache...

    // We need a maximum maximum because by default, PB limits message sizes    
    // to 64MB. This means we can only store approximately 6.7 million entries
    // per directory, but let's use 6.4 million for some safety.

    final int MAX_DIR_ITEMS = 64 * 100 * 1000;
    Preconditions.checkArgument(
        maxDirItems > 0 && maxDirItems <= MAX_DIR_ITEMS, "Cannot set "
            + DFSConfigKeys.DFS_NAMENODE_MAX_DIRECTORY_ITEMS_KEY
            + " to a value less than 1 or greater than " + MAX_DIR_ITEMS);


I do not see any "Integer.MAX_VALUE" limitation on HDFS as a whole; that claim does not look right.

Explorer

I believe the only overall HDFS limit is determined by how much memory is available on the NameNode.


@Davide Isoardi

Adding to @Tom Lyon's update: you can refer to the following link to check the NameNode memory requirements based on the number of files. It covers settings for up to 150-200 million files. However, I guess the file count can grow beyond that; I have not found any such limitation documented yet.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_command-line-installation/content/ref-80...


The maximum number of files in HDFS depends on the amount of memory available for the NameNode.

Each file object and each block object takes about 150 bytes of memory. For example, if you have 10 million files and each file has one block, then you would need about 3 GB of memory for the NameNode.

Working that example through: 10 million files, each using one block, gives 10 million file objects + 10 million block objects = 20 million objects × 150 bytes ≈ 3,000,000,000 bytes = 3 GB of memory. Keep in mind the NameNode also needs memory for other work, so to support 10 million files your NameNode will need much more than 3 GB of memory.
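
As a back-of-the-envelope sketch of the same calculation (the 150-bytes-per-object figure is an approximation, and the class and method names here are made up for illustration, not part of HDFS):

    // Rough NameNode heap estimate: ~150 bytes per file object and per block object.
    // The constant is an approximation, not an exact HDFS value.
    public class NameNodeHeapEstimate {
        private static final long BYTES_PER_OBJECT = 150L;

        static long estimateBytes(long files, long blocksPerFile) {
            long objects = files + files * blocksPerFile; // file objects + block objects
            return objects * BYTES_PER_OBJECT;
        }

        public static void main(String[] args) {
            long bytes = estimateBytes(10_000_000L, 1L); // 10 million files, 1 block each
            System.out.printf("~%.1f GB just for file/block metadata%n", bytes / 1e9);
            // Prints ~3.0 GB; the real NameNode needs additional headroom for other work.
        }
    }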