
What is the small file problem in HDFS?

Explorer

In HDFS, what is the small file problem?

1 ACCEPTED SOLUTION

Expert Contributor

HDFS keeps all file names and block locations cached in memory at the NameNode. This is what makes HDFS metadata operations so fast: whether you modify the file system or look up a file's block locations, the request can be served without any disk I/O.

This design choice of keeping all metadata in memory at all times has trade-offs. One of them is that the NameNode spends memory, think hundreds of bytes, for every file and every block it tracks.

This leads to an issue in HDFS: when a file system holds 500 to 700 million files, the amount of RAM the NameNode has to reserve becomes very large, typically 256 GB or more. At that size the JVM is also working hard, since it has to do things like garbage collection over a huge heap. There is another dimension to this as well: with 700 million files it is quite possible that your cluster is serving 30,000-40,000 or more requests per second, which creates a lot of memory churn.
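To get a feel for the arithmetic, here is a rough back-of-the-envelope sketch in Python. The ~150 bytes per file and per block and the blocks-per-file ratio are assumptions for illustration only; the answer above only says the cost is hundreds of bytes per object, and real numbers depend on path lengths, replication, and JVM overhead.

# Back-of-the-envelope NameNode heap estimate (illustrative assumptions only).
def estimate_metadata_gb(num_files, blocks_per_file=1.5,
                         bytes_per_file=150, bytes_per_block=150):
    """Rough heap needed just for file and block metadata, in GB."""
    num_blocks = num_files * blocks_per_file
    total_bytes = num_files * bytes_per_file + num_blocks * bytes_per_block
    return total_bytes / 1024 ** 3

for files in (500_000_000, 700_000_000):
    gb = estimate_metadata_gb(files)
    print(f"{files:,} files -> roughly {gb:.0f} GB of raw metadata")
# GC headroom and per-request allocations push real NameNode heaps well beyond
# the raw metadata size, toward the 256 GB figure mentioned above.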

So a large number of files, combined with a heavy file system request load, makes the NameNode a bottleneck in HDFS. In other words, the metadata that we need to keep in memory is what creates the bottleneck.

There are several solutions and works in progress that address this problem:

  • HDFS Federation, which ships as part of HDP 3.0, allows many NameNodes to work against a shared set of DataNodes (see the sketch after this list).
  • HDFS-7240 tries to separate the block space from the namespace, which would allow us to immediately double or quadruple the effective size of the cluster.
  • Here is a good document that tracks the various issues and different approaches to scaling the NameNode -- Uber scaling namenode
  • Another approach is to send read workloads to the standby NameNode, freeing up the active NameNode and thus scaling it better. That work is tracked in Consistent Reads from Standby Node
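To illustrate the federation idea from the first bullet: each NameNode owns a slice of the namespace, while all of them share the same pool of DataNodes. The Python sketch below is conceptual only; the mount points and nameservice names are invented for illustration, and a real cluster configures this through hdfs-site.xml and ViewFS rather than in application code.

# Conceptual sketch of HDFS Federation / ViewFS-style routing (illustration only).
# Each NameNode owns part of the namespace; all of them share the same DataNodes.
MOUNT_TABLE = {
    "/user": "nameservice-user",   # hypothetical nameservice names
    "/data": "nameservice-data",
    "/tmp":  "nameservice-tmp",
}

def route_to_namenode(path: str) -> str:
    """Pick the nameservice responsible for a path; longest mount prefix wins."""
    matches = [m for m in MOUNT_TABLE if path == m or path.startswith(m + "/")]
    if not matches:
        raise ValueError(f"no mount point covers {path}")
    return MOUNT_TABLE[max(matches, key=len)]

print(route_to_namenode("/data/logs/2018/01/01"))   # -> nameservice-data
print(route_to_namenode("/user/anu/report.csv"))    # -> nameservice-user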

Please let me know if you have any other questions.

Thanks
Anu



Expert Contributor

@Malay Sharma

HDFS is a distributed file system, and Hadoop is designed mainly for batch processing of large volumes of data. The default data block size of HDFS is 128 MB. When files are significantly smaller than the block size, efficiency degrades: every file, however small, still costs the NameNode a metadata entry, so a large number of small files inflates NameNode memory use and slows down metadata operations.
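To make "efficiency degrades" concrete, here is a small Python sketch comparing the NameNode metadata cost of storing 1 TB as many small files versus a few large ones. The ~150 bytes per metadata object is an assumption for illustration only.

# Why small files hurt: the same amount of data costs the NameNode far more
# metadata objects (one inode plus one object per block) when it is split into
# many small files. 150 bytes per object is an assumed figure for illustration.
BLOCK_SIZE = 128 * 1024 ** 2      # default HDFS block size: 128 MB
BYTES_PER_OBJECT = 150            # assumed NameNode memory per file/block object

def metadata_cost(total_bytes, file_size):
    num_files = total_bytes // file_size
    blocks_per_file = -(-file_size // BLOCK_SIZE)   # ceiling division
    objects = num_files * (1 + blocks_per_file)     # inode + its blocks
    return objects, objects * BYTES_PER_OBJECT

one_tb = 1024 ** 4
for label, size in (("1 MB files", 1024 ** 2), ("1 GB files", 1024 ** 3)):
    objects, mem_bytes = metadata_cost(one_tb, size)
    print(f"1 TB as {label}: {objects:,} NameNode objects, "
          f"~{mem_bytes / 1024 ** 2:.0f} MB of heap")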

There are mainly two reasons small files are produced:

  • Files could be pieces of a larger logical file. Since HDFS only gained support for appends fairly recently, such unbounded files have often been saved by writing them in chunks into HDFS.
  • Other files are inherently small and cannot be combined into one larger file, e.g. a large corpus of images where each image is a distinct file.

To understand HDFS block size in more detail, I'd recommend reviewing a few good Stack Overflow questions:

http://stackoverflow.com/questions/13012924/large-block-size-in-hdfs-how-is-the-unused-space-account...

http://stackoverflow.com/questions/19473772/data-block-size-in-hdfs-why-64mb

For disk and filesystem recommendations, take a look here:

https://community.hortonworks.com/content/kbentry/14508/best-practices-linux-file-systems-for-hdfs.h...

Hope that all helps!

