What is the small file problem in HDFS?

Explorer

In HDFS, what is the small file problem?

2 REPLIES

Expert Contributor

@Malay Sharma

HDFS is a distributed file system, and Hadoop is mainly designed for batch processing of large volumes of data. The default block size in HDFS is 128 MB. When file sizes are significantly smaller than the block size, efficiency degrades: every file, directory, and block is held as an object in the NameNode's memory (each costing on the order of 150 bytes), so a large number of small files inflates the NameNode heap, and processing them means many seeks and many small map tasks instead of a few large sequential reads.
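
To make that concrete, here is a minimal sketch (not from the original thread) using the standard Hadoop FileSystem API. It walks a directory, counts files that are well under their block size, and roughly estimates the NameNode heap they cost. The scan path, the 50% "small file" cut-off, and the ~150-bytes-per-object figure are assumptions for illustration only.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class SmallFileScan {
    public static void main(String[] args) throws Exception {
        // The directory to scan is a placeholder; pass your own path as args[0].
        Path dir = new Path(args.length > 0 ? args[0] : "/data");

        FileSystem fs = FileSystem.get(new Configuration());
        long files = 0, smallFiles = 0, approxNameNodeBytes = 0;

        // Recursively list every file under the directory.
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(dir, true);
        while (it.hasNext()) {
            LocatedFileStatus f = it.next();
            files++;

            // Treat a file as "small" if it uses less than half of its block size.
            // The 50% threshold is arbitrary and only for illustration.
            if (f.getLen() < f.getBlockSize() / 2) {
                smallFiles++;
            }

            // Rule of thumb: each file object and each block object costs on the
            // order of 150 bytes of NameNode heap.
            long blocks = f.getLen() == 0 ? 0
                    : (f.getLen() + f.getBlockSize() - 1) / f.getBlockSize();
            approxNameNodeBytes += 150L * (1 + blocks);
        }

        System.out.printf("files=%d, small files=%d, approx. NameNode heap=%d bytes%n",
                files, smallFiles, approxNameNodeBytes);
    }
}
```

Run against a directory holding millions of tiny files, this makes the metadata cost visible even though the total data volume is small.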

There are mainly two reasons why small files are produced:

  • The files may be pieces of a larger logical file. Since HDFS has only recently supported appends, such unbounded files are often saved by writing them to HDFS in chunks.
  • Other files are inherently small and cannot be combined into one larger file, e.g. a large corpus of images where each image is a distinct file (see the sketch after this list).
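
Not part of the original reply, but a common way to handle the second case is to pack many small files into one container file such as a Hadoop SequenceFile, with the file name as the key and the raw bytes as the value. A minimal sketch, assuming local source files and placeholder paths:

```java
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Both paths are placeholders for this sketch.
        File localDir = new File(args.length > 0 ? args[0] : "./images");
        Path target = new Path(args.length > 1 ? args[1] : "/data/images.seq");

        File[] sources = localDir.listFiles();
        if (sources == null) {
            System.err.println(localDir + " is not a directory");
            return;
        }

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(target),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (File f : sources) {
                if (!f.isFile()) continue;
                // One record per small file: file name as key, raw bytes as value.
                byte[] bytes = Files.readAllBytes(f.toPath());
                writer.append(new Text(f.getName()), new BytesWritable(bytes));
            }
        }
    }
}
```

The resulting container occupies a handful of HDFS blocks instead of one block entry per image, which is what keeps the NameNode's object count down.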

To understand HDFS block size in more detail, I'd recommend reviewing a few good Stack Overflow questions:

http://stackoverflow.com/questions/13012924/large-block-size-in-hdfs-how-is-the-unused-space-account...

http://stackoverflow.com/questions/19473772/data-block-size-in-hdfs-why-64mb

For disk/filesystem recommendations, take a look here:

https://community.hortonworks.com/content/kbentry/14508/best-practices-linux-file-systems-for-hdfs.h...

Hope that all helps!

Duplicate topic.
