Created 02-05-2018 08:35 AM
What is the small file problem in HDFS?
Created 02-05-2018 09:33 AM
HDFS is a distributed file system, and Hadoop is designed mainly for batch processing of large volumes of data. The default HDFS block size is 128 MB. When files are significantly smaller than the block size, efficiency degrades: every file and block is tracked as an object in NameNode memory, so millions of small files strain the NameNode, and reading them produces many small, inefficient I/O operations instead of long sequential scans.
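To get a sense of whether a directory is affected, here is a minimal sketch (not from this thread) that walks a directory with the standard Hadoop FileSystem Java API and counts files far below the configured block size. The /user/data path and the quarter-of-a-block cutoff are arbitrary assumptions for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SmallFileScan {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical input directory; pass a real path as the first argument.
            Path dir = new Path(args.length > 0 ? args[0] : "/user/data");

            // Default block size for this path (128 MB on a stock configuration).
            long blockSize = fs.getDefaultBlockSize(dir);

            long total = 0;
            long small = 0;
            for (FileStatus status : fs.listStatus(dir)) {
                if (!status.isFile()) {
                    continue;
                }
                total++;
                // Every file costs at least one block entry plus NameNode metadata,
                // so files far below the block size waste NameNode memory and
                // fragment reads, even though they waste little disk space.
                if (status.getLen() < blockSize / 4) {   // arbitrary "small" cutoff
                    small++;
                }
            }
            System.out.printf("%d of %d files are smaller than a quarter of the %d-byte block size%n",
                    small, total, blockSize);
            fs.close();
        }
    }

Run against a data directory, a high ratio of small files is usually the cue to consolidate them (for example into larger files, SequenceFiles, or HAR archives) rather than to tune block size.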
There are mainly two reasons small files get produced:
To understand HDFS block size in more detail, I'd recommend reviewing a few good Stack Overflow questions:
http://stackoverflow.com/questions/13012924/large-block-size-in-hdfs-how-is-the-unused-space-account...
http://stackoverflow.com/questions/19473772/data-block-size-in-hdfs-why-64mb
For disk/filesystem recommendations, take a look here:
Hope that all helps!
Duplicate topic.