Effective way to store image and PDF files in HDFS in sequence file format using NiFi

Expert Contributor

I am currently working on a POC to store image or PDF files in HDFS efficiently, possibly in sequence file format. HDFS has a block size of, say, 64 MB, so if I store a couple of images of 2 MB each, I believe I will be wasting about 60 MB of each block. I am therefore trying to find a way to store small image or PDF files in HDFS without wasting block space. Please also let me know whether these files can be ingested into HDFS using Apache NiFi and, if so, which processors would be best to use. Thanks.

13 REPLIES

Master Mentor

@surender nath reddy kudumula has this been resolved? Can you post your solution or accept the best answer?

Explorer

The parameter you want to pass is -Ddfs.block.size=<value in bytes>. This sets the block size to the desired value for the transfer.
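Since the value has to be given in bytes, a small conversion can help avoid mistakes. Here is a minimal Python sketch that builds the flag from a size in megabytes. The helper name block_size_flag is hypothetical, and the 64 MB figure is only the example from the question, not a recommendation.

```python
MB = 1024 * 1024  # bytes per megabyte (binary)


def block_size_flag(size_mb: int) -> str:
    """Build the -D option string for a block size given in megabytes.

    Hypothetical helper for illustration; it only formats the flag
    and does not talk to a cluster.
    """
    if size_mb <= 0:
        raise ValueError("block size must be positive")
    return f"-Ddfs.block.size={size_mb * MB}"


flag_64mb = block_size_flag(64)
print(flag_64mb)
```

The resulting string would then be passed as a generic option to whichever Hadoop command performs the transfer.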


Hi guys,

I want to clear up a confusion about block size and storage usage in this post.

@surender nath reddy kudumula when you say "In hdfs as there is a block size of 64mb lets say if i want to store couple of images whose size is 2mb each then i ll be wasting 60mb block size" in your question, I understand that you expect to lose storage capacity. This is not the case in HDFS: a file smaller than a single block does not occupy a full block's worth of storage, so no storage is wasted.

The problem with small files is the impact on processing performance. This is why you should use Sequence Files, HAR, HBase or merging solutions. You can read more on this aspect here: https://community.hortonworks.com/questions/4024/how-many-files-is-too-many-on-a-modern-hdp-cluster....
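To make the two points above concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions: a 64 MB block size (from the question), a replication factor of 3, and a simplified model where the cluster tracks one entry per file plus one per block. The function name hdfs_footprint is hypothetical, and the overhead of the container format itself is ignored.

```python
import math

MB = 1024 * 1024


def hdfs_footprint(file_sizes, block_size=64 * MB, replication=3):
    """Estimate (bytes on disk, block count, tracked objects) for some files.

    Simplified model, assumed for illustration: a block only occupies
    as many bytes as the data it holds, and the cluster tracks one
    object per file plus one per block.
    """
    disk_bytes = sum(file_sizes) * replication
    blocks = sum(math.ceil(size / block_size) for size in file_sizes)
    tracked_objects = len(file_sizes) + blocks
    return disk_bytes, blocks, tracked_objects


small = [2 * MB] * 1000   # 1,000 images of 2 MB each, stored one per file
merged = [sum(small)]     # the same bytes packed into a single container file

small_disk, small_blocks, small_objects = hdfs_footprint(small)
merged_disk, merged_blocks, merged_objects = hdfs_footprint(merged)

print("separate files:", small_disk // MB, "MB,", small_blocks, "blocks,", small_objects, "objects")
print("one container: ", merged_disk // MB, "MB,", merged_blocks, "blocks,", merged_objects, "objects")
```

In this model the disk usage is identical in both cases, which matches the point that small files do not waste block space. What changes is the number of files and blocks to track and process, which is where merging into a container such as a Sequence File helps.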

New Contributor

Have you figured out a solution yet? Would you mind sharing it with us? I have the same problem with a POC project. Thanks
