Explorer
Posts: 6
Registered: ‎02-17-2019
Accepted Solution

Efficient ways to store many image files

We have ten million image and video files and are looking for efficient ways to store them in Hadoop (HDFS ...) and analyze them with tools available in the Hadoop ecosystem. I understand that HDFS prefers large files. These image files are small; each is under ten megabytes. Please advise. Thanks very much!

 

 

Posts: 1,896
Kudos: 433
Solutions: 303
Registered: ‎07-31-2013

Re: Efficient ways to store many image files

You can do this in two ways: container files, or HBase MOBs (medium-sized
objects). The right choice depends on your eventual, dominant read pattern
for this data.

If your analysis will load only individual images, or a small range of
images out of the total dataset, then HBase is a better fit, with its
key-based access model, column-oriented storage, and caches.
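
To illustrate the key-based access pattern, here is a minimal plain-Python sketch. It is not the HBase API; it is only a toy in-memory stand-in for a store that keeps rows sorted by key, which is what makes single-image lookups and small range reads cheap. The class name `SortedKeyStore` and the `camera/date/sequence` key layout are hypothetical, chosen for illustration.

```python
import bisect


class SortedKeyStore:
    """Toy in-memory stand-in for a store that keeps rows sorted by key."""

    def __init__(self):
        self._keys = []      # row keys, kept in sorted order
        self._values = {}    # row key -> image bytes

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Point lookup of a single image by its key."""
        return self._values.get(key)

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


if __name__ == "__main__":
    store = SortedKeyStore()
    # Hypothetical key layout: camera id, then date, then sequence number.
    store.put("cam01/2019-02-17/0001", b"image-a")
    store.put("cam01/2019-02-17/0002", b"image-b")
    store.put("cam02/2019-02-17/0001", b"image-c")
    # Read only cam01's images for one day, without touching the rest.
    for key, data in store.scan("cam01/2019-02-17/", "cam01/2019-02-18/"):
        print(key, len(data))
```

With a key layout like this, images that are usually read together sort next to each other, so a small range read never has to touch the rest of the dataset.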

If instead you will process these images in bulk, then large container
files are the better fit. Formats such as Sequence Files (with
BytesWritable or equivalent) or Parquet files (with BINARY/BYTE_ARRAY
types) can store multiple images in a single file, and allow for fast,
sequential reads of all images in bulk.
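
To make the container-file idea concrete, here is a minimal sketch in plain Python. It does not use the SequenceFile or Parquet formats; the length-prefixed record layout and the function names `pack_images` and `read_images` are assumptions made up for illustration. It only shows the principle: many small images go into one large file as key/value records, and are read back in a single sequential pass.

```python
import io
import struct

# Hypothetical record layout (NOT the SequenceFile format):
# [4-byte key length][key bytes][4-byte value length][value bytes]
_LEN = struct.Struct(">I")


def pack_images(images, out):
    """Append (name, bytes) pairs to one container stream."""
    for name, data in images:
        key = name.encode("utf-8")
        out.write(_LEN.pack(len(key)))
        out.write(key)
        out.write(_LEN.pack(len(data)))
        out.write(data)


def read_images(src):
    """Yield (name, bytes) pairs back in one sequential pass."""
    while True:
        header = src.read(_LEN.size)
        if not header:
            return  # clean end of container
        (key_len,) = _LEN.unpack(header)
        name = src.read(key_len).decode("utf-8")
        (val_len,) = _LEN.unpack(src.read(_LEN.size))
        yield name, src.read(val_len)


if __name__ == "__main__":
    # Stand-in payloads; real input would be the raw image bytes.
    sample = [("img-0001.jpg", b"\xff\xd8fake-jpeg"),
              ("img-0002.png", b"\x89PNGfake")]
    buf = io.BytesIO()
    pack_images(sample, buf)
    buf.seek(0)
    for name, data in read_images(buf):
        print(name, len(data))
```

The real container formats add what this sketch leaves out, such as sync markers for splitting a file across tasks, compression, and schema metadata, so in practice you would write the images with the format's own writer rather than a hand-rolled layout.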
Explorer
Posts: 6
Registered: ‎02-17-2019

Re: Efficient ways to store many image files

Thanks a lot for your reply, Harsh. These sound great. Can you give some pointers to learning materials on both methods, e.g. examples, blogs, URLs, or books?

 

 

Posts: 1,896
Kudos: 433
Solutions: 303
Registered: ‎07-31-2013

Re: Efficient ways to store many image files
