
Efficient ways to store many image files


Explorer

We have ten million image and video files and are looking for efficient ways to store them in Hadoop (HDFS ...) and analyze them with tools available in the Hadoop ecosystem. I understand HDFS prefers big files. These image files are small, under ten megabytes each. Please advise. Thanks very much!

 

 


Re: Efficient ways to store many image files

Master Guru
You can do this via two methods: container files or HBase MOBs. Which is
the right path depends on your eventual, dominant read pattern for this
data.

If your analysis will require loading only individual images, or a small
range of images out of the total dataset, then HBase is a better fit with
its key-based access model, columnar storage and caches.

If instead you will process these images in bulk, then large container
files are the better fit. Formats such as Sequence Files (with
BytesWritable or equivalent) or Parquet files (with BINARY/BYTE_ARRAY
types) can store multiple images in a single file and allow fast,
sequential reads of all images in bulk.
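
To make the container-file idea concrete, here is a minimal sketch in plain Python. It is not the SequenceFile or Parquet format and does not use any Hadoop API; it is a toy length-prefixed layout that only illustrates the same principle of packing many small images into one file as key/value records and reading them back in a single sequential pass. The names `pack_images` and `read_images` and the sample byte strings are hypothetical, chosen for illustration.

```python
import io
import struct

# Record header: 4-byte key length followed by 4-byte value length (big-endian).
HEADER = ">II"

def pack_images(images, out):
    """Append (name, data) pairs to one stream as length-prefixed records."""
    for name, data in images:
        key = name.encode("utf-8")
        out.write(struct.pack(HEADER, len(key), len(data)))
        out.write(key)
        out.write(data)

def read_images(inp):
    """Yield (name, data) pairs by scanning the stream front to back."""
    header_size = struct.calcsize(HEADER)
    while True:
        header = inp.read(header_size)
        if not header:
            break  # clean end of the container
        if len(header) < header_size:
            raise ValueError("truncated record header")
        key_len, val_len = struct.unpack(HEADER, header)
        key = inp.read(key_len)
        data = inp.read(val_len)
        if len(key) < key_len or len(data) < val_len:
            raise ValueError("truncated record body")
        yield key.decode("utf-8"), data

# Hypothetical stand-ins for real image bytes.
sample = [("cat.jpg", b"\xff\xd8cat"), ("dog.png", b"\x89PNGdog")]

buf = io.BytesIO()           # an in-memory stand-in for one large file
pack_images(sample, buf)
buf.seek(0)
restored = list(read_images(buf))
```

In a real cluster the same role would be played by a Sequence File keyed by file name with the image bytes as the value, which additionally provides sync markers and optional compression so that the file can be split across parallel readers.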

Re: Efficient ways to store many image files

Explorer

Thanks a lot for your reply, Harsh. These sound great. Can you give some pointers to learning materials on both methods, e.g. examples, blogs, URLs, or books?

 

 
