Effective way to store image and PDF files in HDFS in sequence file format using NiFi
- Labels: Apache NiFi, HDFS
Created on ‎01-20-2016 10:26 AM - edited ‎09-16-2022 02:58 AM
I am currently working on a POC to store image or PDF files efficiently in HDFS, possibly in sequence file format. HDFS has a block size of, say, 64 MB, so if I store a couple of images of 2 MB each, my understanding is that I will be wasting 60 MB of the block. I am therefore trying to find a way to store small image or PDF files in HDFS without wasting block space. Please also let me know whether these files can be ingested into HDFS using Apache NiFi and, if so, which processors would be best to use. Thanks.
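For context on the question: a 2 MB file in HDFS occupies roughly 2 MB on disk rather than a full block; the usual cost of many small files is the per-file metadata the NameNode keeps in memory. Packing many small files into one container addresses that. The idea behind a sequence file is a single file of key/value records, commonly with the file name as key and the file bytes as value. The sketch below illustrates only that idea with a made-up length-prefixed layout. It is not the Hadoop SequenceFile on-disk format, and `write_records`, `read_records`, and the sample records are hypothetical names chosen for illustration.

```python
import io
import struct


def write_records(stream, records):
    """Append (key, value) byte pairs as length-prefixed records."""
    for key, value in records:
        # Header: two big-endian unsigned 32-bit lengths (key, then value).
        stream.write(struct.pack(">II", len(key), len(value)))
        stream.write(key)
        stream.write(value)


def read_records(stream):
    """Yield (key, value) byte pairs until the stream is exhausted."""
    header_size = struct.calcsize(">II")
    while True:
        header = stream.read(header_size)
        if len(header) < header_size:
            return  # no complete header left, so we are done
        key_len, value_len = struct.unpack(">II", header)
        yield stream.read(key_len), stream.read(value_len)


# Hypothetical stand-ins for small image and PDF files.
records = [
    (b"img_001.jpg", b"\xff\xd8 fake jpeg"),
    (b"doc_001.pdf", b"%PDF fake pdf"),
]
container = io.BytesIO()
write_records(container, records)
container.seek(0)
restored = list(read_records(container))
print(len(restored), "records read back")
```

A real sequence file adds a header, sync markers, and optional compression on top of this, which is what lets Hadoop jobs split and read it in parallel.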
Created ‎01-20-2016 06:48 PM
Could you pass a property to set a smaller block size when writing the files? With NiFi, maybe you can merge content, or compress several images into one large zip before writing to HDFS? Interesting question. @surender nath reddy kudumula
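The "several images into one zip" suggestion can be sketched outside NiFi with Python's standard `zipfile` module. This is a minimal illustration, assuming the files are already available as bytes; `bundle_files` and the sample file names are hypothetical, and writing the resulting archive to HDFS is left to whatever client or NiFi processor you use.

```python
import io
import zipfile


def bundle_files(files):
    """Pack a mapping of {name: bytes} into one in-memory zip archive."""
    buffer = io.BytesIO()
    # ZIP_STORED skips recompression: JPEG and PNG data (and typically PDF)
    # is already compressed, so deflating it again gains little.
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


# Hypothetical stand-ins for small image files.
small_files = {
    "img_001.jpg": b"\xff\xd8" + b"\x00" * 1024,
    "img_002.jpg": b"\xff\xd8" + b"\x01" * 2048,
}
bundle = bundle_files(small_files)
print(len(small_files), "files bundled into", len(bundle), "bytes")
```

The point of bundling here is fewer, larger files in HDFS rather than a smaller byte count.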
Created ‎01-20-2016 07:16 PM
@Artem Ervits sounds good. How about also using sequence file format before merging? Or I believe storing the files in zip or bzip format would be the most effective approach, so that we can store any file format, not just JPEGs or PNGs, without wasting block or disk space. Am I correct? I heard MapR has a different file system that is better suited to storing small files than Hortonworks. What about zip files larger than 64 MB: are these split in HDFS, or should we write a NiFi processor so that the zip files won't exceed 64 MB?
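On the 64 MB question: HDFS splits any file larger than the block size into multiple blocks on its own, so an oversized zip is still stored correctly. If you nevertheless want each bundle to stay under a cap, one simple approach is greedy grouping by size. The sketch below is plain Python, not NiFi's implementation; `group_by_size` and the file sizes are hypothetical, and the 64 MB cap simply mirrors the block size mentioned in this thread.

```python
def group_by_size(sizes, max_bytes):
    """Greedily group (name, size) pairs so each group's total stays <= max_bytes.

    A file that alone exceeds max_bytes is placed in a group of its own.
    """
    groups, current, current_total = [], [], 0
    for name, size in sizes:
        # Close the current group if adding this file would exceed the cap.
        if current and current_total + size > max_bytes:
            groups.append(current)
            current, current_total = [], 0
        current.append(name)
        current_total += size
    if current:
        groups.append(current)
    return groups


MB = 1024 * 1024
# Hypothetical file sizes for illustration.
files = [("a.jpg", 2 * MB), ("b.pdf", 40 * MB), ("c.png", 30 * MB), ("d.jpg", 2 * MB)]
groups = group_by_size(files, max_bytes=64 * MB)
print(groups)
```

Each resulting group can then be zipped or merged as one unit. Files are grouped in arrival order, so the grouping is not optimal, but it suits a streaming flow where files cannot be reordered.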
Created ‎01-20-2016 08:20 PM
@surender nath reddy kudumula I can't comment on MapR, but if you achieve what you're planning, it will be a great candidate for an article on this site. Here are some sample NiFi templates.
Created ‎01-20-2016 10:54 PM
Thanks for your reply, @Artem Ervits. I will implement the POC. Thanks.
Created ‎01-20-2016 07:26 PM
Just wondering: once the files are streaming into HDFS through NiFi in zip format, I believe we need a way to automate the unzip process and analyse the files stored in each zip archive. Any ideas how we can achieve this, please? Thank you.
Created ‎01-20-2016 08:18 PM
@surender nath reddy kudumula you can execute shell commands in a NiFi processor to achieve that.
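As an alternative to shelling out to `unzip`, a script can walk the archive entries directly. The sketch below is one possible shape, assuming the archive bytes have already been fetched from HDFS by some client; `iter_zip_entries` and the sample entries are hypothetical. NiFi also ships an UnpackContent processor that may cover the unzip step inside the flow, so it is worth checking in your version.

```python
import io
import zipfile


def iter_zip_entries(archive_bytes):
    """Yield (name, payload) for every file stored in a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            yield info.filename, archive.read(info)


# Build a tiny archive so the sketch runs on its own; in practice the bytes
# would come from HDFS.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("scan_01.pdf", b"%PDF-1.4 example")
    archive.writestr("img_01.png", b"\x89PNG example")

# Stand-in "analysis": record the size of each extracted file.
summary = {name: len(payload) for name, payload in iter_zip_entries(buffer.getvalue())}
print(summary)
```

Replacing the size calculation with real processing (for example, handing each payload to an image library) keeps the unzip and analysis steps in one pass without writing temporary files.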
Created ‎01-20-2016 07:17 PM
This may not answer the image size question, but it may give you some ideas for your POC. The images are stored in HBase and processed.
See A Non Standard Use Case of Hadoop High Scale Image Processing and Analysis by TrueCar
The slides are at Hadoop Image Processing Pipeline
Created ‎01-20-2016 07:29 PM
Thanks @Ancil McBarnett, will have a look. :)
Created ‎11-16-2016 08:33 PM
Hi, I want to process images on Hadoop, but I do not know which storage format would be easiest to work with. Is it better to store them in a sequence file, in HBase, or in something else? Note that I will process these images with C++ programs that call OpenCV and FFmpeg. @surender nath reddy kudumula, @Ancil McBarnett
