Created on 07-21-2016 10:13 PM - edited 09-16-2022 01:35 AM
Using the GetHTTP processor, we grab random images from DigitalOcean's free Unsplash.it image site. I give each image a random file name so it can be saved uniquely in HDFS.
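Outside NiFi, the same idea (request a random image, then give it a unique name) can be sketched in a few lines of Python. This is only an illustration under assumptions, not the flow's actual configuration: the URL pattern `https://unsplash.it/<width>/<height>/?random` is assumed, and the `random<epoch-milliseconds>.jpg` naming is inferred from the file name that appears in the results. The helper names are hypothetical.

```python
import time
import urllib.request


def build_image_url(width=200, height=200):
    # Assumed URL pattern for the image service; not taken from the flow itself.
    return f"https://unsplash.it/{width}/{height}/?random"


def unique_filename(now_ms=None, prefix="random", ext="jpg"):
    # Prefix plus epoch milliseconds, mirroring names like
    # random1469112881039.jpg (an inference from the output, not a documented rule).
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return f"{prefix}{now_ms}.{ext}"


def fetch_image(url, timeout=10):
    # Plain HTTP GET returning the raw image bytes; needs network access.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch_image(build_image_url())` and writing the bytes to `unique_filename()` would roughly mimic what the GetHTTP step does before the file is handed on to HDFS.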
The entire data flow, from GetHTTP to final HDFS storage of the image and its metadata as JSON.
ExtractMediaMetaData Processor
The final results:
hdfs dfs -cat /mediametadata/random1469112881039.json
{"Number of Components":"3","Resolution Units":"none","Image Height":"200 pixels","File Name":"apache-tika-3181704319795384377.tmp", "Data Precision":"8 bits", "File Modified Date":"Thu Jul 21 14:54:43 UTC 2016","tiff:BitsPerSample":"8", "Compression Type":"Progressive,Huffman","X-Parsed-By":"org.apache.tika.parser.DefaultParser, org.apache.tika.parser.jpeg.JpegParser", "Component 1":"Y component: Quantization table 0, Sampling factors 2 horiz/2vert", "Component 2":"Cb component: Quantization table 1,Sampling factors 1 horiz/1 vert", "tiff:ImageLength":"200","mime.type":"image/jpeg","gethttp.remote.source":"unsplash.it", "Component3":"Cr component: Quantization table 1, Sampling factors 1 horiz/1vert", "X Resolution":"1 dot", "FileSize":"4701 bytes","tiff:ImageWidth":"200","path":"./", "filename":"random1469112881039.jpg","ImageWidth":"200 pixels", "uuid":"8b7c4f9f-9436-4ccb-b06e-9a720c91f6e0", "Content-Type":"image/jpeg", "YResolution":"1 dot"}
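Because the metadata lands in HDFS as a flat JSON object of string values, downstream code can read it with any JSON parser. The following is a minimal sketch, assuming the keys shown in the output above (`filename`, `mime.type`, `tiff:ImageWidth`, `tiff:ImageLength`) are present; the `summarize` helper is hypothetical and not part of the flow.

```python
import json


def summarize(metadata_json):
    # Parse the flat JSON document and pull out a few fields.
    # All values arrive as strings, so the dimensions are converted to int.
    meta = json.loads(metadata_json)
    return {
        "filename": meta["filename"],
        "mime_type": meta["mime.type"],
        "width": int(meta["tiff:ImageWidth"]),
        "height": int(meta["tiff:ImageLength"]),
    }


# Trimmed-down sample using keys and values from the output above.
sample = (
    '{"tiff:ImageLength":"200","mime.type":"image/jpeg",'
    '"tiff:ImageWidth":"200","filename":"random1469112881039.jpg"}'
)
summary = summarize(sample)
print(summary)
```

The `tiff:` keys are used here rather than fields such as `Image Height` because their values are bare numbers, while the latter carry a unit suffix like `200 pixels`.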
We can fetch as many images as we want. Using the Unsplash.it parameters, I fixed the image width at 200; you can customize that.
Below is the image downloaded with the above metadata.