
Using the GetHTTP processor, we grab random images from DigitalOcean's Unsplash.it free image site. I give each image a random file name so that it is saved under a unique name in HDFS.
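As a rough illustration of the naming step, here is a minimal Python sketch. It assumes the unique name is the prefix "random" followed by the current time in epoch milliseconds, which is consistent with the file name random1469112881039.jpg shown later in this article. The function name `unique_filename` and its parameters are hypothetical and are not part of NiFi.

```python
import time
from typing import Optional


def unique_filename(now_ms: Optional[int] = None,
                    prefix: str = "random",
                    ext: str = "jpg") -> str:
    """Build a file name such as 'random1469112881039.jpg'.

    Assumption: uniqueness comes from an epoch-millisecond timestamp.
    Two calls in the same millisecond would collide.
    """
    if now_ms is None:
        # Current wall-clock time in whole milliseconds since the Unix epoch.
        now_ms = int(time.time() * 1000)
    return f"{prefix}{now_ms}.{ext}"


# Reproduce the file name that appears in this article.
print(unique_filename(1469112881039))
```

A timestamp keeps names sortable by arrival time; if several images can arrive within the same millisecond, a UUID suffix would be a safer choice.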

5963-gethttp.png

The entire data flow, from GetHTTP to the final HDFS storage of the image and its metadata as JSON.

5965-unsplash1.png

ExtractMediaMetaData Processor

5967-extramediametadata.png

The final results:

hdfs dfs -cat /mediametadata/random1469112881039.json

{"Number of Components":"3","Resolution Units":"none","Image Height":"200 pixels",
"File Name":"apache-tika-3181704319795384377.tmp",
"Data Precision":"8 bits",
"File Modified Date":"Thu Jul 21 14:54:43 UTC 2016","tiff:BitsPerSample":"8",
"Compression Type":"Progressive,Huffman",
"X-Parsed-By":"org.apache.tika.parser.DefaultParser, org.apache.tika.parser.jpeg.JpegParser",
"Component 1":"Y component: Quantization table 0, Sampling factors 2 horiz/2vert",
"Component 2":"Cb component: Quantization table 1,Sampling factors 1 horiz/1 vert",
"tiff:ImageLength":"200","mime.type":"image/jpeg","gethttp.remote.source":"unsplash.it",
"Component3":"Cr component: Quantization table 1, Sampling factors 1 horiz/1vert",
"X Resolution":"1 dot",
"FileSize":"4701 bytes","tiff:ImageWidth":"200","path":"./",
"filename":"random1469112881039.jpg","ImageWidth":"200 pixels",
"uuid":"8b7c4f9f-9436-4ccb-b06e-9a720c91f6e0",
"Content-Type":"image/jpeg",
"YResolution":"1 dot"}
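Once the metadata is stored as JSON, it is easy to consume downstream. The following is a minimal Python sketch, not part of the NiFi flow, that reads a few of the fields shown above. The helper names `pixel_value` and `summarize` are hypothetical, and the sample string is a shortened copy of the output above.

```python
import json
import re


def pixel_value(text: str) -> int:
    """Extract the leading integer from a value such as '200 pixels'."""
    match = re.match(r"\s*(\d+)", text)
    if match is None:
        raise ValueError(f"no leading integer in {text!r}")
    return int(match.group(1))


def summarize(metadata_json: str) -> dict:
    """Pick a few fields out of the metadata JSON and convert sizes to ints."""
    meta = json.loads(metadata_json)
    return {
        "filename": meta["filename"],
        "mime_type": meta["mime.type"],
        "width": int(meta["tiff:ImageWidth"]),
        "height": pixel_value(meta["Image Height"]),
    }


# Shortened copy of the metadata shown in this article.
sample = (
    '{"Image Height":"200 pixels","tiff:ImageWidth":"200",'
    '"mime.type":"image/jpeg","filename":"random1469112881039.jpg"}'
)
print(summarize(sample))
```

In practice the input string could come from the output of the hdfs dfs -cat command shown earlier.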

We can fetch as many images as we want. Using the Unsplash.it URL parameters, I fixed the image width at 200 pixels; you can change that to any size you like.
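To make the size parameter concrete, here is a small Python sketch that builds a request URL for a given size. The path layout (width, then height, then a "random" query flag) is an assumption about the service's URL scheme, since the exact URL configured in GetHTTP is not shown in this article; the function name `random_image_url` is hypothetical.

```python
def random_image_url(width: int = 200,
                     height: int = 200,
                     host: str = "unsplash.it") -> str:
    """Build a URL for a random image of the requested size.

    Assumption: the service accepts /<width>/<height>/?random.
    Check the URL in your own GetHTTP configuration before relying on this.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return f"https://{host}/{width}/{height}/?random"


# The 200x200 size matches the image dimensions reported in the metadata above.
print(random_image_url())
print(random_image_url(400, 300))
```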

Below is the image downloaded with the above metadata.

5964-random1469112881039.jpg
