Using the GetHTTP processor, we grab random images from DigitalOcean's free Unsplash.it image site. I give each image a random file name so that it can be saved uniquely in HDFS.
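The naming step can be sketched outside NiFi as follows. This is only an illustration: the article does not show the expression actually configured in GetHTTP, and the pattern here (the prefix "random" plus an epoch-millisecond timestamp) is inferred from the example file name random1469112881039.jpg. The function name and its parameters are hypothetical.

```python
import time


def unique_image_name(prefix="random", extension="jpg", now_ms=None):
    """Build a file name such as random1469112881039.jpg.

    Assumption: the flow names files with a prefix plus the current time
    in epoch milliseconds, as the example file name suggests.
    """
    if now_ms is None:
        # Current time in whole milliseconds since the Unix epoch.
        now_ms = int(time.time() * 1000)
    return f"{prefix}{now_ms}.{extension}"


if __name__ == "__main__":
    print(unique_image_name())
```

A timestamp-based name is unique only if no two images are fetched within the same millisecond, which is a safe bet for a flow that polls on a schedule.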

5963-gethttp.png

The entire data flow, from GetHTTP to the final HDFS storage of the image and its metadata as JSON.

5965-unsplash1.png

The ExtractMediaMetadata processor

5967-extramediametadata.png

The final results:

hdfs dfs -cat /mediametadata/random1469112881039.json

{"Number of Components":"3","Resolution Units":"none","Image Height":"200 pixels",
"File Name":"apache-tika-3181704319795384377.tmp",
"Data Precision":"8 bits",
"File Modified Date":"Thu Jul 21 14:54:43 UTC 2016","tiff:BitsPerSample":"8",
"Compression Type":"Progressive,Huffman",
"X-Parsed-By":"org.apache.tika.parser.DefaultParser,org.apache.tika.parser.jpeg.JpegParser",
"Component 1":"Y component: Quantization table 0, Sampling factors 2 horiz/2 vert",
"Component 2":"Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert",
"tiff:ImageLength":"200","mime.type":"image/jpeg","gethttp.remote.source":"unsplash.it",
"Component 3":"Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert",
"X Resolution":"1 dot",
"File Size":"4701 bytes","tiff:ImageWidth":"200","path":"./",
"filename":"random1469112881039.jpg","Image Width":"200 pixels",
"uuid":"8b7c4f9f-9436-4ccb-b06e-9a720c91f6e0",
"Content-Type":"image/jpeg",
"Y Resolution":"1 dot"}
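
Once the metadata is stored as JSON, downstream code can read it with any JSON parser. The sketch below is a minimal illustration, not part of the original flow: it embeds a few key/value pairs copied from the output above as sample input, and the function name and the shape of the returned summary are hypothetical.

```python
import json

# A subset of the key/value pairs from the metadata file shown above.
SAMPLE_METADATA = (
    '{"tiff:ImageWidth":"200","tiff:ImageLength":"200",'
    '"mime.type":"image/jpeg","filename":"random1469112881039.jpg"}'
)


def summarize_metadata(metadata_json):
    """Extract the file name, MIME type, and pixel dimensions.

    All values arrive as strings in the JSON, so the dimensions are
    converted to integers here.
    """
    metadata = json.loads(metadata_json)
    return {
        "filename": metadata["filename"],
        "mime_type": metadata["mime.type"],
        "width": int(metadata["tiff:ImageWidth"]),
        "height": int(metadata["tiff:ImageLength"]),
    }


if __name__ == "__main__":
    print(summarize_metadata(SAMPLE_METADATA))
```

In practice the input would be the text returned by the hdfs dfs -cat command rather than an embedded string.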

We can fetch as many images as we want. Using the Unsplash.it URL parameters, I fixed the image width at 200 pixels; you can customize that.
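
As a rough illustration of customizing the size, the helper below builds a request URL from a width and height. The URL pattern (base/width/height/?random) is an assumption about how Unsplash.it accepted its parameters; the article does not show the URL configured in GetHTTP, so check it against the service before relying on it. The function name is hypothetical.

```python
def unsplash_url(width=200, height=200, base="https://unsplash.it"):
    """Build a random-image URL for the requested pixel size.

    Assumption: the service takes width and height as path segments and
    a "random" query flag; this pattern is not confirmed by the article.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return f"{base}/{width}/{height}/?random"


if __name__ == "__main__":
    print(unsplash_url())          # 200 x 200, matching the example image
    print(unsplash_url(640, 480))  # a larger, customized size
```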

Below is the image downloaded with the above metadata.

5964-random1469112881039.jpg
