Using the GetHTTP processor, we grab random images from DigitalOcean's free image site. I give each file a random name so we can store it uniquely in HDFS.
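In the flow itself the unique name is built with NiFi Expression Language on the filename attribute; outside NiFi, the same naming scheme (prefix plus current epoch milliseconds, which matches names like random1469112881039.jpg seen below) can be sketched like this. The function name here is illustrative, not part of the flow:

```python
import time

def random_image_name(prefix="random", ext="jpg"):
    # Mirror the naming used in the flow: prefix + current epoch
    # milliseconds, e.g. random1469112881039.jpg.
    return f"{prefix}{int(time.time() * 1000)}.{ext}"

name = random_image_name()
print(name)
```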


The entire data flow, from GetHTTP to final HDFS storage of the image and its metadata as JSON.


ExtractMediaMetadata Processor


The final results:

hdfs dfs -cat /mediametadata/random1469112881039.json

{
  "Number of Components": "3",
  "Resolution Units": "none",
  "Image Height": "200 pixels",
  "File Name": "apache-tika-3181704319795384377.tmp",
  "Data Precision": "8 bits",
  "File Modified Date": "Thu Jul 21 14:54:43 UTC 2016",
  "tiff:BitsPerSample": "8",
  "Compression Type": "Progressive, Huffman",
  "X-Parsed-By": "org.apache.tika.parser.DefaultParser",
  "Component 1": "Y component: Quantization table 0, Sampling factors 2 horiz/2 vert",
  "Component 2": "Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert",
  "Component 3": "Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert",
  "X Resolution": "1 dot",
  "File Size": "4701 bytes",
  "tiff:ImageWidth": "200",
  "path": "./",
  "filename": "random1469112881039.jpg",
  "Image Width": "200 pixels",
  "Y Resolution": "1 dot"
}
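Downstream consumers can read these fields straight out of the JSON. A minimal sketch parsing a few of them (field names copied from the output above; note that numeric values arrive as strings, some with unit suffixes):

```python
import json

# A subset of the metadata JSON produced by ExtractMediaMetadata,
# with the field names shown in the output above.
metadata = json.loads("""
{
  "Image Height": "200 pixels",
  "tiff:ImageWidth": "200",
  "File Size": "4701 bytes",
  "filename": "random1469112881039.jpg"
}
""")

# Strip the unit suffix ("pixels", "bytes") before converting to int.
width = int(metadata["tiff:ImageWidth"])
height = int(metadata["Image Height"].split()[0])
size_bytes = int(metadata["File Size"].split()[0])
print(width, height, size_bytes)  # 200 200 4701
```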

We can fetch as many images as we want. Via the request parameters I fixed the image width at 200 pixels; you can customize that.

Below is the image downloaded with the above metadata.


Last update: 09-16-2022 01:35 AM