I’m constantly amazed by the powerful things I can do with Apache NiFi in so few steps. I often challenge myself by saying, “Self, I bet you couldn’t do X with NiFi.” My confidence was challenged yesterday on a long flight back from Peru to Atlanta, when I realized I couldn’t perform OCR-type tasks with NiFi as it stands today. Perturbed by this fact, I set out to come up with a solution. Ultimately this led me to create a NiFi Tesseract processor for performing OCR tasks natively from within Apache NiFi. It wasn’t until I had finished that I realized how useful this processor could be. The NiFi Tesseract processor gives me the ability to read anything from handwritten doctors’ notes in healthcare systems to scanned pages of children’s books.
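The post does not include the processor’s source, and custom NiFi processors are typically written in Java. Purely to illustrate the per-FlowFile work such a processor performs (image bytes in, recognized text out), here is a minimal Python sketch that shells out to the `tesseract` command-line tool instead. It assumes `tesseract` 3.03 or later is on the PATH (so the literal output base `stdout` is accepted) with English language data installed; `build_tesseract_cmd`, `ocr_bytes`, and the injectable `run` parameter are hypothetical names, not the author’s implementation.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable


def build_tesseract_cmd(image_path: str, lang: str = "eng") -> list[str]:
    """Build a tesseract CLI call that prints the recognized text to stdout.

    Using the literal output base "stdout" (tesseract 3.03+) sends the text to
    standard output instead of writing <outputbase>.txt to disk.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_bytes(
    image_bytes: bytes,
    suffix: str = ".png",
    lang: str = "eng",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Hypothetical per-FlowFile OCR step: image bytes in, UTF-8 text out.

    The bytes are written to a temporary file so they can be passed to the
    CLI as a path; the directory is deleted when the with-block exits.
    """
    with tempfile.TemporaryDirectory() as tmp:
        image_path = Path(tmp) / f"page{suffix}"
        image_path.write_bytes(image_bytes)
        result = run(
            build_tesseract_cmd(str(image_path), lang),
            capture_output=True,  # collect stdout (the text) and stderr
            check=True,           # raise CalledProcessError on failure
        )
    return result.stdout.decode("utf-8")
```

In NiFi terms, `image_bytes` would be the incoming FlowFile content and the returned string would become the outgoing content; the `run` parameter only exists so the logic can be exercised without `tesseract` installed.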

In fact, I chose to demonstrate the children’s book case by using Apache NiFi to perform OCR on an excerpt from Dr. Seuss’s “The Cat in the Hat” and then feeding the resulting text from the NiFi Tesseract processor to the Mac OS X “say” command, which reads the output aloud. I have included a screen recording that shows Apache NiFi reading in a page from “The Cat in the Hat” and then speaking the results.

Screen Recording - Using Apache NiFi to read children's books
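The post does not say which processor hands the recognized text to `say`. Outside NiFi, that read-aloud step can be approximated with the minimal Python sketch below, assuming macOS, where `say` speaks text it receives on standard input when no message arguments are given and `-v` selects a voice. `build_say_cmd`, `speak`, and the injectable `run` parameter are hypothetical names added for illustration and testing.

```python
import subprocess
from typing import Callable, Optional

# Anything with subprocess.run's calling convention; swappable for testing.
Runner = Callable[..., subprocess.CompletedProcess]


def build_say_cmd(voice: Optional[str] = None) -> list[str]:
    """Build a macOS `say` invocation that reads its text from stdin.

    With no message arguments, `say` speaks whatever arrives on standard input.
    """
    return ["say"] if voice is None else ["say", "-v", voice]


def speak(text: str, voice: Optional[str] = None, run: Runner = subprocess.run) -> bool:
    """Hypothetical final step: speak OCR output aloud.

    Returns True if `say` was invoked, False if the text was blank
    (for example, when OCR recognized nothing on the page).
    """
    cleaned = text.strip()
    if not cleaned:
        return False  # nothing recognized, nothing to say
    run(build_say_cmd(voice), input=cleaned, text=True, check=True)
    return True
```

Calling `speak(ocr_text)` on whatever string an OCR step produced reads it aloud with the default system voice. From a shell, roughly the same thing is `tesseract page.png stdout | say`.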

Only five simple drag-and-drop processors for a computer to read a child’s book! Thanks, Apache NiFi!

Comments
Super Guru

Do you have the code available?

New Contributor

This would be really cool if you provided instructions on how you set up the processors.

Last update: 04-19-2016 11:41 PM