06-13-2016
10:54 AM
@Christopher Frankland here are some resources I would recommend checking out:

https://community.hortonworks.com/articles/136/how-to-search-for-text-in-an-image.html by @Saptak Sen

Another fun one using Apache NiFi, by @Jeremy Dyer: https://community.hortonworks.com/articles/28380/nifi-ocr-using-apache-nifi-to-read-childrens-books.html

And here's a great tutorial on indexing and searching text within images with Apache Solr: http://hortonworks.com/hadoop-tutorial/indexing-and-searching-text-within-images-with-apache-solr/

Here's also an older blog post, so please double-check the commands before running them. It has some important lessons about accuracy: the quality and resolution of the PDF will greatly affect your results. http://kiirani.com/2013/03/22/tesseract-pdf.html

I'm also posting this one because I think it's interesting to examine other characteristics of the source data that might affect accuracy: http://www.assistivetechnology.vcu.edu/wp-content/uploads/sites/1864/2013/09/pxc3882784.pdf
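To make the point about resolution concrete, here is a minimal sketch of rasterizing a PDF at a chosen DPI before handing the pages to Tesseract. It is not taken from any of the linked posts, which may use different tools. It assumes the `pdftoppm` (Poppler) and `tesseract` command-line tools are installed and on your PATH. The function names, the `page` file prefix, and the 300 DPI default are my own choices for illustration; 300 DPI is only a commonly suggested starting point, so try a few values on your own scans.

```python
import glob
import subprocess
from pathlib import Path


def rasterize_command(pdf_path, out_prefix, dpi=300):
    """Build a pdftoppm command that renders each PDF page to a PNG at the given DPI."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def ocr_command(image_path, out_base, lang="eng"):
    """Build a tesseract command; tesseract writes its text to <out_base>.txt."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def ocr_pdf(pdf_path, work_dir, dpi=300, lang="eng"):
    """Rasterize a PDF, OCR every page image, and return the concatenated text.

    Hypothetical helper: requires pdftoppm and tesseract to be installed.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    prefix = work_dir / "page"
    subprocess.run(rasterize_command(pdf_path, prefix, dpi), check=True)

    texts = []
    # pdftoppm appends a page number to the prefix (zero-padded for longer PDFs),
    # so a lexicographic sort keeps the pages in order.
    for image in sorted(glob.glob(f"{prefix}-*.png")):
        out_base = Path(image).with_suffix("")
        subprocess.run(ocr_command(image, out_base, lang), check=True)
        texts.append(Path(f"{out_base}.txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

Calling `ocr_pdf("scan.pdf", "ocr_out", dpi=150)` and then again with `dpi=300` on the same file is a quick way to see how much the rendering resolution changes the recognized text for your documents.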