
Searching Word, Excel, PDF, PowerPoint, Text, and Image documents in Hadoop

We have 3 million documents: Word, Excel, PDF, PowerPoint, text, and image files. We need to search these documents for keywords and, wherever a keyword is found, return the matching sentences along with the file metadata. We will be using SparkR to query the data and apply some analytics to the results. What would be the best approach to accomplishing the document search?
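
To make the requirement concrete, here is a minimal Python sketch of the kind of result we are after. It assumes the text has already been extracted from each file by some separate step, and it only shows the "keyword in, sentence plus metadata out" shape for a single document. The names `Hit` and `find_keyword_sentences`, the metadata fields, and the naive sentence splitting are all hypothetical and chosen for illustration only; this is not a proposal for how to do it at scale on Hadoop.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    """One keyword match: file metadata plus the sentence that matched (hypothetical shape)."""
    path: str
    file_type: str
    keyword: str
    sentence: str


# Naive sentence splitter: breaks on whitespace that follows '.', '!' or '?'.
# Real documents (abbreviations, tables, OCR output) would need something better.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def find_keyword_sentences(path, file_type, text, keywords):
    """Return one Hit per (sentence, keyword) pair where the keyword occurs as a whole word."""
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits = []
    for sentence in _SENTENCE_END.split(text.strip()):
        for kw, pattern in patterns.items():
            if pattern.search(sentence):
                hits.append(Hit(path, file_type, kw, sentence))
    return hits


if __name__ == "__main__":
    sample = "Hadoop stores the files. Spark queries them. Nothing here!"
    for hit in find_keyword_sentences("/docs/a.pdf", "pdf", sample, ["spark", "hadoop"]):
        print(hit)
```

Each `Hit` row carries both the sentence and the metadata of the file it came from, which is the form we would want to load into SparkR for the analytics step.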