We have 3 million documents: Word, Excel, PDF, PowerPoint,
plain text, and images. We need to search these documents for keywords and,
wherever a keyword is found, return the matching sentences along with the file
metadata. We will be using SparkR to query the data and run some analytics on
the results. What would be the best approach to accomplishing the document search?
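
To make the desired output concrete, here is a minimal sketch of the kind of result we are after. It is written in Python rather than SparkR purely for illustration, and it assumes the text has already been extracted from each file by some other step (extraction from Word, PDF, images, and so on is not shown). The names `Hit` and `find_keyword_sentences`, the metadata fields, and the naive sentence splitter are all hypothetical, not part of any existing system.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    """One keyword match: file metadata plus the sentence that matched."""
    path: str        # hypothetical metadata field
    file_type: str   # hypothetical metadata field
    keyword: str
    sentence: str


# Naive sentence splitter (an assumption): break on whitespace that
# follows '.', '!' or '?'. Real documents will need something sturdier.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def find_keyword_sentences(path, file_type, text, keywords):
    """Return one Hit per (sentence, keyword) pair, matching whole words
    case-insensitively in already-extracted text."""
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits = []
    for sentence in _SENTENCE_END.split(text):
        sentence = sentence.strip()
        if not sentence:
            continue
        for kw, pattern in patterns.items():
            if pattern.search(sentence):
                hits.append(Hit(path, file_type, kw, sentence))
    return hits
```

For example, calling `find_keyword_sentences("a.pdf", "pdf", text, ["invoice"])` on extracted text would give one record per matching sentence, each carrying the file path and type. Ideally the search would produce rows shaped like this so they can be loaded into SparkR for the analytics step.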