Created 08-11-2016 01:24 PM
I have a PDF file that I have copied from my local system to HDFS. I now need to load its contents into a Hive table. Is there any way to do this in Hive? I know it can be handled in Pig.
Created 08-11-2016 01:52 PM
If I understand your question (and I am making an assumption here), you want to store the PDF in Hive so that you can analyze the content inside it. The way I have done this is with Apache Tika: extract the text from the PDF and store it as a plain-text file, then apply a data model to that text and load it into Hive.
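
As a rough sketch of the "apply a data model" step, assuming the text has already been extracted (for example with the Tika app jar's `--text` option), the simplest model is one row per non-empty line. The function names, the `report.pdf` document name, the `pdf_lines` table, and the HDFS location below are all hypothetical, chosen only for illustration; this is not necessarily how the process described above was implemented.

```python
import sys


def text_to_hive_rows(doc_name, text):
    """Yield (doc_name, line_no, line) for each non-empty line of extracted text."""
    line_no = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines left over from the PDF layout
        line_no += 1
        # Tabs would clash with the field delimiter, so replace them with spaces.
        yield (doc_name, line_no, line.replace("\t", " "))


def write_tsv(rows, out):
    """Write rows as tab-delimited text, one record per line."""
    for doc_name, line_no, line in rows:
        out.write(f"{doc_name}\t{line_no}\t{line}\n")


# A matching Hive table could look like this (table name and location are
# assumptions, adjust to your cluster):
#
#   CREATE EXTERNAL TABLE pdf_lines (doc_name STRING, line_no INT, line STRING)
#   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
#   STORED AS TEXTFILE
#   LOCATION '/data/pdf_lines';

if __name__ == "__main__":
    # Reads extracted text on stdin and writes tab-delimited rows to stdout.
    write_tsv(text_to_hive_rows("report.pdf", sys.stdin.read()), sys.stdout)
```

The output file can then be copied into the table's HDFS directory. A line-per-row layout is only a starting point; a real data model would depend on what the PDF actually contains.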
Created 02-15-2017 04:51 PM
How would you go about installing Apache Tika? I have the same situation.