How to convert pdf file into hive table?

I have a PDF file that I have copied from my local system to HDFS, and I need to load its contents into a Hive table. Is there any way to do this in Hive? I know the same thing can be handled in Pig.

1 ACCEPTED SOLUTION

Master Guru

If I understand your question correctly (and I am making an assumption here), you want to store the PDF in Hive so you can run some analysis on the content inside it. The way I have done this is with Apache Tika: extract the text from the PDF and store it as a plain-text file, then apply a data model to that text and store the result in Hive.
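To make the "apply a data model and store in Hive" step a bit more concrete, here is a minimal sketch in Python. It assumes the text has already been extracted from the PDF (for example with the Tika app jar's `--text` option) and uses the simplest possible data model: one row per non-empty line. The table name `pdf_text`, the column names, the HDFS path `/data/pdf_text`, and the helper functions are all hypothetical, not something Tika or Hive prescribes.

```python
# Sketch only: turns text already extracted from a PDF into
# tab-delimited rows that a Hive external table can read.

# Hypothetical table name, columns, and HDFS location; adjust to your own model.
HIVE_DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS pdf_text (
  doc_name  STRING,
  line_no   INT,
  line_text STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
STORED AS TEXTFILE
LOCATION '/data/pdf_text'
"""


def text_to_rows(doc_name, text):
    """Return (doc_name, line_no, line_text) tuples, one per non-empty line."""
    rows = []
    line_no = 0
    for raw in text.splitlines():
        # Collapse tabs and runs of whitespace so the field delimiter stays safe.
        line = " ".join(raw.split())
        if not line:
            continue
        line_no += 1
        rows.append((doc_name, line_no, line))
    return rows


def rows_to_tsv(rows):
    """Serialize rows as tab-delimited text matching the table definition above."""
    return "".join(f"{doc}\t{num}\t{txt}\n" for doc, num, txt in rows)
```

You would write the output of `rows_to_tsv` to a file, copy that file into the table's HDFS location, and run the DDL once in Hive. A real data model will usually need more than line numbers (fields parsed out of the text, for instance), so treat this as a starting point only.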

2 REPLIES

How would you go about installing Apache Tika? I am in the same situation.