How to convert a PDF file into a Hive table?
- Labels: Apache Hadoop, Apache Hive, Apache Pig
Created ‎08-11-2016 01:24 PM
I have a PDF file that I have copied from my local system to HDFS, and I need to convert it into a Hive table. Is there any way to do this in Hive? I know we can handle it in Pig.
Created ‎08-11-2016 01:52 PM
If I understand your question correctly (and I am making an assumption here), you want to store the PDF in Hive so you can analyze the content inside it. The way I have done this is with Apache Tika: extract the text from the PDF and store it in a text-format file, then apply a data model to the text and store it in Hive.
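To make those steps a bit more concrete, here is a rough Python sketch, not the exact code I ran. It assumes Java is installed and that you have downloaded the Tika app jar (the `tika-app.jar` path is a placeholder). The "data model" here is just one row per non-blank line of text, and the table name `pdf_lines`, its columns, and the HDFS location `/data/pdf_lines` are made up for illustration; your own model will depend on what is in the PDF.

```python
import subprocess

# Hypothetical Hive table over tab-delimited text files; the table name,
# columns, and LOCATION are illustrative assumptions, not from the thread.
HIVE_DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS pdf_lines (
  doc_id STRING,
  line_no INT,
  line_text STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
STORED AS TEXTFILE
LOCATION '/data/pdf_lines'
"""


def extract_text_with_tika(pdf_path, tika_jar="tika-app.jar"):
    """Run the Tika app jar in plain-text mode and return the extracted text.

    Assumes `java` is on the PATH and `tika_jar` points at a downloaded
    tika-app jar (placeholder path).
    """
    result = subprocess.run(
        ["java", "-jar", tika_jar, "--text", pdf_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def text_to_rows(doc_id, text):
    """Turn extracted text into (doc_id, line_no, line_text) rows.

    Blank lines are skipped, and runs of whitespace (including tabs) are
    collapsed to one space so the tab delimiter stays unambiguous.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        cleaned = " ".join(line.split())
        if cleaned:
            rows.append((doc_id, line_no, cleaned))
    return rows


def rows_to_tsv(rows):
    """Serialize rows as tab-delimited lines matching the table layout above."""
    return "".join(f"{doc_id}\t{line_no}\t{text}\n" for doc_id, line_no, text in rows)
```

You would then write the output of `rows_to_tsv(text_to_rows("doc1", extract_text_with_tika("file.pdf")))` to a file, put that file in the table's HDFS directory, and run the DDL once in Hive. Real PDFs with tables or multi-column layouts usually need more parsing than this line-per-row model.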
Created ‎02-15-2017 04:51 PM
How would you go about installing Apache Tika? I have the same situation.
