
Extract PDF with Tika



I'm using Tika to extract text from a PDF. The result is a single line:

text text text 15 2,659,503,240 2,850,012,215text text text 16 2,526,218,161 632,070,000

How can I use "ExtractText" or "ReplaceText" so that the end result looks like this:

text text text|15|2,659,503,240|2,850,012,215
text text text|16|2,526,218,161|632,070,000

That way I can load the data into HDFS and Hive. Thanks.
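A minimal sketch of the splitting logic, written in Python rather than as a processor configuration. It assumes each record is laid out as: descriptive text containing no digits, an integer ID, then two amounts with comma thousands separators. That layout is inferred from the single sample line only, and real PDF text may break it. The function name `to_pipe_delimited` is hypothetical. The key idea is that the amount pattern stops at the first non-digit, so a fused value like `2,850,012,215text` is split back into an amount and the start of the next record.

```python
import re

# Assumed record layout (inferred from one sample line, not confirmed):
#   <text without digits> <integer id> <amount> <amount>
# Amounts use comma thousands separators, e.g. 2,659,503,240.
RECORD = re.compile(
    r"\s*(\D+?)\s+(\d+)\s+(\d{1,3}(?:,\d{3})*)\s+(\d{1,3}(?:,\d{3})*)"
)


def to_pipe_delimited(extracted: str) -> str:
    """Hypothetical helper: turn Tika's one-line output into one row per record."""
    # Each match is rewritten as "text|id|amount|amount" plus a newline.
    # Characters that match no record are passed through unchanged by re.sub.
    rows = RECORD.sub(r"\1|\2|\3|\4\n", extracted)
    # Drop the newline added after the last record.
    return rows.rstrip("\n")


if __name__ == "__main__":
    line = ("text text text 15 2,659,503,240 2,850,012,215"
            "text text text 16 2,526,218,161 632,070,000")
    print(to_pipe_delimited(line))
```

Run on the sample line, this prints the two pipe-delimited rows shown in the question. The same search pattern, with a replacement of the form `$1|$2|$3|$4` followed by a line break, may work in a regex-based ReplaceText setup, but that is an untested assumption; check how the processor treats the line break in the replacement value.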