Extract PDF with Tika

I'm using Tika to extract text from a PDF. The result comes out as a single line, with each record running straight into the next:

text text text 15 2,659,503,240 2,850,012,215text text text 16 2,526,218,161 632,070,000

How can I use "extracttext" or "replacetext" so that each record ends up on its own line with pipe-delimited fields, as follows:

text text text|15|2,659,503,240|2,850,012,215
text text text|16|2,526,218,161|632,070,000

I want to load the result into HDFS and Hive. Thanks.
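
For reference, here is a minimal Python sketch of the transformation being asked for, using a regular expression. It is not a configuration for "extracttext" or "replacetext". It assumes every record has the shape shown above: a label containing no digits, one integer, then two comma-grouped numbers, with the next label starting immediately after the last digit. The names `RECORD` and `to_pipe_rows` are made up for this illustration.

```python
import re

# One record: a digit-free label, an integer, and two comma-grouped numbers.
# Assumption: labels never contain digits, so a label ends where the first
# number begins and the next record starts right after the last number.
RECORD = re.compile(r"(\D+?)\s+(\d+)\s+([\d,]+)\s+([\d,]+)")


def to_pipe_rows(line):
    """Split one run-together line into pipe-delimited rows."""
    rows = []
    for match in RECORD.finditer(line):
        # Strip stray whitespace from each captured field, then join with "|".
        fields = [field.strip() for field in match.groups()]
        rows.append("|".join(fields))
    return rows


line = (
    "text text text 15 2,659,503,240 2,850,012,215"
    "text text text 16 2,526,218,161 632,070,000"
)

for row in to_pipe_rows(line):
    print(row)
```

The pattern relies on the label being matched lazily (`\D+?`), so it stops at the first whitespace followed by a digit. If real labels can contain digits, the pattern would need to be anchored differently, for example on the comma-grouped numbers alone.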
