I'm using Tika to extract text from a PDF. The result comes out as a single line:
text text text 15 2,659,503,240 2,850,012,215text text text 16 2,526,218,161 632,070,000
How can I use "extracttext" or "replacetext" so that the end result is as follows:
text text text|15|2,659,503,240|2,850,012,215
text text text|16|2,526,218,161|632,070,000
That way I can load the data into HDFS and Hive. Thanks.
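
To make the transformation concrete, here is a minimal Python sketch of the regex logic I have in mind. It assumes every record is some free text followed by one plain integer and two comma-grouped numbers, and that the text part never ends in a number itself. The names RECORD and split_records are only for illustration and are not part of Tika or any processor.

```python
import re

# One record = free text, a plain integer, then two comma-grouped numbers.
# Assumption: the free-text part does not itself end with a number, and the
# next record's text does not start with a digit.
NUM = r"\d{1,3}(?:,\d{3})*"
RECORD = re.compile(r"\s*(.+?)\s+(\d+)\s+(" + NUM + r")\s+(" + NUM + r")")

def split_records(line):
    """Return one pipe-delimited string per record found in the line."""
    return ["|".join(m.groups()) for m in RECORD.finditer(line)]

if __name__ == "__main__":
    sample = ("text text text 15 2,659,503,240 2,850,012,215"
              "text text text 16 2,526,218,161 632,070,000")
    for row in split_records(sample):
        print(row)
```

The pattern is ordinary regex with four capture groups, so I assume the same expression could serve as the search pattern in "replacetext", with the four groups joined by "|" and followed by a newline as the replacement, but I have not verified that.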