Parsing a text file and importing content into ElasticSearch

Hi all,

I have a text file whose records I need to parse into individual flowfiles so that they can be imported into ElasticSearch using PutElasticSearch. Below is a sample of the text file. The actual file has around 90 fields and 1700 records.

*
# 90 MARKET REPORT                     M33F                           20161208
#  1 Start Date		               start_date                     D   8  0
#  2 Product Name		       product_name                   S  25  0
#  3 Product Code              	       product_code 	              N   8  0
#  4 Vendor Code	               vendor_code	              N   5  0
#  5 Customer Code                     customer_code	              N   7  0
#  6 Mash                              mash                           S   7  0
#  7 Revert                            revert                         S  12  0
#  8 Stat                              stat                           S  12  0
#  9 Region Symbol       	       region_symbol                  S   2  0
# 10 Expired Flag                      expired_flag                   N   1  0
# 11 Disposed Flag                     disposed_flag                  N   1  0
# 12 Sector                            sector                         N   2  0
# 13 Sector Follow                     sector_fol	              N   2  0
# 14 Business                          business                       N   4  0
# 15 Business Follow                   business_follow	              N   4  0
# 16 Industry                          industry                       N   6  0
*
# 1         2                          3         4      5        6        7             8             9   10 11 12  13  14    15    16    
SSL>>>>>>>SSL>>>>>>>>>>>>>>>>>>>>>>>>SSV>>>>>>>SSV>>>>SSV>>>>>>SSL>>>>>>SSL>>>>>>>>>>>SSL>>>>>>>>>>>SSL>SSVSSVSSV>SSV>SSV>>>SSV>>>SSV>>>>>
| 20180308| APPLE                    |    54252| 24221| 2548752| B2D35D8|             | WD0000224223| WD| 1| 0| 50| 60| 4122| 4122| 212005
| 20180308| ORANGE                   |    11245|  2432| 9678523| F8452F3|             | WD0000212423| WD| 0| 1| 50| 70| 4122| 4122| 457842
| 20180308| SOUR PLUM                |     2542| 54621| 1231242| D21D2W2|             | WD0000421245| WD| 0| 1| 40| 80| 1112| 1112| 168442
| 20180308| CITRUS GRAPEFRUIT A      |    98546| 78546| 1245678| T21HJ23|             | AB0000777423| AB| 0| 1| 80| 90| 4122| 4122| 432542
| 20180308| BANANA C                 |     1124| 45123| 2784562| 42422D3|             | AF0000875421| AF| 1| 0| 66| 44| 2125| 5212| 515235
#EOD
*

As you can see, the text file comes with a header describing the fields, followed by the individual records, one per line. What is the best way to split the records into individual flowfiles and match the values to the field names so that they can be imported into ElasticSearch?

The best I have managed so far is to use SplitText to put each record into its own flowfile, but the header still remains in each flowfile.

I'd appreciate any advice and suggestions. Thanks.

2 REPLIES

Re: Parsing a text file and importing content into ElasticSearch

@Kok Ching Hoo

If your file is a CSV, the best option is to use PutElasticsearchHttpRecord: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-elasticsearch-nar/1.5.0/org.a...

This is a bulk operation, so there is no need to split the file into one flowfile per record.
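
For context on what "bulk" means here, the sketch below builds a request body in the newline-delimited JSON shape that the Elasticsearch Bulk API accepts: one action line followed by one document line per record, all sent in a single request. It is only an illustration of the request shape, not what the processor does internally. The index name `market_report` and the helper `build_bulk_body` are made up for this example, and older Elasticsearch versions may also expect a document type in the action line.

```python
import json


def build_bulk_body(records, index_name):
    """Return an NDJSON body with one 'index' action per record (hypothetical helper)."""
    lines = []
    for record in records:
        # Action line: tells Elasticsearch to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(record))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample_records = [
        {"start_date": "20180308", "product_name": "APPLE"},
        {"start_date": "20180308", "product_name": "ORANGE"},
    ]
    # "market_report" is an assumed index name, not one from the thread.
    print(build_bulk_body(sample_records, "market_report"), end="")
```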

Re: Parsing a text file and importing content into ElasticSearch

Thanks for your reply.

I understand that CSVReader will only take the first line as the header, but in my sample text file the header is around 19 lines long. Is there a way to convert the header into one line of field names so that it can be read by CSVReader? Something like this?

|start_date|product_name|product_code|vendor_code|customer_code|mash|revert|stat|region_symbol|expired_flag|disposed_flag|sector|sector_fol|business|business_follow|industry
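
One possible way to get that single header line is to pre-process the file with a small script (for example from a scripting processor, or before the file reaches NiFi). The sketch below is an untested-in-NiFi illustration in plain Python. It assumes that every field definition line has the form `# <number> <label> <field_name> <type letter D/S/N> <length> <decimals>` and that every record line starts with `|`. The function names and the shortened three-field `SAMPLE` are made up for the example.

```python
import re

# Matches field definition lines such as:
#   "#  2 Product Name        product_name        S  25  0"
# Assumption: the field name is the token just before the type letter (D, S or N).
FIELD_LINE = re.compile(r"^#\s*(\d+)\s+.*?\s+(\w+)\s+([DSN])\s+(\d+)\s+(\d+)\s*$")

# Shortened sample: only the first three fields of the file in the question.
SAMPLE = """\
*
# 90 MARKET REPORT                     M33F                           20161208
#  1 Start Date                        start_date                     D   8  0
#  2 Product Name                      product_name                   S  25  0
#  3 Product Code                      product_code                   N   8  0
*
# 1         2                          3
SSL>>>>>>>SSL>>>>>>>>>>>>>>>>>>>>>>>>SSV>>>>>>>
| 20180308| APPLE                    |    54252
| 20180308| SOUR PLUM                |     2542
#EOD
*
"""


def extract_field_names(lines):
    """Collect field names from the header, in the order they are declared."""
    names = []
    for line in lines:
        match = FIELD_LINE.match(line)
        if match:
            names.append(match.group(2))
    return names


def header_line(names):
    """Build a single pipe-delimited header line, with a leading pipe like the records."""
    return "|" + "|".join(names)


def parse_record(line, names):
    """Split one record line on '|' and pair the trimmed values with the field names."""
    # The line starts with '|', so the first cell after splitting is empty; skip it.
    values = [value.strip() for value in line.split("|")[1:]]
    return dict(zip(names, values))


def parse_report(text):
    """Return (field names, list of record dicts) for a report in the format above."""
    lines = text.splitlines()
    names = extract_field_names(lines)
    records = [parse_record(line, names) for line in lines if line.startswith("|")]
    return names, records


if __name__ == "__main__":
    names, records = parse_report(SAMPLE)
    print(header_line(names))
    for record in records:
        print(record)
```

The regular expression uses `\s+` between columns, so the mix of tabs and spaces in the real header should not matter. Lines such as the report title, the column ruler, and `#EOD` do not end in a type letter followed by two numbers, so they are skipped.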