Created 01-12-2018 02:31 AM
Hi,
I am getting an error while parsing a CSV file that contains a JSON column in NiFi. The columns in my file look like this:
Name : Surendra
Age : 24
Address : {"city":"Chennai","state":"TN","zipcode":"600345"}
The output should look like this:
Name : Surendra
Age : 24
Address_city : Chennai
Address_state : TN
Address_zipcode : 600345
Can anyone please help me with this?
Created on 01-13-2018 04:59 AM - edited 08-18-2019 12:51 AM
We can do this parsing inside NiFi by using the SplitText, ExtractText, ReplaceText and MergeContent processors.
Example:-
Let's say your CSV file has n rows in it, for example:
Surendra,24,"{"city":"Chennai","state":"TN","zipcode":"600345"}"
Surendra,25,"{"city":"Chennai","state":"TN","zipcode":"609345"}"
First we need to split this file into individual flowfiles, with one record in each flowfile. For splitting, use the
SplitText:-
processor with the config below.
Line Split Count
1
So if the input CSV has 2 lines, the SplitText processor splits it into 2 flowfiles, each containing one line.
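To make the splitting step concrete outside NiFi, here is a minimal Python sketch of what a Line Split Count of 1 does to the sample content. The function name split_lines is hypothetical, and this only mimics the observable behaviour on plain text; it is not how NiFi implements SplitText.

```python
def split_lines(content: str, line_split_count: int = 1) -> list[str]:
    """Split text into chunks of `line_split_count` lines, one chunk per 'flowfile'."""
    lines = content.splitlines()
    return [
        "\n".join(lines[i:i + line_split_count])
        for i in range(0, len(lines), line_split_count)
    ]


# Sample content taken from the example rows in this answer.
csv_content = (
    'Surendra,24,"{"city":"Chennai","state":"TN","zipcode":"600345"}"\n'
    'Surendra,25,"{"city":"Chennai","state":"TN","zipcode":"609345"}"\n'
)

flowfiles = split_lines(csv_content)
print(len(flowfiles))  # 2
```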
Once each record is in its own flowfile, use
ExtractText:-
to extract the content of the flowfile by adding new properties to the processor as below.
Address_city
"city":"(.*?)"
Address_state
"state":"(.*?)"
Address_zipcode
"zipcode":"(.*?)"
Age
,(.*?),
Name
^(.*?),
In this processor we extract the contents of the flowfile and keep them as flowfile attributes, with one matching regex per property.
You can create and test each regex in any online regex tester before adding it to the processor.
You need to change the Maximum Buffer Size value (default is 1 MB) based on your flowfile size.
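If you want to check the regexes before putting them in the processor, a small Python sketch like the one below applies them to one sample record. The helper name extract_attributes is hypothetical, and Python's re module stands in for the Java regex engine NiFi uses; these particular patterns should behave the same in both, but treat that as an assumption.

```python
import re

# Property name -> regex, copied from the ExtractText properties listed above.
PATTERNS = {
    "Address_city": r'"city":"(.*?)"',
    "Address_state": r'"state":"(.*?)"',
    "Address_zipcode": r'"zipcode":"(.*?)"',
    "Age": r",(.*?),",
    "Name": r"^(.*?),",
}


def extract_attributes(line: str) -> dict[str, str]:
    """Return capture group 1 of the first match of each pattern (hypothetical helper)."""
    attributes = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, line)
        if match:
            attributes[name] = match.group(1)
    return attributes


record = 'Surendra,24,"{"city":"Chennai","state":"TN","zipcode":"600345"}"'
print(extract_attributes(record))
```

Note that the Age pattern simply takes whatever sits between the first two commas, so it relies on Age being the second column and on Name not containing a comma.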
ReplaceText Configs:-
In the previous step we extracted the contents of the flowfile into attributes. In the ReplaceText processor we build a new CSV record with a comma delimiter (you can use any delimiter you want) by changing the properties below and setting the Replacement Value property as follows.
Configs:-
Search Value
(?s)(^.*$)
Replacement Value
${Name},${Age},${Address_city},${Address_state},${Address_zipcode}
Maximum Buffer Size
1 MB
Replacement Strategy
Always Replace
Evaluation Mode
Entire text
So the output of the ReplaceText processor would be
Surendra,24,Chennai,TN,600345
Surendra,25,Chennai,TN,609345
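As a rough sketch of what this replacement does, the Python below swaps the whole content of one record for the rendered Replacement Value. It is only an approximation: string.Template is used as a stand-in for the NiFi Expression Language because it happens to accept the same ${name} syntax, and replace_content is a hypothetical helper name.

```python
import re
from string import Template

# Stand-in for the Replacement Value property; string.Template is NOT the
# NiFi Expression Language, it just accepts the same ${name} placeholders.
REPLACEMENT_VALUE = Template(
    "${Name},${Age},${Address_city},${Address_state},${Address_zipcode}"
)


def replace_content(content: str, attributes: dict[str, str]) -> str:
    """Replace the entire content with the rendered replacement value."""
    rendered = REPLACEMENT_VALUE.substitute(attributes)
    # (?s) lets '.' match newlines, so the pattern covers the whole content.
    # A function replacement avoids backslash processing of `rendered`.
    return re.sub(r"(?s)(^.*$)", lambda _match: rendered, content)


# Attributes as ExtractText would produce them for the first sample record.
attributes = {
    "Name": "Surendra",
    "Age": "24",
    "Address_city": "Chennai",
    "Address_state": "TN",
    "Address_zipcode": "600345",
}
original = 'Surendra,24,"{"city":"Chennai","state":"TN","zipcode":"600345"}"'
print(replace_content(original, attributes))  # Surendra,24,Chennai,TN,600345
```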
We have now created CSV output without the JSON message, but we end up with 2 CSV files (because the input data has 2 lines). If your input file has 1000 lines, you will end up with 1000 output CSV files.
If you don't want 2 output files and would rather merge them into 1 output file, use the
MergeContent Processor:-
with the configs below.
Change the properties as per your requirements. My config uses a Max Bin Age of 1 min, so the processor waits for 1 minute and then merges all the queued flowfiles into 1 file.
Set Delimiter Strategy to Text (the default is Filename), because the content of each flowfile needs to go on its own line in the merged file. To do that, set the Demarcator property to a newline by pressing Shift+Enter in the value field.
Output:-
1 file with both records in it
Surendra,24,Chennai,TN,600345
Surendra,25,Chennai,TN,609345
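The effect of the text demarcator can be pictured with a very small Python sketch: the contents of the queued flowfiles are joined with a newline between them. The function name merge_content is hypothetical, and binning, timing and ordering in the real processor are not modelled here.

```python
def merge_content(flowfile_contents: list[str], demarcator: str = "\n") -> str:
    """Join flowfile contents into one output, placing the demarcator between them."""
    return demarcator.join(flowfile_contents)


records = [
    "Surendra,24,Chennai,TN,600345",
    "Surendra,25,Chennai,TN,609345",
]
merged = merge_content(records)
print(merged)
```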
I highly suggest you refer to the MergeContent processor documentation to get familiar with all of its properties.
I'm attaching the XML to this post. You can save it, import it into NiFi and make changes as needed: parse-file-nifi-159780.xml
If the answer helped to resolve your issue, click the Accept button below to accept it. That would be a great help to Community users looking for a quick solution to this kind of error.
Created on 01-12-2018 04:37 AM - edited 08-18-2019 12:51 AM
Hi @Shu, my input looks like this, and I want to parse this data as I described above.
Thanks !
Created 01-12-2018 04:46 AM
I want to fetch this data from MySQL, so I created a table named input in MySQL.
My flow looks like this: ExecuteSQL ->> SplitAvro ->> ConvertAvroToJson ->> EvaluateJsonPath ->> UpdateAttribute
Created 01-13-2018 03:42 AM
Thanks @shu for your reply. I am looking for the same output that you sent me. I want the output in a CSV file like Surendra,24,Chennai,TN,24, and the output will be stored on the local machine only.
Created 01-13-2018 05:03 AM
Thanks for your overwhelming response, this will help me a great deal.