Read CSV File with Header, Filter Rows, and Convert to JSON using Apache NiFi
Labels: Apache NiFi
Created on 12-21-2020 12:50 AM - edited 12-21-2020 02:34 AM
Hi,
I'm new to Apache NiFi and I'm looking for a way to filter CSV data on a specific column.
I'm able to convert to JSON without filtering, but the SplitText and RouteOnAttribute processors are not helping me filter the data. Below is my input CSV.
Input:
number,name,resourceState,location,manufacturer
111897,lok,INSTALLED,HYD,ABC
115677,redd,RETIRED,BLR,ABC
1108448,eswar,PROP_INITIAL,CLT,ABC
1116740,wqwq,INITIAL,AA,ABC
Filtering should be based on the resourceState column, keeping only the INSTALLED and RETIRED rows. So the converted JSON should contain only 2 records, like below.
Expected JSON Output:
[{
"number": "111897",
"name": "lok",
"resourceState": "INSTALLED",
"location": "HYD",
"manufacturer": "ABC"
},
{
"number": "115677",
"name": "redd",
"resourceState": "RETIRED",
"location": "BLR",
"manufacturer": "ABC"
}]
Please help me with the CSV filtering part.
Thanks in advance.
Created 12-21-2020 02:44 AM
Hello @Lokeswar
The QueryRecord processor does exactly what you want, but your flow needs to be record-oriented, using a JSON reader and a JSON writer. You then work directly on the flow file content instead of on flow file attributes.
So, starting from your CSV:
- use a ConvertRecord processor to turn it into a record-based flow file, with CSVReader as the reader and JsonRecordSetWriter as the writer, since you want JSON
- add a QueryRecord processor with a query like this: SELECT * FROM FLOWFILE WHERE resourceState='INSTALLED' OR resourceState='RETIRED'
Warning: use single quotes for string values, not double quotes.
After this, QueryRecord exposes an output relationship for the query that you can connect to your next processor.
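If you want to sanity-check the query before wiring it into NiFi, here is a minimal Python sketch that runs the same SQL against the sample CSV from the question. It is only an offline approximation: it uses Python's built-in sqlite3 module, whereas QueryRecord evaluates SQL with its own engine, so dialect details may differ. The function name filter_csv and the constants CSV_TEXT and QUERY are made up for this illustration and are not part of NiFi.

```python
import csv
import io
import json
import sqlite3

# Sample input copied from the question.
CSV_TEXT = """number,name,resourceState,location,manufacturer
111897,lok,INSTALLED,HYD,ABC
115677,redd,RETIRED,BLR,ABC
1108448,eswar,PROP_INITIAL,CLT,ABC
1116740,wqwq,INITIAL,AA,ABC
"""

# Same query text as suggested for the QueryRecord processor.
QUERY = (
    "SELECT * FROM FLOWFILE "
    "WHERE resourceState='INSTALLED' OR resourceState='RETIRED'"
)


def filter_csv(csv_text, query=QUERY):
    """Load CSV text into an in-memory table named FLOWFILE and run the query.

    Hypothetical helper for illustration only; SQLite stands in for NiFi here.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames  # header names taken from the first CSV line

    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets each result row convert to a dict
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f"CREATE TABLE FLOWFILE ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO FLOWFILE VALUES ({placeholders})",
        [[row[c] for c in columns] for row in rows],
    )
    result = [dict(r) for r in conn.execute(query)]
    conn.close()
    return result


if __name__ == "__main__":
    # Prints the matching records as a JSON array.
    print(json.dumps(filter_csv(CSV_TEXT), indent=2))
```

All columns are loaded as text, so the values come out as JSON strings, matching the expected output in the question.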
Created 12-21-2020 09:44 AM
Thanks Stephane, the QueryRecord processor works as suggested.