Remove first few lines in a text/csv flowfile content in Apache NiFi
Labels: Apache NiFi
Created on ‎11-22-2023 01:05 AM - edited ‎11-22-2023 01:06 AM
I receive a text/csv file with many lines through an InvokeHTTP processor. I don't want the first 7 lines. What should I do to remove the first 7 lines and keep the rest in the same text/csv format?
Created ‎11-22-2023 09:23 AM
Hi @glad1 ,
Can you elaborate on the data that you want to remove? If the data is part of the CSV and has a unique value in one or more columns, you can use the QueryRecord processor with a query that excludes the records with that value. If the data sits outside the CSV, like header information, then depending on what it looks like and whether it is surrounded by special characters, you can use the ReplaceText processor with a regex that isolates those lines and replaces them with an empty string. If you can provide some sample data, it would help in figuring out the best solution for this scenario.
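To illustrate the ReplaceText idea, here is a minimal sketch of one regex that could isolate a fixed number of leading lines. It is written as plain Java because NiFi's regex properties use Java regular expressions; the class and method names are made up for illustration, and the count of 7 comes from the question. This assumes the unwanted lines are always the first 7 and that the processor evaluates the entire text rather than line by line. The pattern would go in Search Value with a single backslash (`\A(?:.*\r?\n){7}`) and an empty Replacement Value.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of NiFi.
public class SkipLeadingLines {
    // \A anchors at the very start of the input. '.' does not match line
    // terminators by default, so ".*\r?\n" consumes exactly one line
    // including its LF or CRLF ending; {7} repeats that seven times.
    static final Pattern FIRST_SEVEN_LINES = Pattern.compile("\\A(?:.*\\r?\\n){7}");

    // Returns the content without its first 7 lines. If the content has
    // fewer than 7 terminated lines, the pattern does not match and the
    // content is returned unchanged.
    static String dropFirstSevenLines(String content) {
        return FIRST_SEVEN_LINES.matcher(content).replaceFirst("");
    }

    public static void main(String[] args) {
        String sample = "m1\nm2\nm3\nm4\nm5\nm6\nm7\nid,name\n1,alice\n";
        System.out.print(dropFirstSevenLines(sample));
    }
}
```

Evaluating the entire text means the whole content is buffered, so for large files the processor's buffer size limit may need checking.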
Thanks
Created ‎11-22-2023 09:52 PM
I've attached an image above showing how the data looks. I want to remove the first 7 rows so that the 8th row (the header row) becomes the first.
Created ‎11-22-2023 11:33 AM
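If you're confident the data returned is consistent and always more than 7 lines, then a quick and dirty option would be a Groovy script like this.

import java.nio.charset.StandardCharsets

// Drop the first 7 lines of the flowfile content
FlowFile flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    // Reads the whole content into memory, one entry per line
    String[] data = inputStream.readLines()
    // Discard the first 7 lines; the result is empty if there are 7 or fewer
    data = data.drop(7)
    // Lines are rejoined with "\n", without a trailing newline
    outputStream.write(data.join("\n").getBytes(StandardCharsets.UTF_8))
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)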
Created ‎11-22-2023 09:49 PM
Thank you, this worked!
