Remove the first few lines of text/csv FlowFile content in Apache NiFi

Explorer

I receive a text/csv file with many lines through an InvokeHTTP processor. I don't need the first 7 lines. How can I remove them and keep the rest of the content in the same text/csv format?

Hi @glad1,

Can you elaborate on the data that you want to remove? For example, if the data is part of the CSV and has a unique value in one or more columns, you can use the QueryRecord processor with a query that excludes the records containing that value. If the data sits outside the CSV (such as header information), then depending on what it looks like and whether it is surrounded by special characters, you can use the ReplaceText processor with a regex that isolates those lines and replaces them with an empty string. If you can provide some sample data, it would help in figuring out the best solution for this scenario.
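For the header-lines case, the ReplaceText idea might be configured with Replacement Strategy set to Regex Replace, Evaluation Mode set to Entire text, an empty Replacement Value, and a Search Value such as \A(?:[^\n]*\n){7}, which matches exactly the first 7 lines. That pattern is an assumption, not something from this thread, and it is worth checking on your NiFi version; note that Entire text mode buffers the whole FlowFile, bounded by the processor's Maximum Buffer Size. Since NiFi uses Java regular expressions, the pattern can be tried in plain Groovy first (the helper name dropFirstLines is hypothetical):

```groovy
import java.util.regex.Pattern

// Hypothetical helper: removes the first n lines from the start of the text.
// \A anchors at the very start of the input (independent of MULTILINE mode),
// and [^\n]* also consumes a '\r', so CRLF line endings are handled too.
// If the text has fewer than n line breaks, nothing matches and the text is
// returned unchanged.
String dropFirstLines(String content, int n) {
    Pattern firstLines = Pattern.compile('\\A(?:[^\\n]*\\n){' + n + '}')
    return firstLines.matcher(content).replaceFirst('')
}
```

With n = 7 the compiled pattern is the same string as the Search Value above, so a quick check against a saved sample of the InvokeHTTP response should show whether the header line ends up first.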

Thanks

Explorer

[Attached screenshot: sample of the returned data]

I've attached an image above showing how the data looks. I want to remove the first 7 rows so that the 8th row (the header row) comes first.

Super Collaborator

If you're confident the returned data is consistent and always has more than 7 lines, then a quick-and-dirty option is a Groovy script like this (for example in an ExecuteScript processor with Script Engine set to Groovy):

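import java.nio.charset.StandardCharsets
import org.apache.nifi.flowfile.FlowFile
import org.apache.nifi.processor.io.StreamCallback

FlowFile flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    // Read the content as UTF-8 lines (line terminators are stripped)
    List<String> lines = inputStream.readLines(StandardCharsets.UTF_8.name())
    // Skip the first 7 lines; line 8 (the CSV header) becomes the first line
    List<String> kept = lines.drop(7)
    // Re-join with LF and keep a trailing newline when anything is left
    String result = kept ? kept.join("\n") + "\n" : ""
    outputStream.write(result.getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)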
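If the files get large, or the exact bytes after line 7 matter (CRLF endings, the trailing newline), a streaming variant avoids loading the whole content into memory. This is a sketch of an alternative, not part of the answer above: the helper name dropLeadingLines is hypothetical, and it assumes lines end in LF (or CRLF) in an ASCII-compatible encoding such as UTF-8, where the 0x0A byte only ever means a line feed.

```groovy
import java.io.BufferedInputStream
import java.io.InputStream
import java.io.OutputStream

// Hypothetical helper: copies input to output, skipping everything up to and
// including the first linesToSkip LF (0x0A) bytes. It never decodes text, so
// CRLF endings, the trailing newline and the charset of the kept lines pass
// through byte-for-byte (assumes an ASCII-compatible encoding such as UTF-8).
void dropLeadingLines(InputStream input, OutputStream output, int linesToSkip) {
    InputStream buffered = new BufferedInputStream(input)
    int skipped = 0
    int b = 0
    // Phase 1: discard bytes until linesToSkip line feeds have been seen
    // (if the stream ends first, nothing is written).
    while (skipped < linesToSkip && (b = buffered.read()) != -1) {
        if (b == 0x0A) {
            skipped++
        }
    }
    // Phase 2: copy the rest of the stream unchanged in 8 KB chunks.
    byte[] chunk = new byte[8192]
    int n = 0
    while ((n = buffered.read(chunk)) != -1) {
        output.write(chunk, 0, n)
    }
}
```

Inside the ExecuteScript body, the session.write closure would then just call dropLeadingLines(inputStream, outputStream, 7), with the helper defined in the same script.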

Explorer

Thank you, this worked!