Created on 05-11-2018 10:39 AM - edited 09-16-2022 06:12 AM
I am working on cleaning CSV files and am stuck because of additional commas in the headers. Can anybody guide me on how to replace the header line with a custom header?
Created on 05-11-2018 10:53 AM - edited 08-18-2019 12:39 AM
You can use the ReplaceText processor with Replacement Strategy set to Prepend, Evaluation Mode set to Entire text, and your custom header (ending with a line break) in the Replacement Value property. If your file is larger than 1 MB, increase Maximum Buffer Size to fit your flowfile size.
With this configuration, each flowfile gets the custom header as its first line, and all of the original content follows from the second line.
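To make the effect concrete, here is a small Python sketch of what Prepend does to one flowfile's content when the whole text is evaluated at once. It is not NiFi code: `prepend_header` and the sample data are hypothetical, and it assumes the Replacement Value ends with a line break.

```python
def prepend_header(content: str, header: str) -> str:
    """Emulate ReplaceText (Replacement Strategy = Prepend) on one flowfile's content."""
    # Assumption: the Replacement Value ends with a line break, so the
    # original content (including its old header) starts on line 2.
    replacement_value = header if header.endswith("\n") else header + "\n"
    return replacement_value + content


if __name__ == "__main__":
    # Hypothetical flowfile content with extra commas in the header.
    original = "id,,name,\n1,Alice\n"
    print(prepend_header(original, "customer_id,customer_name"), end="")
```

Note that the original header is not removed; it simply moves to the second line.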
Created 05-11-2018 12:13 PM
With record-oriented processors, you can ignore the header in the CSV file and define your own schema to read the incoming file.
Example: use the ConvertRecord processor with Record Reader set to CSVReader, and in the CSVReader controller service set the following property values:
Treat First Line as Header: true
Ignore CSV Header Column Names: true
With these property values, the first line is treated as a header, but its column names are ignored.
Define an Avro schema for the incoming CSV file; with these settings, the reader parses the incoming file using your schema instead of the broken header.
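The schema itself is plain Avro JSON. As a sketch, the Python helper below generates schema text of the kind you could paste into the schema registry; `build_avro_schema`, the record name, and the column names are hypothetical, and typing every column as a nullable string is an assumption made for simplicity, not a requirement.

```python
import json


def build_avro_schema(record_name: str, columns: list[str]) -> str:
    """Return Avro record schema text with one field per CSV column, in order."""
    fields = [
        # Assumption: nullable string is a permissive default for CSV data;
        # tighten the types (int, double, ...) to match your real columns.
        {"name": col, "type": ["null", "string"]}
        for col in columns
    ]
    schema = {"type": "record", "name": record_name, "fields": fields}
    return json.dumps(schema, indent=2)


if __name__ == "__main__":
    # Hypothetical column names; replace them with the columns in your files.
    print(build_avro_schema("customer", ["customer_id", "customer_name", "city"]))
```

The order of the fields matters here, because the header names are ignored and the schema is what defines each column.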
Set Record Writer to CSVRecordSetWriter and point it at the same Avro schema registry as the reader (if you need all the columns in the output flowfile). Set the following property:
Include Header Line: true
Each CSV file then gets a new header matching the Avro schema.
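For intuition, here is a rough Python analogue of what the reader/writer pair does under these settings. It is not NiFi's implementation: `rewrite_csv_header` and the sample data are hypothetical, columns are mapped to schema fields by position, extra columns are dropped, and padding short rows with empty strings is a simplification chosen for this sketch.

```python
import csv
import io


def rewrite_csv_header(content: str, field_names: list[str]) -> str:
    """Replace a CSV's header with field_names, mapping data columns by position."""
    reader = csv.reader(io.StringIO(content))
    # "Treat First Line as Header" + "Ignore CSV Header Column Names":
    # consume the first line but do not use its (possibly broken) names.
    next(reader, None)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # "Include Header Line": write a header built from the schema fields.
    writer.writerow(field_names)
    for row in reader:
        # Keep only as many columns as the schema defines...
        values = row[: len(field_names)]
        # ...and pad short rows so every output row has the same width.
        values += [""] * (len(field_names) - len(values))
        writer.writerow(values)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical input whose header has extra commas.
    data = "id,,name,\n1,Alice\n2,Bob\n"
    print(rewrite_csv_header(data, ["customer_id", "customer_name"]), end="")
```

Because the old header line is consumed rather than parsed, extra commas in it do not matter.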
Refer to this link for how to configure the ConvertRecord processor; there are also a number of other articles on configuring record-oriented processors.
Created 05-14-2018 11:23 AM
Dear @Shu, it worked perfectly. The concept was simple but great, thank you!