Created 03-08-2018 09:41 PM
Hi, everyone,
I am currently working with the ValidateRecord processor in NiFi to test its capabilities and see if it's a fit for a task I have. One step I want my flow to have is validating the format of a CSV file before placing it in HDFS for further processing (using Hive and other methods). The ValidateRecord processor does exactly what I need it to do, except...
What I'm expecting the processor to do is read the CSV data, verify the format and filter out any bad rows, and emit a FlowFile with the columns in their original order. However, after the ValidateRecord processor runs, the columns are rearranged, for reasons I can't quite work out. I can restore the original column ordering with a ConvertRecord processor, but is that extra step actually necessary, or is there something I'm missing in how I've configured ValidateRecord?
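To illustrate (column names here are made up, not my real data): a file arriving with the header

    id,name,amount

comes out of ValidateRecord with the same columns present, but in a different order, e.g.

    amount,id,name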
Potentially relevant information:
Thanks!
Created 03-09-2018 05:31 PM
Hi @Jessica David,
I confirm this is a bug. I created a JIRA for that: https://issues.apache.org/jira/browse/NIFI-4955
I will submit a fix in a minute. Thanks for reporting the issue.
I assume you're using the header as the schema access strategy in the CSV Reader. If you can use a different strategy (schema name or schema text), that should solve the problem, even though it means explicitly defining the schema.
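For example, with the CSVReader's Schema Access Strategy set to "Use 'Schema Text' Property", you'd supply an Avro schema in the Schema Text property, declaring the fields in the order the columns appear in the file (the field names below are just placeholders):

    {
      "type": "record",
      "name": "csv_row",
      "fields": [
        { "name": "id",     "type": "string" },
        { "name": "name",   "type": "string" },
        { "name": "amount", "type": "string" }
      ]
    }

Since the schema fixes the field order explicitly, the record writer will emit the columns in that declared order rather than whatever order the header-based strategy produces.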
Created 03-12-2018 01:40 PM
Thanks, Pierre! Glad to help, and I'm especially grateful for the quick turnaround time.
I will switch to the explicit schema definition for now, while we still have only a few files (and, correspondingly, few schemas) to validate. Ideally, once the fix lands, we'll be able to go back to the header-based strategy when we have a large number of schemas coming through.
Cheers!