ValidateRecord doesn't maintain column order?

Hi, everyone,

I am currently working with the ValidateRecord processor in NiFi to test its capabilities and see whether it fits a task I have. One step I want my flow to include is validating the format of a CSV file before placing it in HDFS for further processing (with Hive and other tools). The ValidateRecord processor does exactly what I need, except...

What I expect the processor to do is read the CSV data, verify the format, filter out any bad rows, and produce a FlowFile with the columns in the same order. However, after ValidateRecord runs, the columns are rearranged for reasons I can't quite understand. I can restore the original column order with the ConvertRecord processor, but is that extra step actually necessary, or am I missing something in how I'm using ValidateRecord?
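
For reference, here is a rough Python sketch of the behavior described above. It is not how NiFi implements ValidateRecord; the `COLUMNS` spec, its validation rules, and the `validate_csv` helper are all hypothetical. Each row is handled as a positional list, so valid and invalid rows come out with the same column order as the input, mirroring ValidateRecord's split into valid and invalid outputs.

```python
import csv
import io


def _is_float(value: str) -> bool:
    """Return True if the text parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


# Hypothetical schema: (column name, check) pairs in the same order as the CSV header.
COLUMNS = [
    ("id", str.isdigit),
    ("name", lambda v: v != ""),
    ("amount", _is_float),
]


def validate_csv(text: str, columns):
    """Split CSV text into (valid_csv, invalid_csv); both keep the input column order."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    expected = [name for name, _ in columns]
    if header != expected:
        raise ValueError(f"header {header} does not match schema {expected}")
    checks = [check for _, check in columns]

    valid_out, invalid_out = io.StringIO(), io.StringIO()
    valid_w = csv.writer(valid_out, lineterminator="\n")
    invalid_w = csv.writer(invalid_out, lineterminator="\n")
    valid_w.writerow(header)
    invalid_w.writerow(header)

    for row in reader:
        # A row is valid only if it has the right field count and every field passes its check.
        ok = len(row) == len(checks) and all(check(v) for check, v in zip(checks, row))
        # Rows are written back unchanged, so field positions never move.
        (valid_w if ok else invalid_w).writerow(row)
    return valid_out.getvalue(), invalid_out.getvalue()


valid, invalid = validate_csv("id,name,amount\n1,alice,9.5\nx,bob,3\n2,,1\n3,carol,7\n", COLUMNS)
print(valid)
print(invalid)
```

Here the rows `x,bob,3` (non-numeric id) and `2,,1` (empty name) land in the invalid output, and both outputs keep the `id,name,amount` header order.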

Potentially relevant information:

  • Running NiFi version 1.5.0
  • Using an AvroSchemaRegistry with CSVReader and CSVRecordSetWriter in the ValidateRecord block
  • Would prefer to keep the data as raw text as much as possible, as later processes do additional formatting and cleanup
  • Columns come out of the ValidateRecord block in what looks like an arbitrary order (i.e., not sorted alphabetically, by field length, or by any other obvious rule)
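
To confirm the reordering on a specific file rather than by eye, one option is to compare the header row of the file going into ValidateRecord with the one coming out. A small Python sketch, assuming you have copies of both contents as text; `header_order_diff` is a hypothetical helper, not part of NiFi.

```python
import csv
import io


def header_order_diff(original_csv: str, processed_csv: str):
    """Compare the header rows of two CSV texts.

    Returns (original_header, processed_header, moved), where `moved` lists the
    original columns that no longer sit at the same position.
    """
    original = next(csv.reader(io.StringIO(original_csv)))
    processed = next(csv.reader(io.StringIO(processed_csv)))
    # Compare sorted lists so duplicate names are counted, not just distinct ones.
    if sorted(original) != sorted(processed):
        raise ValueError("the two files do not contain the same columns")
    moved = [name for i, name in enumerate(original) if processed[i] != name]
    return original, processed, moved


orig, proc, moved = header_order_diff("id,name,amount\n1,alice,9.5\n", "amount,id,name\n9.5,1,alice\n")
print(moved)
```

An empty `moved` list means the processor preserved the column order.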

Thanks!

Accepted solution

Hi @Jessica David,

I can confirm this is a bug. I've created a JIRA for it: https://issues.apache.org/jira/browse/NIFI-4955

I will submit a fix in a minute. Thanks for reporting the issue.

I assume your CSV Reader uses the header as its schema access strategy. If you can switch to a different strategy (schema name or schema text), that should work around the problem, although you'll need to define the schema explicitly.
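
If you go the explicit-schema route, the schema text can be generated from the CSV header so that the field order matches the file. A minimal Python sketch, assuming every column should stay a plain string (raw text) and that the header names are already valid Avro identifiers; `schema_from_header` and the record name `csv_record` are hypothetical. The printed JSON could be pasted into the reader's schema text property, or added to the AvroSchemaRegistry as a property whose name is the schema name.

```python
import csv
import io
import json
import re

# Avro names start with a letter or underscore, followed by letters, digits, or underscores.
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def schema_from_header(header_line: str, record_name: str = "csv_record") -> str:
    """Build an Avro record schema (as JSON text) whose fields follow the CSV header order.

    Every field is typed "string" so values are kept as raw text.
    """
    columns = next(csv.reader(io.StringIO(header_line)))
    if len(set(columns)) != len(columns):
        raise ValueError("duplicate column names are not allowed in an Avro record")
    for name in columns:
        if not AVRO_NAME.fullmatch(name):
            raise ValueError(f"column {name!r} is not a valid Avro field name")
    schema = {
        "type": "record",
        "name": record_name,
        # A list keeps the fields in exactly the header's order.
        "fields": [{"name": name, "type": "string"} for name in columns],
    }
    return json.dumps(schema, indent=2)


print(schema_from_header("id,name,amount"))
```

Generating the schema from a sample header also scales better once there are many schemas to register, since each one only needs its header line.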

Thanks, Pierre! Glad to help, and I'm especially grateful for the quick turnaround time.

I'll switch to an explicit schema definition for now, while we still have only a few files (and therefore only a few schemas) to validate. Ideally, once the fix is available, we'll be able to rely on the header-based strategy when we have a large number of schemas coming through.

Cheers!