- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
ValidateRecord doesn't maintain column order?
- Labels:
-
Apache NiFi
Created ‎03-08-2018 09:41 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, everyone,
I am currently working with the ValidateRecord processor in Nifi to test its capabilities & see if it's fit for a task I have. One step I want my flow to have is to be able to validate the format of a CSV file before placing it in HDFS for further processing (using Hive and other methods). The ValidateRecord processor does exactly what I need it to do, except...
What I'm expecting the processor to do is read the CSV data, verify the format & filter out any bad rows, and create a FlowFile with columns in the same order. However, after the ValidateRecord block runs, the columns are rearranged, for reasons that I cannot quite understand. I can get back to the original column ordering by using the ConvertRecord processor, but I was wondering if this is a necessary step in order to get back the original column order or if there's something I'm missing when using the ValidateRecord block?
Potentially relevant information:
- Running Nifi Version 1.5.0
- Using an AvroSchemaRegistry with CSVReader and CSVRecordSetWriter in the ValidateRecord block
- Would prefer to keep the data as raw text as much as possible, as further processes do additional formatting & clean up
- Columns seem to be in an arbitrary order when the file leaves the ValidateRecord block (i.e., the column names aren't sorted alphabetically, by the length of the field, etc.)
Thanks!
Created ‎03-09-2018 05:31 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Jessica David,
I confirm this is a bug. I created a JIRA for that: https://issues.apache.org/jira/browse/NIFI-4955
I will submit a fix in a minute. Thanks for reporting the issue.
I assume you're using the header as the schema access strategy in the CSV Reader. If you're able to use a different strategy (schema name, or schema text), it should solve the problem even though you need to explicitly define the schema.
Created ‎03-09-2018 05:31 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Jessica David,
I confirm this is a bug. I created a JIRA for that: https://issues.apache.org/jira/browse/NIFI-4955
I will submit a fix in a minute. Thanks for reporting the issue.
I assume you're using the header as the schema access strategy in the CSV Reader. If you're able to use a different strategy (schema name, or schema text), it should solve the problem even though you need to explicitly define the schema.
Created ‎03-12-2018 01:40 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, Pierre! Glad to help, and I'm especially grateful for the quick turnaround time.
I will switch to the explicit schema definition for now while we still have only a few files (and subsequently schemas) to validate. Ideally in the future we'll be able to use this when we have a large number of schemas coming through.
Cheers!
