Support Questions

jessica_david · ‎03-08-2018

Hi, everyone,

I am currently working with the ValidateRecord processor in NiFi to test its capabilities & see if it fits a task I have. One step I want in my flow is to validate the format of a CSV file before placing it in HDFS for further processing (using Hive and other methods). The ValidateRecord processor does exactly what I need it to do, except...

What I'm expecting the processor to do is read the CSV data, verify the format & filter out any bad rows, and create a FlowFile with the columns in the same order. However, after the ValidateRecord block runs, the columns are rearranged, and I can't work out why. I can restore the original column order with the ConvertRecord processor, but is that extra step really necessary, or is there something I'm missing when using the ValidateRecord block?

Potentially relevant information:

Running NiFi version 1.5.0
Using an AvroSchemaRegistry with CSVReader and CSVRecordSetWriter in the ValidateRecord block
Would prefer to keep the data as raw text as much as possible, as further processes do additional formatting & clean up
Columns seem to be in an arbitrary order when the file leaves the ValidateRecord block (i.e., the column names aren't sorted alphabetically, by the length of the field, etc.)

Thanks!

pvillard · ‎03-09-2018

Hi @Jessica David,

I confirm this is a bug. I created a JIRA for that: https://issues.apache.org/jira/browse/NIFI-4955

I will submit a fix in a minute. Thanks for reporting the issue.

I assume you're using the header as the schema access strategy in the CSV Reader. If you're able to use a different strategy (schema name or schema text), that should solve the problem, although you will need to define the schema explicitly.
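
If defining each schema by hand is tedious, the schema text can be generated from the CSV header. The following is only a minimal Python sketch, not part of NiFi: the function name `avro_schema_from_header`, the record name `csv_record`, and the sample columns are all hypothetical, and it assumes every column should be read as a nullable string (in line with keeping the data as raw text). The resulting JSON is meant to be pasted into the schema text property or registered in the AvroSchemaRegistry.

```python
import csv
import io
import json


def avro_schema_from_header(csv_text, record_name="csv_record"):
    """Build Avro schema text whose fields follow the CSV header order.

    Assumption: every column is treated as a nullable string.
    Note: Avro field names must match [A-Za-z_][A-Za-z0-9_]*, so headers
    containing spaces or punctuation would need renaming first.
    """
    # Read only the first row (the header) of the CSV text.
    header = next(csv.reader(io.StringIO(csv_text)))
    # A list keeps the fields in exactly the order they appear in the header.
    fields = [{"name": col.strip(), "type": ["null", "string"]} for col in header]
    schema = {"type": "record", "name": record_name, "fields": fields}
    return json.dumps(schema, indent=2)


# Hypothetical sample data, for illustration only.
sample = "zip_code,name,amount\n12345,Alice,10.50\n"
print(avro_schema_from_header(sample))
```

Because the fields are declared in header order, a writer using this schema should emit the columns in that same order.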

jessica_david · ‎03-12-2018

Thanks, Pierre! Glad to help, and I'm especially grateful for the quick turnaround time.

I will switch to the explicit schema definition for now, while we still have only a few files (and therefore only a few schemas) to validate. Ideally, in the future, we'll be able to use the header-based approach when we have a large number of schemas coming through.

Cheers!

Cloudera Community

ValidateRecord doesn't maintain column order?

NiFi, ValidateRecord