
NiFi, ValidateRecord

Rising Star

I'm trying to gain experience with Records, specifically ValidateRecord.

All FlowFiles coming out of ValidateRecord seem to be converted to the format set by the Record Writer property. Is there any way to avoid having the data re-serialized on output, i.e. to keep the original input? For example, with CSV input I'd like to log and report invalid lines as-is. These could be passed back to the data supplier, and I'd rather show them the original input than something transformed by the Writer.

I'd also appreciate some inspiration on when you would want the schema used for validation to differ from the one used by the Record Reader.

Using NiFi 1.5.0.3.

Accepted Solution

Master Guru
@Henrik Olsen

This exact case was addressed in NiFi 1.6 under Jira NiFi-4883.

Starting with NiFi 1.6, ValidateRecord can use one Record Writer for invalid records (the "Record Writer for Invalid Records" property) and a different Record Writer for valid records.
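To make the idea concrete outside NiFi, here is a minimal plain-Python sketch. It assumes a hypothetical three-column schema (`id`, `name`, `amount`) and a hypothetical `split_records` helper; it is not how ValidateRecord is implemented. Valid rows are re-serialized as JSON, standing in for the main Record Writer, while invalid lines are returned exactly as they appeared in the CSV input, which is what the question asks for when reporting back to a data supplier.

```python
import csv
import json

# Hypothetical schema: column name -> converter. A line is invalid if it has
# the wrong number of columns or any converter raises. This only stands in
# for ValidateRecord's schema check; it is NOT NiFi's implementation.
SCHEMA = {"id": int, "name": str, "amount": float}


def split_records(csv_text):
    """Return (valid_json_lines, invalid_raw_lines).

    Valid records are re-serialized (here as JSON, like a Record Writer
    would do); invalid lines are kept exactly as they appeared in the input.
    """
    lines = csv_text.splitlines()
    header, body = lines[0], lines[1:]
    columns = next(csv.reader([header]))
    valid, invalid = [], []
    for raw in body:
        fields = next(csv.reader([raw]))
        if len(fields) != len(columns):
            invalid.append(raw)  # keep the original line untouched
            continue
        try:
            record = {col: SCHEMA[col](val) for col, val in zip(columns, fields)}
        except (ValueError, KeyError):
            invalid.append(raw)  # type conversion failed: report as-is
            continue
        valid.append(json.dumps(record))
    return valid, invalid
```

In NiFi itself, invalid records still pass through the writer configured for them, so pointing that writer at a CSV writer whose settings match the input gets close to the original lines but is not guaranteed to be byte-identical.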



Rising Star

Follow-up regarding Records and schemas: I use InferAvroSchema, and it seems to miss the possibility of null in some of my CSV data. As a test, I set it to analyze 1,000,000 records to make sure it sees everything, but no luck: on some columns it adds null as a possible field type, on others it doesn't. Is there a built-in (hidden) upper limit on how many records are analyzed? And could an option be added to the processor to always allow null values?

Master Guru

@Henrik Olsen

NiFi analyzes as many records as the Number Of Records To Analyze property specifies, or all records in the FlowFile if it contains fewer, to determine the type of each field.

So if you set it to 1 million records, the full value only applies when a single FlowFile contains at least 1 million records; otherwise the processor is limited by the number of records in the FlowFile.


I think those columns simply contain no null values, which is why InferAvroSchema does not add null as a possible type for them. Note that empty values are not treated as null for the string type.
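The explanation above can be sketched as a simplified, hypothetical model of the nullability decision; it mirrors the reply, not InferAvroSchema's actual code, and the `infer_nullable` name is made up for illustration. Only the first `min(limit, records in the FlowFile)` rows are examined, and a field is marked nullable only if a genuinely missing value shows up in that sample, while an empty string counts as an ordinary string.

```python
def infer_nullable(rows, columns, number_of_records_to_analyze):
    """Return {column: True if a null was seen in the analyzed sample}.

    Hypothetical simplification of the behaviour described in the reply:
    - slicing examines min(limit, len(rows)) rows, so a large limit has no
      effect when the FlowFile holds fewer records;
    - only a missing key or None counts as null; "" is a normal string value.
    """
    sample = rows[:number_of_records_to_analyze]
    nullable = {col: False for col in columns}
    for row in sample:
        for col in columns:
            if row.get(col) is None:  # missing or explicit null
                nullable[col] = True
    return nullable
```

With rows `[{"a": "x", "b": ""}, {"a": None, "b": "y"}]`, column `a` becomes nullable because a real null appears, while `b` does not, even though it contains an empty string.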