NiFi: ConvertCSVToAvro processor is unable to find Records Schema

Rising Star

Hi,

We are loading a CSV file (1 row, for testing purposes) into Hive using NiFi.

The ConvertCSVToAvro processor is unable to find the Record Schema generated by the InferAvroSchema processor.

In InferAvroSchema, the CSV Header Definition is entered manually.

Is there any other way to skip Avro and load CSV, or any other source file format, directly into a Hive table?

-------

And yes, we have also tried skipping the InferAvroSchema processor and using ConvertCSVToAvro with a manually created xyz.avsc file, but then we get the error: unable to find xyz.avsc file.

inferavroschema.jpeg

convertcsvtoavro.jpeg

Looking forward to your reply.

1 ACCEPTED SOLUTION

Master Guru

InferAvroSchema should have Schema Output Destination set to "flowfile-attribute". The outgoing flow file should then contain the CSV data and an attribute called "inferred.avro.schema", which holds the schema to use. In ConvertCSVToAvro you can then set the Record Schema property to "${inferred.avro.schema}", which will cause it to use the inferred schema for conversion.

Since you are entering the CSV Header Definition manually, you may find it more helpful to create an Avro schema manually and use ConvertRecord instead of InferAvroSchema -> ConvertCSVToAvro. If you don't know the datatypes of the columns and are thus relying on InferAvroSchema to do that for you, you could still use ConvertRecord instead of ConvertCSVToAvro.
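As a rough illustration of "create an Avro schema manually", here is a minimal Python sketch that turns a CSV header line into an Avro record schema in JSON form. The column names (id, name, city), the record name "csv_record", and the namespace "example.nifi" are hypothetical placeholders, and typing every column as a nullable string is an assumption; you would adjust the types by hand to match your data.

```python
import csv
import io
import json


def avro_schema_from_header(header_line, record_name="csv_record",
                            namespace="example.nifi"):
    """Build a minimal Avro record schema (as a dict) from a CSV header line.

    Assumption: every column is a nullable string. Column names must already
    be valid Avro names (letters, digits, underscores; not starting with a digit).
    """
    # Parse the header with the csv module so quoted column names are handled.
    columns = next(csv.reader(io.StringIO(header_line)))
    fields = [
        {"name": col.strip(), "type": ["null", "string"], "default": None}
        for col in columns
    ]
    return {
        "type": "record",
        "name": record_name,
        "namespace": namespace,
        "fields": fields,
    }


# Hypothetical header; replace with your own CSV Header Definition.
schema = avro_schema_from_header("id,name,city")
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

The printed JSON is plain schema text, so it could be pasted wherever the flow expects one, for example in place of "${inferred.avro.schema}" in the Record Schema property described above.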


3 REPLIES

Rising Star

Thanks. Your response helped me start thinking in the right direction.

Rising Star

Found the missing pieces.

I went to Configuration (not Configure) and added all the Controller Services related to Avro and CSV (if you are loading from source).

Issue resolved.