NiFi: ConvertCSVToAvro processor is unable to find Records Schema
Labels: Apache NiFi
Created ‎05-29-2018 04:57 PM
Hi,
We are loading a CSV file (1 row, for testing purposes) into Hive using NiFi.
The ConvertCSVToAvro processor is unable to find the record schema generated by the InferAvroSchema processor.
In InferAvroSchema, the CSV Header Definition is entered manually.
Is there any other way to skip Avro and load CSV (or any other source file format) directly into a Hive table?
-------
And yes, we have also tried skipping InferAvroSchema and using ConvertCSVToAvro with a manually created xyz.avsc file, but then we get an error: unable to find xyz.avsc file.
Looking forward to your reply.
Created ‎05-29-2018 05:31 PM
InferAvroSchema should have Schema Output Destination set to "flowfile-attribute". The outgoing flow file should then contain the CSV data and an attribute called "inferred.avro.schema", which holds the schema to use. In ConvertCSVToAvro, set the Record Schema property to "${inferred.avro.schema}" so that it uses the inferred schema for the conversion.
Since you are entering the CSV Header Definition manually, you may find it more helpful to create an Avro schema by hand and use ConvertRecord instead of InferAvroSchema -> ConvertCSVToAvro. If you don't know the datatypes of the columns and are relying on InferAvroSchema to work them out for you, you could still use ConvertRecord instead of ConvertCSVToAvro.
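If you go the manual-schema route, here is a rough Python sketch of what such a schema could look like when built from a CSV header line. The column names (id, name, amount), the record name and the helper function are all hypothetical. Typing every column as a nullable string is only an assumption to get started; tighten the types once you know the real ones.

```python
import json


def schema_from_header(header_line, record_name="csv_record"):
    """Build an Avro record schema where every CSV column is a nullable string.

    Hypothetical helper for illustration; it is not part of NiFi.
    """
    columns = [name.strip() for name in header_line.split(",")]
    return {
        "type": "record",
        "name": record_name,
        "fields": [
            # A union with "null" first lets a missing value default to null.
            {"name": col, "type": ["null", "string"], "default": None}
            for col in columns
        ],
    }


# Hypothetical header; replace with the real CSV Header Definition.
schema = schema_from_header("id,name,amount")
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

The printed JSON is the schema text itself, so it can be saved as an .avsc file or pasted wherever the flow expects the schema.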
Created ‎05-29-2018 09:44 PM
Thanks. Your response helped me start thinking in the right direction.
Created ‎05-29-2018 09:44 PM
Found the missing pieces.
Go to Configuration (not Configure) and add all the Controller Services related to Avro and CSV (if you are loading from a source file).
Issue resolved.
