Created 06-09-2016 08:14 AM
I'm using the NiFi ConvertCSVToAvro processor, with an InferAvroSchema processor upstream to obtain the schema. I'm getting the error "Failed to convert 1031/6058 records", so most of the records are converted successfully, but I can't figure out why the remaining rows failed. How do I go about debugging and identifying the reasons for the failed rows, and are there typical causes for this failure?
Created 06-09-2016 09:46 AM
How are you defining your Avro Schema?
Typically, 'failed to convert' errors occur when the CSV records don't fit the data types defined in your Avro schema. If you're using the 'InferAvroSchema' processor or the Kite SDK to define the schema, the inferred schema may not be a true representation of your data. Keep in mind that these methods infer the schema from a subset of the data, so if your data isn't very consistent they are likely to misinterpret the field types and hit errors during conversion.
If you know the data, you could get around this by manually defining the Avro schema based on the actual data types.
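To illustrate defining the schema from the actual data types, here is a minimal Python sketch that scans every row of a CSV (not just a sample) and picks the narrowest type that fits each column. The helper names (`widest_type`, `build_schema`), the record name `CsvRecord`, and the sample data are all hypothetical, and the parsing rules are Python's, which may differ from what the Kite SDK accepts. It also assumes the header names are valid Avro field names.

```python
import csv
import io
import json


def widest_type(values):
    """Return the narrowest Avro primitive that fits every non-empty value."""
    kind = None
    for value in values:
        if value == "":
            continue  # empty cells are covered by the nullable union below
        try:
            int(value)
            kind = kind or "long"  # keep "double" if already widened
            continue
        except ValueError:
            pass
        try:
            float(value)
            kind = "double"  # at least one non-integer number: widen
        except ValueError:
            return "string"  # anything non-numeric forces a string column
    return kind or "string"  # all-empty column: fall back to string


def build_schema(csv_text, record_name="CsvRecord"):
    """Build an Avro record schema (as a dict) from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    fields = []
    for index, name in enumerate(header):
        column = [row[index] for row in data if index < len(row)]
        # Nullable union with "null" first so the null default is valid.
        fields.append(
            {"name": name, "type": ["null", widest_type(column)], "default": None}
        )
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample: "amount" looks like a long until the last row.
sample = "id,amount,label\n1,10,a\n2,12,b\n3,12.5,c\n"
schema = build_schema(sample)
print(json.dumps(schema, indent=2))
```

The printed JSON is only a starting point; review it against what you know about the data before using it as the schema.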
Created 06-09-2016 09:35 AM
Hi @KC
Is there any information in the application logs (./logs/nifi-app.log)?
Created 06-09-2016 09:49 AM
@Pierre Villard Nothing other than this:
2016-06-09 05:50:43,503 WARN [Timer-Driven Process Thread-7] o.a.n.processors.kite.ConvertCSVToAvro ConvertCSVToAvro[id=2238d74a-0635-401d-b51c-45ca87a4cfb9] Failed to convert 1055/6031 records from CSV to Avro
Created 06-09-2016 09:52 AM
@Laurence Da Luz Yes, I am using the 'InferAvroSchema' processor. Is it possible to output and view the schema somewhere? I'll also try to manually define the schema.
Created 06-09-2016 10:03 AM
@Laurence Da Luz Manually defining the schema worked, thanks for the suggestion. I'm curious, though, what went wrong with the inference processor.
Created 06-09-2016 10:12 AM
Your 'InferAvroSchema' processor is likely capturing the schema in an attribute called 'inferred.avro.schema' (assuming you followed the tutorial here: https://community.hortonworks.com/articles/28341/converting-csv-to-avro-with-apache-nifi.html ). If that's the case, you can view its output by looking at one of the flowfiles in the queue after 'InferAvroSchema' (List queue > select a flowfile > view attributes > view the inferred.avro.schema property).
If you want to manually define the schema without changing too much of your flow, you can replace your 'InferAvroSchema' processor with an 'UpdateAttribute' processor: within 'UpdateAttribute', define a new property called inferred.avro.schema and paste in your Avro schema (JSON format) as the value.
Created 06-10-2016 02:49 AM
@Laurence Da Luz Thanks. I checked the inferred schema and verified the issue. It inferred a column as 'long' when some rows contained 'float' values.
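For anyone hitting the same mismatch, a rough way to find the offending rows outside NiFi is to check each CSV value against the type the schema declares for its column. This is only a Python sketch: `find_bad_rows`, the `PARSERS` table, and the sample data are hypothetical, the parsing rules are Python's and may not match the Kite converter exactly, and the reported line numbers assume no quoted field spans multiple lines.

```python
import csv
import io

# Python callables standing in for a few Avro primitive types (an assumption,
# not the converter's own parsing logic).
PARSERS = {"int": int, "long": int, "float": float, "double": float, "string": str}


def find_bad_rows(csv_text, field_types):
    """Return (line_number, column, value) for values that do not parse
    as the type declared for their column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    # Data starts on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        for name, avro_type in field_types.items():
            value = row.get(name)
            if value is None or value == "":
                continue  # missing or empty cells are not type errors here
            try:
                PARSERS[avro_type](value)
            except ValueError:
                problems.append((line_number, name, value))
    return problems


# Hypothetical data: "amount" was inferred as long, but one row holds a float.
sample = "id,amount\n1,10\n2,12.5\n3,7\n"
bad = find_bad_rows(sample, {"id": "long", "amount": "long"})
for line_number, column, value in bad:
    print(f"line {line_number}: column {column!r} has value {value!r}")
```

The field types passed in would come from the schema you viewed in the inferred.avro.schema attribute.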
Created 02-23-2017 02:53 PM
@KC - Did you crack this? I am quite new to this and am getting the same issue. I also tried the new 'inferred.avro.schema' property but did not have any success.