ConvertCSVToAvro NiFi Failed to Convert Records

Contributor

I'm using the NiFi ConvertCSVToAvro processor, with an InferAvroSchema processor upstream to obtain the schema. I'm getting the error "Failed to convert 1031/6058 records", so most of the records are converted successfully, but I can't figure out why the remaining rows failed. How do I go about debugging and identifying the reasons for the failed rows, and are there typical causes for this failure?

1 ACCEPTED SOLUTION

Rising Star

@KC

How are you defining your Avro Schema?

Typically, 'failed to convert' errors occur when the CSV records don't fit the data types defined in your Avro schema. If you're using the 'InferAvroSchema' processor or the Kite SDK to define the schema, the inferred schema may not be a true representation of your data. Keep in mind that these methods infer the schema from a subset of the data, so if your data isn't very consistent, they are likely to misinterpret the field types and hit errors during conversion.

If you know the data, you could get around this by manually defining the Avro schema based on the actual data types.
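To illustrate why inferring from a subset can mislead, here is a minimal Python sketch that runs outside NiFi. It does not reproduce the actual logic of 'InferAvroSchema' or the Kite SDK; the helper `infer_type`, the column names, and the sample data are all hypothetical. It only shows that a type guessed from the first few rows can be too narrow for the whole file.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of 'long', 'double', 'string' that fits every value."""
    def fits(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if fits(int):
        return "long"
    if fits(float):
        return "double"
    return "string"


# Hypothetical data: the first three rows look like integers, the fourth does not.
CSV_TEXT = "id,amount\n1,10\n2,20\n3,30\n4,12.5\n"

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
amounts = [r["amount"] for r in rows]

sample_type = infer_type(amounts[:3])  # what a 3-row sample suggests
full_type = infer_type(amounts)        # what the whole file requires
print(sample_type, full_type)
```

If the two results differ for a column, that column is a good candidate for a hand-written type in the schema.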

8 REPLIES

Hi @KC

Any information in the application logs (./logs/nifi-app.log) ?

Contributor

@Pierre Villard Nothing other than this warning:

2016-06-09 05:50:43,503 WARN [Timer-Driven Process Thread-7] o.a.n.processors.kite.ConvertCSVToAvro ConvertCSVToAvro[id=2238d74a-0635-401d-b51c-45ca87a4cfb9] Failed to convert 1055/6031 records from CSV to Avro

Contributor

@Laurence Da Luz Yes, I am using the 'InferAvroSchema' processor. Is it possible to output and view the schema somewhere? I'll also try to manually define the schema.

Contributor

@Laurence Da Luz Manually defining the schema worked, thanks for the suggestion. I'm curious, though, about what went wrong with the inference processor.

Rising Star

@KC

Your 'InferAvroSchema' processor is likely writing the schema to an attribute called 'inferred.avro.schema' (assuming you followed the tutorial here: https://community.hortonworks.com/articles/28341/converting-csv-to-avro-with-apache-nifi.html ). If that's the case, you can view its output by looking at one of the flowfiles in the queue after 'InferAvroSchema' (List queue > select a flowfile > view attributes > view the inferred.avro.schema property). If you want to manually define the schema without changing too much of your flow, you can replace your 'InferAvroSchema' processor with an 'UpdateAttribute' processor: within 'UpdateAttribute', define a new property called inferred.avro.schema and paste in your Avro schema (in JSON format) as the value.
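As a rough illustration of what such a hand-written schema might look like, the Python sketch below builds an Avro record schema and prints it as JSON, which could then be pasted as the property value. The record name `CsvRecord` and the field names `id`, `amount`, and `description` are hypothetical placeholders, not taken from the original flow; swap in the real column names and types.

```python
import json

# Hypothetical record and field names; replace them with the CSV's real columns.
schema = {
    "type": "record",
    "name": "CsvRecord",
    "fields": [
        {"name": "id", "type": "long"},
        # A union with "null" makes the field optional in Avro.
        {"name": "amount", "type": ["null", "double"], "default": None},
        {"name": "description", "type": ["null", "string"], "default": None},
    ],
}

# Serialize to the JSON text form that Avro schemas are written in.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Declaring a numeric column as "double" rather than "long" is the safer choice when any row might carry a decimal value.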

Contributor

@Laurence Da Luz Thanks. I checked the inferred schema and verified the cause of the error: it inferred a column as 'long' when some rows contained floats.
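For anyone trying to pin down which rows fail, one option is to check the CSV against the inferred types outside NiFi. The sketch below is only an approximation under stated assumptions: the `CASTS` mapping uses Python's `int`, `float`, and `str` as stand-ins for Avro's long, double, and string, which is not necessarily how ConvertCSVToAvro parses values, and the function name, column names, and data are hypothetical. It also assumes one record per line (no quoted fields spanning lines).

```python
import csv
import io

# Python stand-ins for a few Avro primitive types (an approximation only).
CASTS = {"long": int, "double": float, "string": str}


def find_bad_rows(csv_text, column_types):
    """Return (line_number, column, value) for each value that does not parse."""
    bad = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, type_name in column_types.items():
            try:
                CASTS[type_name](row[column])
            except (ValueError, TypeError):
                bad.append((line_no, column, row[column]))
    return bad


# Hypothetical data and inferred types: 'amount' was inferred as long.
CSV_TEXT = "id,amount\n1,10\n2,12.5\n3,30\n4,7.25\n"
INFERRED = {"id": "long", "amount": "long"}

for line_no, column, value in find_bad_rows(CSV_TEXT, INFERRED):
    print(f"line {line_no}: {column}={value!r} is not a valid {INFERRED[column]}")
```

Rows reported here are the ones worth comparing against the schema shown in the inferred.avro.schema attribute.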

New Contributor

@KC - Did you crack this? I am quite new to this and I am getting the same issue. I also tried adding the new property 'inferred.avro.schema' but did not have any success.